Caching for Sensor Web REST API


Introduction

The Sensor Web REST API defines a lightweight interface for querying and accessing time series data from different sources. Data sources connect to the API via a Service Provider Interface (SPI). The REST API is implemented on top of Java technologies such as Spring, Servlets and JSPs. There are several implementations of the API, such as SOS Proxy and the DAO Series REST API. SOS Proxy delegates requests to underlying SOS instances. The DAO implementation provides direct access, via the REST API, to a database that holds series data.

Caching is a natural technique for reducing the response time of a network service such as an HTTP proxy or a CRUD web application, which makes it very useful for the Sensor Web REST API. Caching in SOS Proxy provides not only reduced latency and greater responsiveness but also more fine-grained control over the served data. The latter is crucial when data is requested from a third-party SOS implementation.

The expected result of the project is a separate caching module for the Sensor Web REST API. The module provides the caching logic and a set of interfaces to popular databases such as Infinispan or Redis.

Cache architecture

The logic of dataset caching and retrieval depends on the query parameters. The influence of the different parameter values, and the logic behind them, is described in the table below. Descriptions of the parameters can be found here and here. The "single" parameter type means that the parameter can have only one value; the "multi" type means that it can be assigned a set of values. The "Merge condition" column defines when a time series is considered eligible for merging. The "Extraction condition" column tells whether a cached time series can be used as a response for a given set of request parameters.

| Parameter name | Type | Merge condition | Extraction condition |
| Query Parameters |
| expanded | bool | Exact match | Exact match |
| timespan | ISO 8601 formatted period | Merge with the existing timespan if it overlaps | If T is an already stored timespan with start date Ts and end date Te, the requested timespan matches if its start date >= Ts and its end date <= Te |
| width | single | ignore | ignore |
| height | single | ignore | ignore |
| style | json | ignore | ignore |
| legend | bool | ignore | ignore |
| generalize | bool | not stored if "true" | bypass the cache if "true" |
| force_latest_values | bool | ignore | bypass the cache |
| format | single | Exact match | Exact match |
| base64 | bool | ignore | ignore |
| Common Query Parameters |
| expanded | bool | Exact match | Exact match |
| platformTypes | multi | Exact match | Exact match |
| datasetTypes | multi | Exact match | Exact match |
| services | multi | Exact match | Exact match |
| platforms | multi | Exact match | Exact match |
| categories | multi | Exact match | Exact match |
| phenomena | multi | Exact match | Exact match |
| station | single | Exact match | Exact match |
| locale | single | ignore | ignore |
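
The merge and extraction conditions for the timespan parameter can be illustrated with a small Java sketch. The class name Timespan and its methods are hypothetical and not part of the REST API code base. The sketch also assumes that both ends of a timespan are inclusive, so two timespans that merely touch count as overlapping; the table does not state this.

```java
import java.time.Instant;

/** Hypothetical value type for a stored or requested timespan [start, end]. */
final class Timespan {
    final Instant start;
    final Instant end;

    Timespan(Instant start, Instant end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        this.start = start;
        this.end = end;
    }

    /** Extraction condition: requested start >= Ts and requested end <= Te. */
    boolean covers(Timespan requested) {
        return !requested.start.isBefore(start) && !requested.end.isAfter(end);
    }

    /** Merge condition: both timespans share at least one instant (assumed inclusive). */
    boolean overlaps(Timespan other) {
        return !other.start.isAfter(end) && !other.end.isBefore(start);
    }

    /** Smallest timespan that contains both overlapping timespans. */
    Timespan mergeWith(Timespan other) {
        if (!overlaps(other)) {
            throw new IllegalArgumentException("timespans do not overlap");
        }
        Instant mergedStart = start.isBefore(other.start) ? start : other.start;
        Instant mergedEnd = end.isAfter(other.end) ? end : other.end;
        return new Timespan(mergedStart, mergedEnd);
    }
}
```

A request is served from the cache only when covers returns true for a stored timespan; overlaps decides whether freshly fetched data is merged into an existing entry or stored as a new one.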

Caching HTTP headers

Cache-Control tokens supported in client requests:
  • no-cache
  • only-if-cached
Headers in a proxy response that contains cached data:
  • Age: the age of the oldest cached chunk
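
As an illustration of how these headers might be handled, the following Java sketch parses the Cache-Control request header into directive names and derives the Age value from the insert times of the cached chunks used in a response. The class CacheHeaders and its method names are hypothetical; the sketch uses only the JDK and makes no claim about how the REST API itself reads headers.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for the caching headers listed above. */
final class CacheHeaders {
    private CacheHeaders() {
    }

    /** Splits a Cache-Control header value into lower-cased directive names. */
    static Set<String> directives(String cacheControl) {
        Set<String> result = new HashSet<>();
        if (cacheControl == null) {
            return result;
        }
        for (String token : cacheControl.split(",")) {
            String name = token.trim().toLowerCase(Locale.ROOT);
            int equalsSign = name.indexOf('=');
            if (equalsSign >= 0) {
                // Drop the argument of directives such as max-age=0.
                name = name.substring(0, equalsSign).trim();
            }
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    /** Age in whole seconds of the oldest chunk; insertTimes must not be empty. */
    static long ageSeconds(Collection<Instant> insertTimes, Instant now) {
        Instant oldest = Collections.min(insertTimes);
        return Math.max(0L, Duration.between(oldest, now).getSeconds());
    }
}
```

A caller would check directives(header).contains("no-cache") or contains("only-if-cached") before consulting the cache, and write the result of ageSeconds into the Age response header.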

Data schema

For each continuous time series, the cache maintains metadata. The metadata consists of a RequestParameterSet and several timestamps. One timestamp records when the corresponding time series was retrieved from the SPI implementation (the insert timestamp). The others represent the start and end of the time series timespan.

The insert timestamp is used to calculate the "Age" header.

Time series data is bound to its corresponding metadata. The data contains a serialized object of type org.n52.io.response.dataset.DataCollection.

Cache architecture(4).png
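
One possible shape for the metadata record is sketched below in Java. The class SeriesMetadata is hypothetical, and a plain Map<String, String> stands in for RequestParameterSet so that the example needs only the JDK. The sketch assumes that only the parameters marked "Exact match" in the parameter table take part in the cache key; multi-valued parameters are treated as opaque strings, which is a simplification.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical metadata record for one continuous cached time series. */
final class SeriesMetadata {
    /** Parameters marked "Exact match" in the table; all others are left out of the key. */
    static final Set<String> KEY_PARAMETERS = new HashSet<>(Arrays.asList(
            "expanded", "format", "platformTypes", "datasetTypes", "services",
            "platforms", "categories", "phenomena", "station"));

    final Map<String, String> keyParameters; // stand-in for RequestParameterSet
    final Instant insertedAt;                // when the data was fetched from the SPI
    final Instant start;                     // start of the cached timespan
    final Instant end;                       // end of the cached timespan

    SeriesMetadata(Map<String, String> requestParameters,
                   Instant insertedAt, Instant start, Instant end) {
        this.keyParameters = keyOf(requestParameters);
        this.insertedAt = insertedAt;
        this.start = start;
        this.end = end;
    }

    /** Keeps only the exact-match parameters, sorted by name for a stable key. */
    static Map<String, String> keyOf(Map<String, String> requestParameters) {
        Map<String, String> key = new TreeMap<>();
        for (Map.Entry<String, String> entry : requestParameters.entrySet()) {
            if (KEY_PARAMETERS.contains(entry.getKey())) {
                key.put(entry.getKey(), entry.getValue());
            }
        }
        return key;
    }
}
```

With this key, two requests that differ only in ignored parameters such as width or locale resolve to the same cached series.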

General caching logic

The logic in the following chart is not bound to a specific cache database. All checks and method calls can be made from CachingDataService, a wrapper around a DataService implementation. The general caching logic resides between the REST API and SPI layers. The main goal of CachingDataService is to check the caching headers and make the corresponding calls to the underlying cache implementation.

The cache implementation is injected into the CachingDataService as a Spring bean, along with the enclosed DataService. It implements the DataCache interface. Each cache implementation maintains its own logic for storing and fetching data from a specific database.

Cache architecture(2).png
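
The wrapper described above might look roughly like the following Java sketch. The interfaces DataService and DataCache are heavily simplified stand-ins with hypothetical method names, not the real SPI signatures, and the request parameters are a plain map. The handling of only-if-cached (an empty result that the REST layer could turn into an error response) is an assumption based on general HTTP caching semantics; the bypass and store rules for generalize and force_latest_values follow the parameter table, assuming that both are triggered by the value "true".

```java
import java.util.Map;
import java.util.Optional;

/** Simplified stand-in for the SPI-level DataService (hypothetical signature). */
interface DataService<T> {
    T getData(Map<String, String> parameters);
}

/** Simplified stand-in for the DataCache interface (hypothetical method names). */
interface DataCache<T> {
    boolean isDataCached(Map<String, String> parameters);

    T getData(Map<String, String> parameters);

    void putData(Map<String, String> parameters, T data);
}

/** Sketch of the wrapper; both collaborators are passed in, e.g. by Spring. */
final class CachingDataService<T> {
    private final DataService<T> delegate;
    private final DataCache<T> cache;

    CachingDataService(DataService<T> delegate, DataCache<T> cache) {
        this.delegate = delegate;
        this.cache = cache;
    }

    /** An empty result means only-if-cached was set but no cached data could be used. */
    Optional<T> getData(Map<String, String> parameters,
                        boolean noCache, boolean onlyIfCached) {
        boolean generalize = "true".equals(parameters.get("generalize"));
        boolean bypass = noCache || generalize
                || "true".equals(parameters.get("force_latest_values"));
        if (!bypass && cache.isDataCached(parameters)) {
            return Optional.of(cache.getData(parameters));
        }
        if (onlyIfCached) {
            return Optional.empty();
        }
        T fresh = delegate.getData(parameters);
        if (!generalize) {
            // Generalized series are not stored, as stated in the parameter table.
            cache.putData(parameters, fresh);
        }
        return Optional.of(fresh);
    }
}
```

Constructor injection keeps the sketch free of Spring annotations; in the actual module the two collaborators would be wired as beans.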

Database specific logic

The DataCache interface defines three operations:
  • Checking whether data for a specified set of parameters is cached.
  • Retrieving a cached time series for a set of parameters.
  • Inserting a time series with its corresponding parameters.
The first two operations are similar to each other. Their main task is to select only the requested timestamps from the whole cached time series.

Flowchart of the check operation:

Cache architecture(5).png
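
For a single cached series, the check and retrieval operations can be sketched in Java as follows. The class CachedSeries is hypothetical, values are simplified to a sorted map from timestamp to double rather than a DataCollection, and both ends of the requested window are assumed to be inclusive.

```java
import java.time.Instant;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory form of one continuous cached time series. */
final class CachedSeries {
    final Instant start;
    final Instant end;
    final NavigableMap<Instant, Double> values;

    CachedSeries(Instant start, Instant end, NavigableMap<Instant, Double> values) {
        this.start = start;
        this.end = end;
        this.values = values;
    }

    /** Check operation: is the requested window fully inside the cached timespan? */
    boolean covers(Instant from, Instant to) {
        return !from.isBefore(start) && !to.isAfter(end);
    }

    /** Retrieval operation: copy only the values that fall inside the window. */
    NavigableMap<Instant, Double> slice(Instant from, Instant to) {
        if (!covers(from, to)) {
            throw new IllegalArgumentException("requested window is not fully cached");
        }
        return new TreeMap<>(values.subMap(from, true, to, true));
    }
}
```

Both operations share the same window test, which is why the check and retrieval flows are so similar; retrieval only adds the selection of values.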

The insert operation has the most sophisticated logic because it has to merge time series together.

Cache architecture(6).png
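
A minimal in-memory sketch of such a merge is given below in Java. The classes Segment and SegmentStore are hypothetical and say nothing about how a concrete database such as Infinispan would store the data. The sketch assumes that stored segments never overlap each other, that touching segments are merged, and that values from the newly inserted segment win over older values at the same timestamp. Insert timestamps of the merged chunks are not tracked here.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical continuous piece of a time series with its timespan. */
final class Segment {
    Instant start;
    Instant end;
    final NavigableMap<Instant, Double> values = new TreeMap<>();

    Segment(Instant start, Instant end) {
        this.start = start;
        this.end = end;
    }

    boolean overlaps(Segment other) {
        return !other.start.isAfter(end) && !other.end.isBefore(start);
    }
}

/** Keeps stored segments pairwise non-overlapping by merging on insert. */
final class SegmentStore {
    private final List<Segment> segments = new ArrayList<>();

    void insert(Segment incoming) {
        Segment merged = new Segment(incoming.start, incoming.end);
        for (Iterator<Segment> it = segments.iterator(); it.hasNext();) {
            Segment existing = it.next();
            if (existing.overlaps(merged)) {
                // Absorb the existing segment and widen the merged timespan.
                it.remove();
                if (existing.start.isBefore(merged.start)) {
                    merged.start = existing.start;
                }
                if (existing.end.isAfter(merged.end)) {
                    merged.end = existing.end;
                }
                merged.values.putAll(existing.values);
            }
        }
        // Written last so that newly fetched values replace older ones.
        merged.values.putAll(incoming.values);
        segments.add(merged);
    }

    List<Segment> segments() {
        return segments;
    }
}
```

Because the stored segments are disjoint before each insert, a single pass is enough: widening the merged segment with one existing segment cannot make it reach another one that the incoming segment did not already overlap.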

Weekly reports

Week 1

Status
  • Implemented a landing page for the REST API. Link to Pull Request. The landing page is made with the Bootstrap framework. Generation of the API docs with Jekyll is integrated into the Maven build.
  • Created a Docker Compose file for the demo environment. The Compose file includes two SOS instances and SOS Proxy. Its general structure has been copied from the SOS Compose file.
Problems
  • Not sure which docs (1, 2) should be linked from the landing page.
Next tasks
  • Learn more about HTTP caching
  • Describe relations between REST API query parameters and caching logic
  • Find a way to configure the demo environment automatically, e.g. insert sample data into the SOS database.
  • Close the landing page pull request.

Week 2

Status
  • Learned more about HTTP caching by studying the RFC and an overview of caching HTTP headers. Defined the HTTP headers that the REST API should support for compliance with the HTTP Caching RFC.
  • Composed a table which shows relations between query parameters and store and load procedures.
  • Received some new comments on landing page pull request.
  • Found a way to configure SOS database via prepared settings file.
Problems
  • Still have not come up with a solution to the SOS auto-configuration problem. The SOS install instructions are outdated. The default settings template has been deleted from the repo.
Next tasks
  • Describe the data model of caching storage.
  • Draw the flowchart of REST-API request with caching.
  • Address the comments on the landing page PR.
  • Find a way to configure the SOS via the prepared settings.

Week 3

Status
Problems
  • No major problems
Next tasks
  • Read about the selected Cache-Control attributes and check that they are not proxy-specific
  • Simplify the SOS Proxy Docker Compose file and get rid of the local SOS instances
  • Implement general caching logic in a CachingDataService
  • Implement DataCache interface based on Infinispan database

Week 4

Status
  • Found out that the selected Cache-Control header attributes are not proxy-specific
  • The SOS Proxy Docker Compose file is simplified and now contains only the SOS Proxy webapp and database
  • Implemented part of an Infinispan caching service
Problems
  • Did not start implementing the general caching logic because the Infinispan caching service task was underestimated
Next tasks
  • Add data series appending logic to a cache store procedure
  • Write tests for the DataCache implementation
  • Finish the CachingDataService
Topic revision: r11 - 26 Jun 2017 21:05:50, AntonEgorov