Statistics for OGC Web services


Introduction

Maintainers of 52°North SOS deployments currently do not know which procedures are requested most often, or when people request data for which area and time frame. The target audience of these statistics is system administrators and probably power users (from now on referred to simply as "users"). Are requests mostly for the last week, or do people request historical data as well? How large is the average response document? Which sensor stations rarely get any queries? How long do execute requests of a specific WPS process take on average?

To answer these questions, this project analyses SOS and WPS requests and responses, captures "usage statistics", and integrates these usage statistics into the administrative backends of the services.

About me

I'm a 25-year-old Hungarian computer engineer, and this year I finish my master's degree at the Technical University in Budapest. I have also studied at the Karlsruhe Institute of Technology (KIT) in Karlsruhe, Germany, and at the Universidade do Porto in Porto, Portugal.

Blog posts

1st blog post

2nd blog post - midterm

Details

GitHub

GitHub project link

Setting pages

Elasticsearch

The Elasticsearch functionality must be enabled for this module to be activated. Before you start your SOS deployment, make sure that your Elasticsearch server is running and reachable. You can enable the module by ticking the checkbox on the settings page.

Connection mode: Two connection modes are supported: Node and Transport client. You can read more about the difference on the official site. In short, the Node connection type is recommended if you can use it, but if there are firewall issues, stick with the Transport Client mode.

Cluster name: Elasticsearch servers are organized in clusters. Specify the name of the cluster to connect to. For more information refer to the official guide here.

Address(es) of the cluster: The Statistics module uses unicast messages to connect to the Elasticsearch cluster. Here you can specify a comma-separated list of addresses in the form <host>[:<port>] to connect to.
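
The address list format can be illustrated with a small parser. This is only a sketch of how a value such as "localhost, es1.example.org:9301" could be split into host and port pairs; it is not the module's actual code. The function name is hypothetical, and falling back to port 9300 (Elasticsearch's default transport port) when no port is given is an assumption. IPv6 literals are not handled.

```python
from typing import List, Tuple

# Assumption: fall back to Elasticsearch's default transport port.
DEFAULT_TRANSPORT_PORT = 9300


def parse_cluster_addresses(
    value: str, default_port: int = DEFAULT_TRANSPORT_PORT
) -> List[Tuple[str, int]]:
    """Split a '<host>[:<port>]' comma-separated list into (host, port) pairs."""
    addresses = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and stray whitespace
        host, separator, port = item.partition(":")
        if not host:
            raise ValueError(f"missing host in {item!r}")
        # int() raises ValueError for an empty or non-numeric port
        addresses.append((host, int(port) if separator else default_port))
    return addresses


if __name__ == "__main__":
    print(parse_cluster_addresses("localhost, es1.example.org:9301"))
```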

Index name: The index name. If the index already exists, the module will store the statistics values in that index. This value and the following Type name are used in conjunction to specify the exact path of your data in Elasticsearch.

Type name: The type name under which your data is stored inside your index. If the type name already exists in your index, the module will continue to store the collected data there if possible.

Unique id: Several SOS deployments may store their data in one Elasticsearch index/type. To distinguish between the deployments, a unique id is generated for you, but you can specify your own id. Note that collisions are possible.
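
The idea behind the unique id can be sketched as follows: use the configured id if there is one, otherwise generate a random UUID, and attach it to every stored statistics document so that deployments sharing an index/type can be told apart. The function names and the field name "instance_uuid" are hypothetical; the module's real field name may differ.

```python
import uuid
from typing import Optional


def resolve_instance_id(configured_id: Optional[str] = None) -> str:
    """Return the configured id, or a random UUID string if none was set."""
    if configured_id and configured_id.strip():
        return configured_id.strip()
    return str(uuid.uuid4())


def tag_document(document: dict, instance_id: str, field: str = "instance_uuid") -> dict:
    """Return a copy of a statistics document with the deployment id attached.

    The field name is a placeholder chosen for this sketch.
    """
    tagged = dict(document)
    tagged[field] = instance_id
    return tagged


if __name__ == "__main__":
    instance_id = resolve_instance_id()
    print(tag_document({"operation": "GetObservation"}, instance_id))
```

A generated UUID makes accidental collisions very unlikely, whereas a manually chosen id is only unique if the administrators coordinate it.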

Enable preconfigured Kibana settings: If you would like to use our provided Kibana visualizations and dashboards, you can tick the checkbox and specify the location of the Kibana settings file (a JSON-formatted text file). The settings will be loaded into the .kibana Elasticsearch index, from which the Kibana 4 application reads its configuration.

Important! The settings only take effect after you restart the web application.

Geolite2 Databases

Elasticsearch Watcher

Elasticsearch Watcher is a plugin for Elasticsearch that provides alerting and notification based on changes in your data. The plugin is subject to licensing.

The Watcher can be configured via the REST API and also via the Java API. The Java API is more customizable but less comfortable for developers; probably in the future the documentation and the source code will be available and development will be more fluid. The first 1.0.0 version was released on 25-06-2015, so it is still an early version.

With the Iceland admin UI it is easily feasible to create custom watches and to let the user configure the parameters to watch for, e.g. the threshold value for long-running processes.

See my test example of managing a watcher which measures the incoming exception events and, beyond a threshold, notifies the user via email: sample watcher. In order to send the notification email you need to set up an email configuration in your elasticsearch.yml file based on this guide.

Notifications are sent mainly via email; HTTP requests to an external web URL with an arbitrary JSON payload are also available.
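
To give an impression of what such a watch looks like, the following sketch builds a watch definition with the four usual parts (trigger, input, condition, actions) as a Python dictionary and prints it as JSON. It is not the sample watcher linked above. The index name, the field name "has_exception", the recipient and the action name are placeholders. With the REST API, a body like this would be sent with a PUT request to _watcher/watch/<watch_id>.

```python
import json


def build_exception_watch(index, threshold, recipient, interval="1m"):
    """Build a watch body that emails `recipient` when a search over `index`
    finds more than `threshold` exception documents."""
    return {
        # run the watch periodically
        "trigger": {"schedule": {"interval": interval}},
        # search the statistics index for exception events
        "input": {
            "search": {
                "request": {
                    "indices": [index],
                    # "has_exception" is a hypothetical field name
                    "body": {"query": {"match": {"has_exception": True}}},
                }
            }
        },
        # fire only when the number of hits exceeds the threshold
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": threshold}}},
        "actions": {
            "notify_admin": {
                "email": {
                    "to": recipient,
                    "subject": "SOS exception threshold exceeded",
                    "body": "{{ctx.payload.hits.total}} exception events were recorded.",
                }
            }
        },
    }


if __name__ == "__main__":
    watch = build_exception_watch("sos-statistics", 10, "admin@example.org")
    print(json.dumps(watch, indent=2))
```

The threshold is the kind of parameter that could be exposed in the admin UI, while the rest of the watch stays fixed.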

Performance test

Methodology: The requests were sent from SoapUI version 5.2.0 with different strategies and parameters. Basic data was inserted into the SOS database before the tests were run. The following requests were sent:

  1. GetObservation with no filter
  2. GetResult with no filter
  3. GetObservation with single spatial and temporal filter
  4. GetResult with single spatial and temporal filter
  5. GetCapabilities with all sections
  6. DescribeSensor with valid time
  7. GetFeatureOfInterest

The Simple and Burst strategies were used for testing. With the Simple and Burst options I can control the message volume over time at a finer granularity. The Variance strategy modifies the thread count, so it could produce very different test results even with the same settings. The Thread strategy only increases the number of threads. The other strategies are only available in the Pro version.

Machine specs

| Operating system | Win 8.1 x64 Enterprise |
| CPU | Intel Core i7-4710HQ 2.50GHz |
| RAM | 8GB DDR3 |
| VGA | Nvidia Geforce GTX 860M |
| PostgreSQL | PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 64-bit |
| Webserver | Apache Tomcat Version 8.0.24 x64 |
| Elasticsearch | 1.6.0 |


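Test results:

Columns 2 to 10 describe the environment setup; the last five columns are the results.

| # | Statistics collection | # threads | Strategy | Burst delay | Burst duration | Test delay | Random | Limit (seconds) | Database type | min (ms) | max (ms) | avg (ms) | cnt | tps |
| 1 | FALSE | 5 | simple | - | - | 1000 | 0.5 | 60 | PostgreSQL | 65 | 1039 | 88 | 351 | 5.83 |
| 2 | FALSE | 5 | simple | - | - | 1000 | 0.5 | 60 | PostgreSQL | 82 | 569 | 127 | 333 | 5.53 |
| 3 | FALSE | 5 | burst | 2 | 10 | - | - | 60 | PostgreSQL | 89 | 648 | 117 | 1795 | 38.5 |
| 4 | FALSE | 5 | burst | 2 | 10 | - | - | 60 | PostgreSQL | 87 | 358 | 127 | 1865 | 35.98 |
| 5 | FALSE | 5 | burst | 0 | 60 | - | - | 60 | PostgreSQL | 91 | 769 | 130 | 2131 | 35.02 |
| 6 | FALSE | 5 | burst | 0 | 60 | - | - | 60 | PostgreSQL | 108 | 674 | 114 | 1929 | 32.15 |
| 7 | TRUE | 5 | simple | - | - | 1000 | 0.5 | 60 | PostgreSQL | 74 | 934 | 115 | 329 | 5.46 |
| 8 | TRUE | 5 | simple | - | - | 1000 | 0.5 | 60 | PostgreSQL | 90 | 526 | 136 | 324 | 5.39 |
| 9 | TRUE | 5 | burst | 2 | 10 | - | - | 60 | PostgreSQL | 193 | 1123 | 538 | 400 | 7.34 |
| 10 | TRUE | 5 | burst | 2 | 10 | - | - | 60 | PostgreSQL | 275 | 1018 | 614 | 342 | 6.6 |
| 11 | TRUE | 5 | burst | 0 | 60 | - | - | 60 | PostgreSQL | 109 | 1509 | 279 | 905 | 15.08 |
| 12 | TRUE | 5 | burst | 0 | 60 | - | - | 60 | PostgreSQL | 117 | 2653 | 416 | 584 | 9.75 |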

https://docs.google.com/spreadsheets/d/1pNiRP7TNgwWqv3lPvQOmq5qrk0bH-XW9iHmAeXptfUQ/edit#gid=0

The most telling parameters are the AVG and the TPS. The former is the average request-response time in milliseconds; the latter is the number of transactions per second, calculated from the actual time passed.
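
The TPS arithmetic can be written out in a few lines. This is a sketch with hypothetical helper names, not SoapUI's code. For run 1, 351 requests divided by the nominal limit of 60 seconds give 5.85, slightly above the reported 5.83, presumably because the actual elapsed time was a little longer than the limit. The second helper compares the mean TPS of the burst runs with delay 0 from the results table (runs 5 and 6 without statistics collection, runs 11 and 12 with it).

```python
def transactions_per_second(count, elapsed_seconds):
    """TPS = number of completed requests / elapsed time in seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return count / elapsed_seconds


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def relative_change(baseline, measured):
    """Change of `measured` relative to `baseline` (-0.5 means half)."""
    return (measured - baseline) / baseline


if __name__ == "__main__":
    # Run 1: cnt = 351, nominal limit = 60 s
    print(round(transactions_per_second(351, 60), 2))

    # TPS values copied from the results table (burst, delay 0, duration 60)
    tps_disabled = mean([35.02, 32.15])  # runs 5 and 6
    tps_enabled = mean([15.08, 9.75])    # runs 11 and 12
    print(round(relative_change(tps_disabled, tps_enabled), 2))
```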

The Simple strategy doesn't put enough stress on the setup, so no great difference in processing time is noticeable. The test delay varied between 0.5 and 1 second. The Simple strategy is perfect for baseline testing. I used it to assert the basic performance of the service and to validate that there are no threading or resource locking issues.

simple.jpg

The Burst strategy is specifically designed for recovery testing and takes variance to its extreme; it does nothing for the configured Burst Delay, then runs the configured number of threads for the “Burst Duration” and goes back to sleep. The TPS clearly shows decreased processing throughput with my statistics module enabled. In the Burst tests the numbers varied greatly with statistics collection enabled: the TPS ranged roughly between 7 and 15, which suggests that some hidden parameters degrade the performance. Hopefully further study can identify them.

With the statistics module disabled, the server produced nearly the same results in all Burst strategy runs.
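
The Burst schedule described above can be modelled with a short function that lists the time windows in which the threads are active. This is a simplified model based only on the description on this page, not SoapUI's implementation, and the function names are hypothetical. With a delay of 2 seconds and a duration of 10 seconds the threads are active for 50 of the 60 seconds; with a delay of 0 and a duration of 60 they are active for the whole run.

```python
def burst_windows(limit, burst_delay, burst_duration):
    """Return (start, end) offsets in seconds during which threads run.

    The schedule sleeps for `burst_delay`, runs for `burst_duration`,
    and repeats until `limit` seconds have passed.
    """
    if burst_duration <= 0:
        raise ValueError("burst_duration must be positive")
    if burst_delay < 0:
        raise ValueError("burst_delay must not be negative")
    windows = []
    clock = 0
    while True:
        start = clock + burst_delay
        if start >= limit:
            break
        end = min(start + burst_duration, limit)  # cut the last burst at the limit
        windows.append((start, end))
        clock = end
    return windows


def active_seconds(windows):
    """Total time spent inside bursts."""
    return sum(end - start for start, end in windows)


if __name__ == "__main__":
    for delay, duration in [(2, 10), (0, 60)]:
        windows = burst_windows(60, delay, duration)
        print(delay, duration, windows, active_seconds(windows))
```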

burst_2s_interval.jpg
burst_no_interval.jpg

Schedule

Week 1 (25-31 May)

Absence: Part-time because of final Exam
Status
Problems

None
Next tasks
  • Define the interesting metrics for SOS requests
  • Implement those metrics (for a subset: the mandatory operations such as GetCapabilities, GetObservation, DescribeSensor)
  • Find an easy way to send valid messages from SoapUI (bulk insertion) to the SOS server local deployment and validate the data empirically in the Kibana UI
  • Get the next user story to implement if the current one is finished

Week 2 (1-7 June)

Absence: Part-time because of final Exam
Status
  • Defined the "interesting" metrics for the XXXRequests in a Google Spreadsheet and implemented them.
  • Got acquainted with Kibana scripted fields feature and created visualizations for the test data - with the scripted fields it is possible to create new fields derived from the existing ones.
  • Made the SOS web application runnable from the Maven Jetty plugin.
  • Created SoapUI tests with valid JSON, SOAP and KVP requests based on the SOS Client feature
  • The statistics feature can be installed by following this short guide: Statistics branch
Problems
  • Visualizing the TimePeriod temporal fields in the Kibana UI in a way that is meaningful for the user.
Next tasks

Week 3 (8-14 June)

Absence: Part-time because of final Exam
Status
Problems
  • Visualizing the TimePeriod temporal fields in the Kibana UI in a way that is meaningful for the user.
  • The Eclipse Java code formatter formats the new code differently from the existing codebase.
Next tasks

Week 4 (15-21 June)

Status
  • RequestContext extended with the incoming content-type format and the requested accept-type format
  • ResponseEvent class created in SOS API
Problems
  • Visualizing the TimePeriod temporal fields in the Kibana UI in a way that is meaningful for the user.
  • The Eclipse Java code formatter formats the new code differently from the existing codebase.
Next tasks
  • Use the settings API to initialize the Java Elasticsearch client and the GeoLite2 databases
  • Migrate the project to Iceland/SOS5.x
  • Settings functionalities in the SOS admin user interface

Week 5 (22-28 June)

Status
  • Migrated my statistics module to the Iceland project and to the SOS 5.x branch
  • Created a settings page for the GeoLite2 database configuration with the Iceland settings API
  • Created a settings page for the Elasticsearch database configuration with the Iceland settings API
  • Added the TransportClient connection mode to the Java Elasticsearch client
Problems
  • Visualizing the TimePeriod temporal fields in the Kibana UI in a way that is meaningful for the user.
Next tasks
  • Prepare for the Midterm demo
  • Redoing my Iceland RequestContext extensions
  • Measure how long a request runs.
  • Run a stress test on my machine with logging enabled and disabled, first with SoapUI or with some other tool
  • Kibana visualization and dashboard import/export functionality.

Week 6 (29 June - 5 July)

Status
  • Preconfigured Kibana settings importing feature
  • Measure the execution time of the request
  • Updating documentation and readme
  • Midterm demo preparation
Problems
  • Visualizing the TimePeriod temporal fields in the Kibana UI in a way that is meaningful for the user.
Next tasks
  • Requests' spatial data transformation to ES geoshape and storing it.
  • Performance test with and without the statistics module
  • Get to know the realtime alerting system for ES called watcher

Week 7 (6-12 July)

Status
  • Requests' spatial data transformation to ES geoshape and storing it.
  • Performance test with and without the statistics module
  • Get to know the realtime alerting system for ES called watcher (https://www.elastic.co/guide/en/watcher/current/introduction.html)
  • Documentation added to the branch.
Problems
  • The Kibana project doesn't support the geo_shape aggregation yet.
  • IPv4-mapped IPv6 Iceland enhancement
Next tasks
  • SOS response and output metrics data collection
  • Integration tests
  • Test coverage

Week 8 (13-19 July)

Status
  • SOS output metrics data collection.
  • Integration tests.
  • Performance tests. See the results at the wiki page here.
  • Test coverage with Jacoco.
  • Refactoring the "parameters" which are stored to be more developer friendly and fool-proof.
Problems
  • The Kibana project doesn't support the geo_shape aggregation yet.
  • IPv4-mapped IPv6 Iceland enhancement
  • Elasticsearch will no longer support Groovy dynamic scripting because of security concerns. A workaround and a compromise are needed for the TimeInterval visualization.
Next tasks
Not a full sprint due to absence
  • TimeInterval visualization
  • Documentation
  • Investigate security (authentication and authorization) with Elasticsearch and Kibana probably without Elasticsearch Shield.

Week 9 (20-26 July)

Not a full sprint due to absence from 22 July to 25 July
Status
  • TimeInterval visualization
  • Two SOS admin settings (elasticsearch and geolite) merged into one "Statistics" settings page. Additional information
Problems
  • IPv4-mapped IPv6 Iceland enhancement
  • Elasticsearch will no longer support Groovy dynamic scripting because of security concerns. A workaround and a compromise are needed for the TimeInterval visualization.
Next tasks
  • Creating more Kibana dashboards using the Kibana Importer/Exporter.
  • Securing SOS + Elasticsearch without the licensed Shield project.
  • ...

Week 10 (27 July - 2 August)

Status
  • Created Kibana visualizations and dashboard.
  • New Security.MD and Developer.MD documentation
  • Improved Kibana Importer/Exporter
Problems
Next tasks

Week 11 (3-9 August)

Status
  • Refactoring the Statistics module for integration into the Iceland project
  • TimeSeries API inner workings
  • Enhancements in the Kibana exporter/importer
  • Various fixes for embedded Elasticsearch
Problems
Next tasks
  • Demo videos
  • More refactoring modules

Week 12 (10-16 August)

Status
  • Demo video
  • PRs for SOS and Iceland for statistics module integration
  • Kibana improvements: new visualizations and dashboards
  • Documentation
Problems
Next tasks
  • Demo videos (re-record)
  • Documentation
  • IPv4 mapped IPv6 Iceland fix
  • Timeseries API integration

Week 13 (17-23 August)

Reduced work-hours because of illness.
Status
  • Demo videos (re-record)
  • Iceland PRs fixes.
  • IPv4 mapped IPv6 Iceland fix
Problems
Next tasks

User stories

The following list and the scope of the stories may change in the future. If you have any ideas or recommendations about the feature, please don't hesitate to contact me on the mailing list.

You can find the User stories at this Google spreadsheet.

Topic revision: r37 - 27 Jun 2016, UnknownUser