Open Sensor Search

Student: Mohammad Ahmed Hamed Yakoub

Mentors: Simon Jirka, Daniel Nüst

Project Description

See OpenSensorSearch.


The project follows an agile software development method, so we do not have a fixed set of tasks. We use Scrum (to a certain degree) with one-week sprints and telcos on Mondays and Fridays.

Weekly Reports

Week 1


We found two bugs in the system:
  1. The configuration and the system variables were not set up in a suitable way to allow writing integration tests.
  2. The insertSensorInfo method does not work as expected and does not insert new sensors when called.

Week 2

This week we concentrated on the data backend. We implemented the following:
  1. An indexing mechanism for the sensors' keywords, location and BBox center.
  2. An autocomplete HTML+JavaScript web page for autocompletion based on a servlet.
  3. JMeter test cases for the following two scenarios:
    1. The autocomplete servlet.
    2. An Apache Solr search test to be compared against the existing PGSQL implementation.
  4. Unit tests were implemented and run to test the functionality of all of the previous tasks.
Results:
  1. Apache Solr contains a stored index of sensor keywords, locations and BBox centers.
  2. The unit tests were verified and run successfully.
  3. The JMeter test cases are complete.
Notes for the next weeks:
  1. BBox is not a supported data type in Apache Solr, so we have to represent it using a position (the center) and modify the implementation accordingly.
  2. The JMeter tests were not decisive due to the lack of sensor test files, so a set of test sensors is to be generated via a harvesting mechanism or via dummy random generation.
All of this week's commits were made on the solr branch of the GitHub repo and can be found here.

Week 3

This week we completed the indexing and storing of all of the sensor metadata in Apache Solr. We added the following fields:

  1. gml:description
  2. Classification values from the discovery profile.
  3. Valid time.
  4. Interface values.
  5. Inputs, outputs and measurements.
  6. Identification values.
  7. Contacts.
For each of the previous fields there is a unit test to index, store, retrieve and test the autocompletion.
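To illustrate how such metadata can be pushed into the index, here is a minimal SolrJ sketch; the Solr core URL, field names and sensor values are assumptions for illustration and do not necessarily match the actual OSS schema.

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrIndexExample {

    public static void main(String[] args) throws Exception {
        // hypothetical Solr core URL
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/oss");

        // one document per sensor; the field names are assumptions, not the real OSS schema
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "urn:ogc:object:feature:Sensor:example:42");
        doc.addField("keyword", "temperature");
        doc.addField("description", "A dummy temperature sensor used for testing.");
        doc.addField("classifier", "thermometer");
        doc.addField("contact", "52North");
        doc.addField("dtstart", "2013-06-01T00:00:00Z");   // valid time start
        doc.addField("location", "51.96,7.62");            // BBox center as lat,lon

        server.add(doc);
        server.commit();   // make the document searchable
    }
}
```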

To solve the problem of the lack of sensors, we generated sensors in two ways:

  1. Dummy test sensors: generated randomly, based on the data of the site linked here; you can check the test data file here.
  2. We configured the harvest service to insert sensors into the Solr backend as well.
We also ran some tests in JMeter to compare results.

Issues:
  1. The harvest service did not work because it depends on validating the sensors, and validation returns invalid for all sensors without classifications.
  2. The JMeter tests had to be modified to run against the ordinary SIR interface.
TODOs for the next week

  1. Include the SIR interface in the JMeter test to compare the PGSQL backend against the Apache Solr one.
  2. The autocompletion should include any combination of indexed fields, in any order.
  3. The test sensor dataset should be bigger, ~1000 sensors, for good comparisons.

Week 4

  • This week we concentrated on performance testing of the OSS with the Solr and PGSQL backends; namely, we used JMeter for the following tests:
  1. OpenSearch binding with the Solr backend.
  2. OpenSearch binding with the PGSQL backend.
  3. Performance test of the OpenSearch autocompletion.
The results can be found here
  • We implemented the InsertSensorInfoRequest to allow adding inserted sensors to both the Solr and PGSQL backends.
  • We implemented the dummy sensor generation based on a template for testing purposes.
  • We implemented searching by all indexed fields for autocompletion and search purposes.
  • We extended the implementation of the XmlListener so that it returns the results of the sensors if they contain persistent data fields.
  • We allowed the autocompletion to use multiple words for different indexed fields, for a Google-like autocompletion UI.
  • The harvestSensor is still not working; the server log indicates a problem related to the SirConfigurator.
  • The listeners do not support all of the different data representations for sensor data (JSON, feeds, HTML, ...).
Tasks for next weeks
  • Work on the harvest sensor bug.
  • Implement the different data types listeners.

Week 5

This week we took two important steps in the implementation of the OSS:

  1. Merge Daniel's and my development branches, so that we have a shared codebase and I can use his implementation of the unique sensor ID generator.
  2. Implement the temporal and spatial search so that it complies with the OpenSearch temporal and spatial search specifications.
1) For the merge part, I merged my solr branch into the local master and then merged that with Daniel's master, which included the following modifications:

  • Unique sensor ID generator.
  • License header modifications.
2) The second part covered the implementation of the temporal and spatial search parameters for the OpenSearch interface (a minimal query sketch follows).
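As an illustration of how the spatial and temporal OpenSearch parameters can be mapped onto the Solr backend, here is a minimal SolrJ query sketch; the field names (location, dtstart, dtend), the core URL and the point/radius values are assumptions, not the actual OSS schema.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SpatialTemporalSearchExample {

    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/oss"); // hypothetical core

        SolrQuery query = new SolrQuery("keyword:temperature");
        // spatial filter: sensors within 10 km of the given lat,lon (OpenSearch geo parameters)
        query.addFilterQuery("{!geofilt sfield=location pt=51.96,7.62 d=10}");
        // temporal filter: sensors whose valid time overlaps the requested interval (OpenSearch time parameters)
        query.addFilterQuery("dtstart:[* TO 2013-12-31T23:59:59Z] AND dtend:[2013-01-01T00:00:00Z TO *]");

        QueryResponse response = server.query(query);
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("id"));
        }
    }
}
```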

Other tasks:

Notes for next weeks

Tasks for next weeks

  • Start working on the Harvesting mechanism.
  • Implement the listeners for HTML, Feed and KML for the OSS interface.

Week 6

This week the work covered the following themes:
  1. The harvesting mechanism.
  2. The sandbox for the harvesting scripts.
  3. Harvest scheduling.
  4. Dependency injection using Guice.
1) The harvesting mechanism
  1. For the harvesting we implemented a utility class using Rhino that allows JavaScript code to call Java classes, so that developers can implement their harvesting scripts in JavaScript. The class was implemented and covered by tests, and the implementation takes the Guice dependency injection into account (a minimal sketch follows this list).
  2. The DB script was modified so that harvest scripts can be persisted for later use; it stores the user, scriptId, version, lastRunTime and last update. A DAO was implemented accordingly.
  3. A form was implemented to allow users to upload their scripts for harvesting.
  4. The harvesting uses a RESTful interface based on Apache Jersey and Guice.
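The following is a minimal sketch of how such a Rhino-based utility can evaluate a developer-supplied JavaScript harvesting script and expose a Java object to it; the class name and the injected DAO variable are assumptions for illustration.

```java
import org.mozilla.javascript.Context;
import org.mozilla.javascript.Scriptable;
import org.mozilla.javascript.ScriptableObject;

public class JavascriptHarvestRunner {   // hypothetical class name

    // 'insertionDao' stands for whatever insertion DAO Guice injects in the real implementation
    public Object run(String scriptSource, Object insertionDao) {
        Context cx = Context.enter();
        try {
            Scriptable scope = cx.initStandardObjects();
            // make the Java object visible to the script as the global variable 'dao'
            ScriptableObject.putProperty(scope, "dao", Context.javaToJS(insertionDao, scope));
            // evaluate the developer's harvesting script
            return cx.evaluateString(scope, scriptSource, "harvest-script", 1, null);
        } finally {
            Context.exit();
        }
    }
}
```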
2) The Sandbox of harvesting

To protect the server from the scripts, sandboxing using Rhino class shutters allows us to restrict the classes that a script can access. A class was implemented for that functionality; it allows access only to the insertion DAO.
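A minimal sketch of such a class shutter is shown below; the whitelisted class name is an assumption, the real implementation only needs to return true for the insertion DAO (and the classes it requires).

```java
import org.mozilla.javascript.ClassShutter;
import org.mozilla.javascript.Context;
import org.mozilla.javascript.ContextFactory;

public class SandboxContextFactory extends ContextFactory {   // hypothetical class name

    @Override
    protected Context makeContext() {
        Context cx = super.makeContext();
        cx.setClassShutter(new ClassShutter() {
            @Override
            public boolean visibleToScripts(String className) {
                // only the insertion DAO (hypothetical fully qualified name) is visible to scripts
                return className.equals("org.n52.oss.harvest.InsertSensorDAO");
            }
        });
        return cx;
    }
}
```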

3) Harvest scheduling

The scheduling relies on the Quartz scheduler, which lets the server run harvest scripts at given times by implementing jobs. The listener initializes the service at server startup, and users can use RESTful routes to have their sensors harvested at certain times. The Quartz functionality is not yet complete and needs some modifications.
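A minimal Quartz sketch of scheduling a harvest job for a user-defined time is shown below; the job class and the job data key are placeholders for illustration, not the actual OSS job.

```java
import java.util.Date;

import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class HarvestSchedulerExample {

    // hypothetical job that would look up the stored script by id and execute it
    public static class HarvestJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            long scriptId = context.getMergedJobDataMap().getLong("scriptId");
            System.out.println("Harvesting script " + scriptId);
        }
    }

    public void schedule(long scriptId, Date harvestTime) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();

        JobDetail job = JobBuilder.newJob(HarvestJob.class)
                .withIdentity("harvest-" + scriptId, "harvest")
                .usingJobData("scriptId", scriptId)
                .build();

        // fire once at the time requested by the user
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("trigger-" + scriptId, "harvest")
                .startAt(harvestTime)
                .build();

        scheduler.scheduleJob(job, trigger);
    }
}
```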

4) Dependency Injection using Guice.

The DI using Guice was the most important part of the implementation, allowing better testing and injection of all dependencies. The listener was implemented, and the Guice configuration runs at startup.
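A minimal sketch of such a startup listener is shown below; the class name and the bindings in configureServlets() are placeholders only, the actual OSS modules bind the DAOs, listeners and Jersey resources.

```java
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.servlet.GuiceServletContextListener;
import com.google.inject.servlet.ServletModule;

// registered in web.xml together with com.google.inject.servlet.GuiceFilter
public class OssGuiceListener extends GuiceServletContextListener {   // hypothetical class name

    @Override
    protected Injector getInjector() {
        return Guice.createInjector(new ServletModule() {
            @Override
            protected void configureServlets() {
                // placeholder bindings, for example:
                // bind(InsertSensorDAO.class).to(SolrInsertSensorDAO.class);
                // serve("/api/*").with(GuiceContainer.class);   // Jersey-Guice bridge
            }
        });
    }
}
```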

  • The configuration of Guice with Tomcat is somewhat poorly documented and required quite some searching to find the right configuration.
Tasks for the next week
  • The sandbox only restricts class access; we need to extend it to also restrict method access, even for the Java standard library.
  • The scheduler is not fully implemented and needs to be covered by unit and integration tests.
  • The harvest script DAO needs to support searching scripts by ID and updating the script version.

Week 7

  • This week we worked more on the harvesting mechanism; mainly, we implemented different harvesters for different sensor data sources:
  1. Adding sensors manually using the sensor data structure, i.e. entering the sensor data by hand, implemented in JavaScript.
  2. A sensor data harvester for sensors stored in the SmartCitizen platform.
  3. A sensor data harvester, implemented in JavaScript, for sensors stored in the ThingSpeak platform.
  4. A sensor data harvester for sensors stored in an OWS service.
The problem we ran into was trying to let developers use libraries like jQuery and other JS frameworks on top of env.js; there were a lot of bugs and we could not install it properly, so we implemented an HTTP utility class to be called by the JS scripts instead (a minimal sketch follows the challenges below). All of the previous tasks were tested using unit and integration tests.
Challenges
  1. Using libraries like jQuery was not successful and could not be implemented properly, so we used Java code instead for HTTP request tasks.
  2. The Quartz mechanism was not tested properly (i.e. checking whether the sensor was harvested at the indicated time).
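A minimal sketch of the kind of HTTP utility that a Rhino script can call instead of a JavaScript HTTP library is shown below; the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpGetUtil {   // hypothetical class name, exposed to the Rhino scripts

    // perform a simple GET request and return the response body as a string
    public static String get(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), "UTF-8"));
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            body.append(line).append('\n');
        }
        reader.close();
        return body.toString();
    }
}
```

A harvesting script can then call HttpGetUtil.get(someUrl) from JavaScript once the class is exposed through the Rhino scope.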
Tasks for the next weeks
  1. Search for an alternative way to use jQuery and other JS frameworks, or use a different harvesting mechanism.
  2. Implement the scheduling mechanism's unit and integration tests properly to make sure that it works as expected.

Week 8

This week we worked on the following tasks:
  • Quartz scheduling testing: a unit test was implemented to make sure that all of the units involved in the scheduling implementation work as expected, including the binding, the job and the DAO.
  • The alternative harvesting mechanism was implemented completely, including the RESTful method implementation, the harvest job and the DAO.
  • An initial unit test was implemented for the alternative mechanism.
  • The unit tests mock the remote endpoints using the WireMock library (see the sketch after this list).
  • The initial UI for the web interface was implemented using Twitter Bootstrap as a UI library.
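Below is a minimal sketch of how WireMock can stand in for a remote sensor source in such a unit test; the port, route and JSON payload are assumptions for illustration.

```java
import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.configureFor;
import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.getRequestedFor;
import static com.github.tomakehurst.wiremock.client.WireMock.stubFor;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;
import static com.github.tomakehurst.wiremock.client.WireMock.verify;

import java.net.HttpURLConnection;
import java.net.URL;

import com.github.tomakehurst.wiremock.WireMockServer;

public class RemoteHarvestMockExample {

    public static void main(String[] args) throws Exception {
        WireMockServer server = new WireMockServer(8089);   // port is an assumption
        server.start();
        configureFor("localhost", 8089);

        // the fake remote harvesting server returns one dummy sensor
        stubFor(get(urlEqualTo("/sensors"))
                .willReturn(aResponse()
                        .withHeader("Content-Type", "application/json")
                        .withBody("[{\"id\": \"dummy-sensor-1\"}]")));

        // stand-in for the harvest job under test: fetch the stubbed route once
        HttpURLConnection connection =
                (HttpURLConnection) new URL("http://localhost:8089/sensors").openConnection();
        System.out.println("Remote source answered with HTTP " + connection.getResponseCode());

        // afterwards, assert that the harvest actually called the remote route
        verify(getRequestedFor(urlEqualTo("/sensors")));
        server.stop();
    }
}
```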

The initial UI that will be used for the whole harvesting web app

  1. The unit tests need to be rewritten, as most of the units were not modularized as well as necessary.
  2. The Jersey responses were always chunked, i.e. sent as a stream, which does not work with Ajax; I had to set the Content-Length header manually (a minimal sketch follows this list).
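A minimal sketch of the workaround is shown below; the resource path and payload are assumptions, the point is only that writing the Content-Length header explicitly keeps the container from falling back to chunked transfer encoding.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.HttpHeaders;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/status")   // hypothetical resource path
public class StatusResource {

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Response status() {
        byte[] payload = "{\"status\": \"ok\"}".getBytes();
        // set Content-Length explicitly so the response is not sent chunked
        return Response.ok(payload)
                .header(HttpHeaders.CONTENT_LENGTH, payload.length)
                .build();
    }
}
```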
Tasks for the next weeks
  1. Complete the implementation of the UI.
  2. Improve the poorly modularized units and their unit tests.
  3. Use a framework (an MVC framework such as Play or Spring, ...) for the implementation of the UI, especially for authentication and views.

Alternative Harvesting Technique

  • The essential harvesting mechanism is heavily language-dependent: in week 7, for example, we used many libraries and third-party components just to allow developers to harvest their own sensors using JavaScript. While successful, this approach would need to cover every language to be accessible to a wide range of developers with different skills, and for that we need a more general solution than implementing an interpreter for each language inside the OSS.
  • An alternative based on a RESTful mechanism is explained in the following diagram:

This figure explains an alternative mechanism for the harvesting mechanism.

The mechanism works as follows:
  1. The harvest script developer implements a RESTful web service with certain specified routes on his/her own server.
  2. The developer makes a POST call to the OSS with the URL of that web service.
  3. The OSS returns a script ID and an auth_token to the user; the ID identifies the script and the auth_token grants the user access to it.
  4. The OSS then makes a harvest call to the remote server and returns the result.
  5. The user can then make GET, POST and DELETE calls to the OSS using the ID and auth_token to update, reharvest or delete the script.
The harvesting mechanism can support the following methods on the OSS side (a minimal Jersey sketch follows below):
  • POST /harvest/url=... : lets the OSS harvest a URL; returns the ID and auth_token.
  • POST /harvest/ : reharvests an inserted script; takes one parameter, the ID.
  • DELETE /harvest : deletes a script from the scripts list; takes two parameters, ID and auth_token.
  • GET /harvest/:id : checks the state of the sensor harvesting process: pending, successfully harvested or failed.
  • GET /harvest/:id/status : a description of the process output.
  • POST /harvest/schedule (params: id = script ID, date = time to harvest) : schedules a harvest; returns a job ID.
  • GET /harvest/job/status/:id : gets the status of a scheduled job.
The developer's role is to implement a RESTful web service with the following route(s):
  • GET /sensors : retrieves a JSON/XML encoded array of sensors; the developer can use any data source, as indicated in the diagram above (a DB, custom sensors, an HTTP call to harvesting platforms like SmartCitizen or ThingSpeak, and so on).
This approach has two advantages:
  1. It allows users to develop their harvesting scripts in any language and still be able to add their sensors to the OSS.
  2. It decouples the harvesting mechanism, so that the server only implements a thin harvesting layer and all of the work of interpreting and loading code from another language into Java, which is insecure, inefficient and not very stable, is avoided.
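A minimal Jersey sketch of the OSS-side resource for this mechanism is shown below; the paths loosely follow the routes listed above, while the class name, the token generation and the persistence are placeholders.

```java
import java.util.UUID;

import javax.ws.rs.DELETE;
import javax.ws.rs.FormParam;
import javax.ws.rs.GET;
import javax.ws.rs.HeaderParam;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/harvest")
@Produces(MediaType.APPLICATION_JSON)
public class RemoteHarvestResource {   // hypothetical class name

    @POST
    public Response register(@FormParam("url") String url) {
        // persist the remote server URL and hand back an id plus auth token (generation is a placeholder)
        String id = UUID.randomUUID().toString();
        String authToken = UUID.randomUUID().toString();
        return Response.ok("{\"id\": \"" + id + "\", \"auth_token\": \"" + authToken + "\"}").build();
    }

    @GET
    @Path("/{id}")
    public Response state(@PathParam("id") String id) {
        // look up the harvesting state for the script: pending, harvested or failed
        return Response.ok("{\"id\": \"" + id + "\", \"state\": \"pending\"}").build();
    }

    @DELETE
    @Path("/{id}")
    public Response remove(@PathParam("id") String id,
                           @HeaderParam("auth_token") String authToken) {
        // check the auth token against the stored one before deleting (omitted here)
        return Response.ok().build();
    }
}
```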

Week 9 & 10

In these weeks we completed the work on the OSS-UI for uploading both scripts and remote servers using Spring MVC. We implemented a complete authentication and permission-level system; the following screens were implemented and tested:
  1. Login with success/failure indication.
  2. Remote sensor upload URL.
  3. Remote sensor scheduled harvesting.
  4. Upload a JavaScript file.
  5. Schedule a JavaScript harvest.

Fig - Sign in window


Fig - Wrong authentication


Fig - Remote server upload

Fig - Remote server with auth_token returned


Fig - Upload a javascript file


Fig - Script uploaded message

  • The two harvesting jobs, remote harvest and script harvest, were rewritten to be unit testable.
  • Unit tests for these harvest jobs were implemented.
  • Daniel worked on many parts of SIR, changing its configuration to Guice and its name to OSS; he made a pull request, and I later merged the harvestCallback branch with his master to get a shared codebase.
  • An authentication method needs to be implemented for the script uploading so that each user can access only his/her sensors; a cross-domain policy will not be helpful in that case.
  • The scripts need to include license headers.
Tasks for the next weeks
  • An authentication method needs to be implemented for the script uploading so that each user can access only his/her sensors.
  • Implement a useful API for users to validate and convert their sensors.

Week 11

This week we concentrated on the OSS-UI. We implemented the license agreement check and automatic license appending to the uploaded script.


Fig - How the license agreement works in the OSSUI

2 - The user access DAO was enhanced to allow auth token authentication for all scripts, and a user access resource was implemented and tested. The scenario works as follows:
  • The user logs in using username and password.
  • On successful authentication an auth token is returned.
  • Whenever the user needs to access a restricted resource, he/she sends the auth token; the OSS then checks which user it matches and whether that user is allowed to access the resource (a minimal sketch follows this list).
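A minimal sketch of how a restricted resource can apply this check is shown below; the DAO interface and its method names are placeholders for the actual user access DAO, which in the real code would be injected by Guice.

```java
import javax.ws.rs.DELETE;
import javax.ws.rs.HeaderParam;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.Response.Status;

@Path("/script")   // hypothetical resource path
public class ScriptResource {

    // hypothetical DAO with an auth-token lookup
    public interface UserAccessDao {
        String userForToken(String authToken);            // null if the token is unknown
        boolean mayAccessScript(String user, String scriptId);
    }

    private final UserAccessDao userAccess;

    public ScriptResource(UserAccessDao userAccess) {
        this.userAccess = userAccess;
    }

    @DELETE
    @Path("/{scriptId}")
    public Response deleteScript(@PathParam("scriptId") String scriptId,
                                 @HeaderParam("auth_token") String authToken) {
        String user = userAccess.userForToken(authToken);
        if (user == null || !userAccess.mayAccessScript(user, scriptId)) {
            return Response.status(Status.UNAUTHORIZED).build();
        }
        // delete the script here (omitted)
        return Response.ok().build();
    }
}
```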
3 - The harvest resources and the DAO were updated to support the authentication scheme described in point 2.

4 - Two public API methods were implemented:
  • A validator, to validate a given SensorML document.
  • A converter, to convert a sensor to a given format (JSON, ebRIM).
  • The integration test was implemented and run for the validator.

Week 12

This week we concentrated on finalizing our tasks and listing the future tasks:
  • The whole project webapp was written using Twitter Bootstrap v3.
  • The content assist was implemented according to the guidelines of the OpenSearch specification.
  • The autocomplete was inserted in the OpenSearch description document.
  • The client webapp was validated using an HTML validator and tested in the major browsers: Mozilla Firefox, Google Chrome and IE.
  • A search-nearby mechanism was implemented that allows searching by nearby location using the browser's Geolocation API.
  • The API documentation was implemented using annotations and the Swagger UI framework.
  • A GitHub page was created containing the most important developer documentation of the project.
  • A script view page was implemented that allows users to view and list scripts.
  • The registration form and mechanism were completely implemented, using admin permission for user validation.
  • The integration tests were revised to see which tests need to be written in future releases.
  • A few bugs were found and reported in the GitHub repo issues; the most important ones were:
  1. The auth_token was not sent correctly in the headers of requests from the OSS-UI.
  2. The response from the Jersey web service was always chunked, which did not work well with Ajax calls.

Apache Solr vs PGSQL backend

One of the most critical aspects of the OSS interface is speed and high performance. For this we need a reliable, high-performance backend. To make sure that indexing the data in the Solr backend improves both speed and performance, we performed some tests using the Apache JMeter tool.

The test environment contained the following:

  1. A set of 1000 randomly generated test sensors that were inserted into both the PGSQL and the Apache Solr backend using HTTP requests.
  2. The OSS was deployed to Apache Tomcat 6.0.
  3. A search query was executed once against PGSQL and once against Apache Solr, with 100 calling threads; the time for each thread was recorded.
  4. The result set data format was XML.


Fig 1 - The test performance of OSS with the PGSQL backend


Fig 2 - The OSS test performance with the Solr backend

The results show the following:

For the OSS binding with Solr:

A total of 100 threads issuing HTTP requests took around 1843 ms.

Throughput: 2443.992 operations/minute.

For the OSS binding with PGSQL:

A total of 100 threads issuing HTTP requests took around 4665 ms.

Throughput: 1018.676 operations/minute.


While the results show a great performance improvement of Solr over PGSQL, they are not decisive yet; there are a few things to add to the next tests:

  1. The dataset was randomly generated; we need real data sets harvested from remote sensors to test against.
  2. The results were only measured for the XML response format; what about other formats and outputs (HTML, KML, JSON, ...)?
  3. The tests were run on a local machine; what will the results be when both backends are accessed via a remote call?

Autocomplete Test Performance

The autocomplete is a crucial part of the search engine; in OSS it supports a Google-like autocomplete with multiple words related to different indexed fields. The autocomplete needs to be very fast because it runs in real time. We made a test using Apache JMeter on a set of samples of 100 threads against the set of 1000 dummy sensors generated as in the previous section; the results were as follows:


Fig 3 - Test performance for the autocomplete servlet

As we can see, using 700 samples took 928 ms, a throughput of 1687.628 operations/minute, which is lower than the throughput of the Solr backend search process. This can be attributed to the manipulation of result sets in the Java implementation of the autocomplete servlet, so the throughput is not that bad.
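For context, here is a minimal sketch of a Solr-backed autocomplete servlet of this kind; the core URL, the field name and the query construction are assumptions and do not reproduce the actual OSS servlet.

```java
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class AutocompleteServletExample extends HttpServlet {

    private final SolrServer server = new HttpSolrServer("http://localhost:8983/solr/oss"); // hypothetical core

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String term = req.getParameter("q");
        // prefix query on an assumed 'keyword' field, limited to ten suggestions
        SolrQuery query = new SolrQuery("keyword:" + term + "*");
        query.setRows(10);
        try {
            for (SolrDocument doc : server.query(query).getResults()) {
                resp.getWriter().println(doc.getFieldValue("keyword"));
            }
        } catch (Exception e) {
            throw new ServletException(e);
        }
    }
}
```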


I'm Mohammad Ahmed Hamed Yakoub, a Computer Engineering pre-masters student. I'm interested in software development in general and mostly use Java. I hope to implement the OpenSensorSearch idea and to have a successful and productive summer at GSoC 2013 here at 52°North.

Original Project Idea

Explanation: We want to take sensor data in the web to the next level by implementing the one stop shop for finding sensor data. This project fights this battle at two fronts: First, the existing 52°North Sensor Instance Registry (SIR) implementation must be made more open: Registered users should be able to request a harvesting of their data source as well as provide their own harvesting implementation to integrate their metadata into the catalog of sensors. The potential student would also implement such harvesting mechanisms for popular sites such as Cosm and Thingspeak.
Second, the database interface must be switched out to something that scales and is really (!) quick, such as Apache Lucene and/or commercial cloud storage. The infrastructure should be tested with a simple search form with a good auto-suggest search field - you could call it "Google for Sensor Data" :-).

The student should have some experience in Java and an interest to scrape data from the net and put it in a high performance data structure. Experiences with Apache Lucene, XML and Database indices as well as cloud infrastructures are great but not mandatory. The complexity of this project itself is quite scalable.

Expected results: Scalable database backend for 52°North SIR, user management for open sensor search, public harvesting API with harvesting configuration UI for open sensor search.

Community and Code License: Sensor Web, Apache 2.0

I hope the development cycle goes as follows:
  1. In the first weeks I hope to understand the code and to enhance and test it via unit and integration tests covering all the specifications and details of the software.
  2. Later on I hope to make a concise comparison between the products that can be used to implement the backend search engine, and to run sample tasks and tests to settle that decision.
  3. After we settle those decisions I hope to implement the system as specified above, perform general integration and unit testing, and then work on enhancing the UI and extending the capabilities of the system.
  4. I hope we can implement this system via an agile methodology (specifically the Scrum framework) and apply test-driven development for the best utilization of time and resources and higher quality.
Topic attachments
  • 1.jpg (24 K, 28 Aug 2013 - 16:13, MohammadYakoub)
  • 2.jpg (38 K, 28 Aug 2013 - 16:13, MohammadYakoub)
  • 3.jpg (64 K, 28 Aug 2013 - 16:13, MohammadYakoub)
  • 4.jpg (83 K, 28 Aug 2013 - 16:13, MohammadYakoub)
  • 5.jpg (67 K, 28 Aug 2013 - 16:13, MohammadYakoub)
  • 6.jpg (35 K, 28 Aug 2013 - 16:13, MohammadYakoub)
  • Screenshot_from_2013-08-19_213709.png (55 K, 19 Aug 2013 - 19:45, MohammadYakoub)
  • SirBindingPGSQL.jpg (51 K, 12 Jul 2013 - 11:53, MohammadYakoub)
  • SirBindingSolr.jpg (49 K, 12 Jul 2013 - 11:53, MohammadYakoub)
  • alternative.png (13 K, 11 Aug 2013 - 14:44, MohammadYakoub)
  • alternative_1.png (23 K, 12 Aug 2013 - 18:40, MohammadYakoub)
  • autocomplete.jpg (88 K, 14 Jul 2013 - 15:01, MohammadYakoub)
  • license.jpg (89 K, 03 Sep 2013 - 20:06, MohammadYakoub)