Weekly Reports

Back to GSoC2012 Projects page (On-demand transformation of OSM Data to common GIS formats).

Weekly Report 13 (August 20th)


This week was primarily last-minute bugfixes and documentation. Everything seems to be working and complete. Regrettably, I didn't get to work on the UI component as much as I would have liked. At this point, I think it's safe to refer to the main page for further comments and thoughts on future directions for the project.

I definitely want to continue working on the project, although I probably won't have as much time to devote to it. I also have some ideas for implementing some of my previous projects with WPS, as I now understand it a bit better and feel that it may be better suited to running experiments. I also now have the coupling with OSM files and the Overpass API at my disposal.


There aren't really any problems to report at this stage. Lots of future thoughts for potential directions on the project though.


Final packaging.

Perhaps ask Daniel/Benjamin whether I could continue working on and maintaining the project using the 52north system; if not, I might fork it to GitHub.

Weekly Report 12 (August 13th)


Finally got a chance to do more work than I have been doing over the past two weeks. So far I have been able to:
  • Convert the OSM internal type to SimpleFeatureCollection
  • Confirm GML output of OSM data using GML3BasicGenerator.
    • My approach to tags for conversion to GML is to set all tags as attributes. This seems to work reasonably well, and mitigates situations where OSM data is not well-defined (e.g. missing corresponding tags). I do not know what it does in the case of duplicate tags. It also appears to run reasonably fast.
  • Implement some helper functions to allow easy querying of the Overpass server. I still need to test these to ensure the data flow works correctly.


There are some problems with the Generator recognizing that there are Points, LineStrings, and GeometryCollections in the data; it is just assuming that the main geometry is Point, and just going with that. Options are to either produce 3 different GML outputs, or to perhaps classify everything as a MultiLineString. Right now I'm going with the latter option.

In the 52n-wps-io package, GTHelper.java builds a SimpleFeatureType based on the first feature in the FeatureCollection. When building its own SimpleFeatureType, it also ignores any attributes that are set to null in that first feature. This causes a lot of tags to be missing in the final GML output. Removing the non-null attribute restriction seems to fix the problem in my situation; however, I do not know if this affects any other functionality in WPS (diff file here: https://wiki.52north.org/pub/Projects/GSoC2012ProjectsOnDemandTransformationOfOSMDataReport/gthelper.diff).

Alternatively, I can simply set the first feature of the FeatureCollection to populate all the attributes so that they are recognized, but this seems like a terrible thing to do.
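To make the missing-tags problem concrete, here is a minimal sketch in plain Java. It is a toy model, not the GTHelper code or my diff: each feature is just a map of tag key to value, and all class and method names are made up for illustration. It contrasts a schema built from the non-null attributes of the first feature with one built from the union of all tag keys, which is a different technique from simply dropping the non-null restriction.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model: a feature is a map of OSM tag key to value (null = no value set). */
final class TagSchemaSketch {

    /** Mimics the behaviour described above: only non-null attributes of the first feature. */
    static Set<String> schemaFromFirstFeature(List<Map<String, String>> features) {
        Set<String> names = new LinkedHashSet<>();
        if (features.isEmpty()) {
            return names;
        }
        for (Map.Entry<String, String> entry : features.get(0).entrySet()) {
            if (entry.getValue() != null) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    /** Hypothetical alternative: union of every tag key seen on any feature, in first-seen order. */
    static Set<String> schemaFromAllFeatures(List<Map<String, String>> features) {
        Set<String> names = new LinkedHashSet<>();
        for (Map<String, String> feature : features) {
            names.addAll(feature.keySet());
        }
        return names;
    }

    /** Builds a feature from alternating key/value arguments. */
    static Map<String, String> feature(String... keyValues) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            tags.put(keyValues[i], keyValues[i + 1]);
        }
        return tags;
    }

    public static void main(String[] args) {
        List<Map<String, String>> features = new ArrayList<>();
        features.add(feature("highway", "residential", "name", null));
        features.add(feature("building", "yes", "name", "Town Hall"));
        System.out.println(schemaFromFirstFeature(features)); // [highway]
        System.out.println(schemaFromAllFeatures(features));  // [highway, name, building]
    }
}
```

With the first approach, the name and building tags never make it into the schema, which matches the symptom I saw in the GML output.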


Unit tests, documentation, bugfixes.

Need to test KML output

Weekly Report 11 (August 6th)


Did not get much done this week due to other priorities (research meeting presentation, final exam proctoring/marking).


No problems to report this week.


Finish project in time for soft pencils down date.

Weekly Report 10 (July 30th)


Good week at DEFCON. On the way back to Waterloo (stuck on a full day of traveling).


No problems to report this week.


No TODO items this week.

Weekly Report 9 (July 23rd)


Reached a decision regarding the parser problem; for it to properly be converted to GML (or KML), it has to return a FeatureCollections object. Therefore I have split up the functionality of OSMParser to the following:
  • OSMParser.File - simple JAXB parsing of the OSM file format based on defined schema in osm.xsd; returns a GTDataFileBinding
  • OSMParser.Vector - converts the OSM file format into some FeatureCollections object; returns GTDataVectorBinding
I've also been fleshing out some of the algorithms for Overpass. Have not made a lot of progress on this though, unfortunately.


No problems to report this week.


This week marks the start of my break for DEFCON. There probably won't be an update next week. Or a minimal one at best.

Weekly Report 8 (July 16th)


Today is supposed to be the UI milestone, but that only partially happened because I have been working on how to organize the backend. This somewhat affects the overall interface interaction anyway, since interaction with the UI relies on the WPS and the backend working properly.

After some talk with Benjamin, and a lot of thinking, I have decided to organize the backend this way:
  • There will be two process repositories: OSMProcessRepository and OverpassProcessRepository. The reasoning behind this is that I realized the functionality and interaction for both are very different, enough to merit them being separated.
    • OSMProcessRepository: manages algorithms that interact with OSM files. Example algorithms right now are mostly conversion algorithms:
      • OSM2GML - Converts OSM data to GML data; input = OSM content, output = GML content
      • OSM2KML - Converts OSM data to KML data; input = OSM content, output = KML content
      • There should not be any need for additional parameters here. Currently I'm leaving out the capability for detecting and processing gzipped OSM data for simplicity purposes, and may add this as time permits.
    • OverpassProcessRepository: manages algorithms that interact with the Overpass server. Basically algorithms in this repository send customized queries to the server, and let the server do most of the work, and then return the output in either GML or KML or OSM format. Example algorithms:
      • GenericOverpass - Sends some query by the user to the Overpass server; inputs = query, format enumerator (OSM, GML, KML), output = result from server in specified format
      • BBox - Basic bounding box; inputs = bounds, format enumerator (OSM, GML, KML), output = result from server in specified format
      • Additional parameters for this Repository should include the Overpass server that is to be queried
All the algorithms presented extend AbstractSelfDescribingAlgorithm, and the Repository classes essentially model that of LocalAlgorithmRepository. This was probably the simplest approach to go with after some discussion with Benjamin. I haven't yet tested how well this works, but I do not yet foresee any issues with this approach.
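As a rough illustration of what the BBox algorithm described above would send, here is a minimal sketch that builds an Overpass QL bounding-box query and the request URL for a configurable server. The class and method names are hypothetical, the query only asks for nodes, and the conversion of the response to GML or KML is left out entirely.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper; not part of osmtransform or 52n WPS. */
final class OverpassBBoxSketch {

    /** Overpass QL for all nodes inside the box; argument order is south, west, north, east. */
    static String bboxQuery(double south, double west, double north, double east) {
        if (south > north) {
            throw new IllegalArgumentException("south must not exceed north");
        }
        return "node(" + south + "," + west + "," + north + "," + east + ");out;";
    }

    /** Appends the URL-encoded query as the data parameter of a server's interpreter endpoint. */
    static String requestUrl(String interpreterUrl, String query) {
        try {
            return interpreterUrl + "?data=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String query = bboxQuery(48.4, -123.4, 48.45, -123.3);
        // The server is a parameter so that a local OSM3S instance can be used instead.
        System.out.println(requestUrl("http://overpass-api.de/api/interpreter", query));
    }
}
```

Passing the interpreter URL in as an argument corresponds to the "Overpass server that is to be queried" parameter mentioned for the repository.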

I have also branched the 52n-wps-webapp project to test my changes on the test pages, in addition to the configuration. The changes here are minor, and may eventually be merged into trunk at some point in the future.


The only potential problem I foresee now is how well the parser automatically detects the incoming data, and how it interacts with the Repository classes and the Algorithm classes I have made. I haven't yet done much testing; I have only been coding up the classes and doing a lot of reorganization. I suppose I will find out how well it all works this week.


Testing, testing, and more testing. Also, upload a basic UI with buttons that interacts with WPS, instead of using the testing page in 52n-wps-webapp.

Additionally, reach a decision for the parser, whether it does really basic parsing, or does more work converting to FeatureCollections.

Also, finish writing the custom algorithms. In a way, the post-processing stage will actually start this week, as at this point the project is going to start coming together.

Note for next week: I will be taking a semi-hiatus on the project as I will be away at DEFCON. I might be available for the early part of the week and do primarily testing, code review, and code cleanup, but the project report for July 30th will be relatively sparse.

Weekly Report 7 (July 9th)


Not much to report this week. Have been mostly reworking the package, and seeing how best to organize and test it. The dependencies have been resolved, so I believe the package now properly builds.

Right now the project is being reworked to model one of the existing backends, such as the GRASS or Sextante backends (the two I'm currently looking at). I've checked out a branch of 52n-wps-webapp to modify the config and test.html to include the OSMRepository, OSMAlgorithm, etc. It seems the most common approach to setting up such backends is to have the following:
  1. Set up a ProcessRepository (extending IAlgorithmRepository); register this in the wps_config.xml
    • In the constructor, do some basic initialization
    • Gather any Property(s) that are set active (done in wps_config.xml)
    • Initialize the ProcessDescriptionCreator
    • Loop through and create a ProcessDescriptionType for each algorithm available
  2. Set up a ProcessDescriptionCreator
    • Provide functionality to create ProcessDescriptionType for a specified algorithm
    • Handle necessary input and output into the algorithm
  3. Some generic object extending IAlgorithm or AbstractAlgorithm to handle the actual process execution
I am a little fuzzy as to how interaction between the ProcessRepository or ProcessDescriptionCreator with the generic algorithm object occurs. Presumably the former two are mostly concerned with setup, and the generic algorithm objects are called from WPS itself.
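To pin down my current understanding of the three pieces listed above, here is a minimal self-contained sketch. The types are local stand-ins that I made up for illustration; they are not the real 52n interfaces (IAlgorithmRepository, IAlgorithm), and the description is reduced to a plain string instead of a ProcessDescriptionType.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for the algorithm contract; NOT the real 52n IAlgorithm interface. */
interface SketchAlgorithm {
    String describe();
    String run(String input);
}

/** Trivial algorithm used only to exercise the repository. */
final class EchoAlgorithm implements SketchAlgorithm {
    private final String name;

    EchoAlgorithm(String name) {
        this.name = name;
    }

    @Override
    public String describe() {
        return "description of " + name;
    }

    @Override
    public String run(String input) {
        return name + ":" + input;
    }
}

/** Stand-in repository; a real one would implement IAlgorithmRepository. */
final class SketchRepository {
    private final Map<String, SketchAlgorithm> active = new LinkedHashMap<>();
    private final Map<String, String> descriptions = new LinkedHashMap<>();

    /** Step 1: basic initialization, keeping only algorithms flagged active in the config. */
    SketchRepository(Map<String, SketchAlgorithm> available, List<String> activeNames) {
        for (String name : activeNames) {
            SketchAlgorithm algorithm = available.get(name);
            if (algorithm != null) {
                active.put(name, algorithm);
                // Step 2: create a description for each active algorithm up front.
                descriptions.put(name, algorithm.describe());
            }
        }
    }

    boolean containsAlgorithm(String name) {
        return active.containsKey(name);
    }

    List<String> getAlgorithmNames() {
        return new ArrayList<>(active.keySet());
    }

    String getDescription(String name) {
        return descriptions.get(name);
    }

    /** Step 3: execution is delegated to the algorithm object itself. */
    String execute(String name, String input) {
        SketchAlgorithm algorithm = active.get(name);
        if (algorithm == null) {
            throw new IllegalArgumentException("Unknown algorithm: " + name);
        }
        return algorithm.run(input);
    }
}
```

In this sketch the repository only does setup and lookup, and the caller triggers execution, which is my guess at how the real workflow is split.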


  • The approaches to setting up the Sextante and GRASS backends do not seem very similar. I might investigate standardizing the approach when writing the OSM backend.
  • I'm uncertain of the best way to test the workflow of all these WPS interactions; understanding how this works would also aid in the creation of the UI.


TODO this week is to get some clarification on the workflow, and continue writing the backend.

Weekly Report 6 (July 2nd)


This week's status report is primarily to document the discussion with Benjamin regarding the progress of the project, and to outline changes in how the organization of the package will work.

UI Considerations

There were two possibilities for the UI component: extend the OpenLayers client, or come up with a simple UI to demonstrate the capabilities with OpenStreetMap. We decided that extending the OL client is beyond the scope of this GSoC project, so we will be working on the simple UI as planned. The mockup is already complete, as seen last week, so progress so far has been implementing that. It will likely incorporate parts of OL, mostly to display the map and preview portions.

Project changes

There was some initial confusion about how the workflow worked with regard to input and processing of OSM data. I had put the primary component of the OSM preprocessing in the Parser, which was initially part of the geoprocessing io component. However, the Parser takes in an InputStream, which means all the setup for the input is done by the time it reaches the parser. There was also the question of how communication with Overpass servers should be done: with the Algorithm, with the AlgorithmRepository, or with neither. We decided the best way to treat this was to set the project up not as a standalone plugin, but rather as a backend, similar to the Sextante or GRASS backends. Benjamin suggested I look at the GRASS backend as an example for setting this up.


Currently the project is standalone and does not really interact directly with WPS in any way.

Currently there is no way to set up the inputs or configuration into the pre-processing component.


  • Restructure project to follow one of the existing backend projects
  • Add missing dependencies as mentioned by Daniel
  • Set up an option to configure inputs

Weekly Report 5 (June 25th)


As it turns out, there were a number of tweaks that were needed in order to get everything to talk to each other properly. These tweaks are going to be documented here in the hopes that others encountering the problems I did can resolve them easily.

Issue 1: Solving last week's problem with dependencies

Turns out the main issue was that my schema was depending on additional files (I split the schema up for clarity), and thus the parser could not find these additional files. There were two ways to solve this:
  • Make one big schema (the approach I initially took)
  • Make sure your schema file(s) reside in the directory corresponding to your destination package when you run the codegen (the better approach)
    • Example: the destinationPackage in your pom.xml for xmlcodegen is org.geotools.osm ; therefore the schema needs to reside in src/main/resources/org/geotools/osm
I did approach 1 in order to solve the dependency issue, then by pure luck discovered approach 2 when trying to prevent hardcoding the schema location in the OSMConfiguration class.
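The convention in approach 2 can be written down as a small sketch. The class and method names here are hypothetical, and the schema file name osm.xsd is only used as an example; the point is that the schema directory is derived from the destination package, so a class in that package can look the schema up on the classpath without a hardcoded file path.

```java
/** Hypothetical helper illustrating the package-to-directory convention. */
final class SchemaLocationSketch {

    /** Directory under src/main/resources that mirrors the codegen destination package. */
    static String resourceDirFor(String destinationPackage) {
        return "src/main/resources/" + destinationPackage.replace('.', '/');
    }

    /** Absolute classpath name of a schema file placed in that directory. */
    static String classpathNameFor(String destinationPackage, String schemaFile) {
        return "/" + destinationPackage.replace('.', '/') + "/" + schemaFile;
    }

    public static void main(String[] args) {
        System.out.println(resourceDirFor("org.geotools.osm"));
        // A class inside org.geotools.osm could then resolve the schema relative to itself,
        // for example with SomeClassInThatPackage.class.getResource("osm.xsd").
        System.out.println(classpathNameFor("org.geotools.osm", "osm.xsd"));
    }
}
```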

With this solved, I encountered a new issue where my sample OSM file was being parsed into a HashMap, instead of OsmType.

Issue 2: Geotools Parser returns a HashMap instead of OsmType

According to this, this issue occurs if the schema is incorrectly configured. Apparently my schema was still not properly set up to allow for the proper mappings/bindings to happen.

Turns out, my schema definition/initialization was not correct. I use xs: to prefix all the tags, so needed to reference the appropriate xsd's for those. Eventually, the schema tag looked something like this before the proper casting took place:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
  version="0.6" targetNamespace="http://www.geotools.org/osm"
  xmlns="http://www.geotools.org/osm">

If you use xs: to prefix all the tags, you will need to set the xmlns:xs appropriately, with xmlns set to your namespace.

Issue 3: Invalid casting for dateTime type

I originally used the recommended XMLGregorianCalendar to cast dateTime types (due to the OSM attribute timestamp occurring in the schema for the node, way, and relation tags). This worked fine on my small osm sample xml, but failed for a larger osm sample xml. I don't have the original stacktrace from that, but I believe for some reason it got cast to a java.sql.DateTime instead. Either way, my solution to this was to soften the restriction, and cast dateTime types to Strings for now. The unfortunate thing is that the stacktrace did not display the line number of the offending attribute, so I am uncertain where the problem was in the xml file. This approach (i.e. softening the constraint) is probably better anyway, since we cannot be completely certain that osm xml files will conform, and the timestamp attribute is minor in the big scheme of things (versus, say, longitude/latitude points).
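A minimal sketch of what the softened constraint allows, assuming the timestamp is kept as a raw String in the bindings: code that needs a real date can try to parse it and carry on when the value does not conform. This uses java.time rather than XMLGregorianCalendar, and the class name is made up for illustration.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Hypothetical helper; the bindings themselves would still store the raw String. */
final class LenientTimestampSketch {

    /** Returns the parsed instant for well-formed ISO-8601 UTC timestamps, otherwise empty. */
    static Optional<Instant> tryParse(String rawTimestamp) {
        if (rawTimestamp == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Instant.parse(rawTimestamp.trim()));
        } catch (DateTimeParseException e) {
            // Non-conforming value: the caller keeps the raw string and moves on.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("2012-06-25T10:15:30Z").isPresent()); // true
        System.out.println(tryParse("not a timestamp").isPresent());      // false
    }
}
```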

XSD Creation/Validation Tools, XML-Java Binding Tools

Daniel asked me what validation tools I was using to test my schema against. So, here's a section on what I used for XSD creation and validation, and also the bindings generation.

References for schema:
  • XSD Reference at W3C
  • Incomplete XSD for OSM
  • OpenStreetMap wiki pages in general, describing the specification
  • I only found out after writing the core of the schema, that osm2nds has a schema also. Oops. It might be worth doing a comparison to see if I missed anything.
Tools for schema generation and validation:
  • xmlbeans inst2xsd, which attempts to infer a schema from an XML document. I ran this against a few sample OSM files to get an idea of their structure (in case it picked up things that were not present in other incomplete references for OSM data).
  • xmlbeans svalidate, which attempts to validate an XML document against a specified schema. This would apparently fail for large files.
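For small documents, the same kind of check can also be done from Java with the JDK's javax.xml.validation API instead of the xmlbeans tools. The sketch below is only an illustration: the embedded schema is a deliberately tiny toy (nodes with id, lat, lon), not the osm.xsd I wrote, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Validates an XML string against an XSD string using only JDK classes. */
final class XsdValidationSketch {

    /** Toy schema for illustration only; far smaller than a real OSM schema. */
    static final String TOY_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"osm\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"node\" minOccurs=\"0\" maxOccurs=\"unbounded\"><xs:complexType>"
        + "<xs:attribute name=\"id\" type=\"xs:long\" use=\"required\"/>"
        + "<xs:attribute name=\"lat\" type=\"xs:double\" use=\"required\"/>"
        + "<xs:attribute name=\"lon\" type=\"xs:double\" use=\"required\"/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence><xs:attribute name=\"version\" type=\"xs:string\"/>"
        + "</xs:complexType></xs:element></xs:schema>";

    /** Returns false if the document (or the schema itself) is rejected by the validator. */
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<osm version=\"0.6\"><node id=\"1\" lat=\"48.4\" lon=\"-123.3\"/></osm>";
        String bad = "<osm version=\"0.6\"><node id=\"1\" lat=\"north\" lon=\"-123.3\"/></osm>";
        System.out.println(isValid(TOY_XSD, good)); // true
        System.out.println(isValid(TOY_XSD, bad));  // false
    }
}
```

I have not checked how this behaves on very large files, so it does not address the size limitation noted above.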
XML-Java Binding Tools
  • mvn jaxb2:generate
  • mvn install
  • mvn org.geotools.maven:xmlcodegen:generate
Source: Geotools XML Tutorial. See notes above about the issues I found, and how I resolved them. Also important to note that their sample files provided for download are not necessarily up to date (and the article itself doesn't provide the full contents of the sample files), so it's hard to really get an idea of how their sample project works. There is also no sample po.xml (for the PurchaseOrder) that you can really test against.

UI Mockup

So June 18th marked the start of the UI phase of this project. I haven't ignored that aspect while working on finishing the data preprocessing component. In fact this is the perfect time to raise logic considerations in terms of datasources. For example:
  • When implementing the parsers for the OSM data, I realized that the input is in the form of an InputStream. This means the configuration information for input sources and the I/O for input sources is set elsewhere, somewhere that I am unaware of. At the same time, this might become a setting that is configurable by the graphical UI.
  • When testing to see if the OSM data was loadable, especially for significantly sized OSM data, it could potentially make WPS take a considerable amount of time to process (at least it took a long time for Geotools to process... although most of it was probably I/O). There should probably be fail-safes built in to handle these cases, perhaps some sort of limit on the size of OSM data that is being uploaded (or rather, a limit on the size of FeatureCollections that are being built as a result). Interacting with the API should not necessarily have this problem.
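One possible shape for the upload fail-safe mentioned above is a wrapper that aborts once more than a configured number of bytes has been read. This is only a sketch under my own assumptions: the class name and the helper method are hypothetical, skipped bytes are not counted, and it limits the raw input size rather than the size of the resulting FeatureCollections.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical fail-safe: throws once more than maxBytes have been read from the stream. */
final class SizeLimitedInputStream extends FilterInputStream {
    private final long maxBytes;
    private long bytesRead;

    SizeLimitedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buffer, int offset, int length) throws IOException {
        int n = super.read(buffer, offset, length);
        if (n > 0) {
            count(n);
        }
        return n;
    }

    private void count(long n) throws IOException {
        bytesRead += n;
        if (bytesRead > maxBytes) {
            throw new IOException("Input exceeds limit of " + maxBytes + " bytes");
        }
    }

    /** Convenience check for the sketch: drains the data and reports whether it stayed in bounds. */
    static boolean fitsWithinLimit(byte[] data, long maxBytes) {
        try (InputStream in = new SizeLimitedInputStream(new ByteArrayInputStream(data), maxBytes)) {
            byte[] buffer = new byte[4];
            while (in.read(buffer) != -1) {
                // keep reading until the end of the data or until the limit trips
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Since the parser only sees an InputStream, a wrapper like this could be applied before the stream is handed over, without the parser needing to know about the limit.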
UI Rough Mockup

[Figure: initial UI mockup; image not included]

UI Workflow Diagram

[Figure: proposed UI workflow diagram; image not included]


Also, was able to successfully test parsing a 172MB file (OSM XML file of the city of Victoria, BC, Canada).

I am also trying to stop wiki-like words from acting as wiki links in this report, as that is a little bit frustrating. When I get time this week, I'll probably go back through the reports and fix these minor issues. This will be marked as an ongoing TODO.


Most of the problems currently are logistical issues as mentioned above. These will probably be resolved in the correspondence with Benjamin.

Additionally, should work on the workflow between the preprocessing component and the ui component, which also needs to be discussed this week.

Additionally, should determine whether the default input stream is gzipped or not (or figure out a mechanism to auto-detect this).
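One possible auto-detection mechanism is to peek at the first two bytes of the stream and check for the gzip magic number (0x1f 0x8b). The sketch below uses only JDK classes; the class and method names are hypothetical, and it does not handle other compression formats.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper that transparently unwraps gzip-compressed input. */
final class GzipDetectSketch {

    /** Peeks at the first two bytes and wraps the stream in a GZIPInputStream if they match. */
    static InputStream maybeDecompress(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int first = in.read();
        int second = in.read();
        in.reset(); // rewind so the consumer sees the stream from its first byte
        boolean gzipped = first == 0x1f && second == 0x8b;
        return gzipped ? new GZIPInputStream(in) : in;
    }

    /** Test helper: gzip-compresses a string. */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the (possibly compressed) bytes fully and returns them as UTF-8 text. */
    static String readAsString(byte[] data) {
        try (InputStream in = maybeDecompress(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, readAsString(gzip("<osm/>")) and readAsString("<osm/>".getBytes(StandardCharsets.UTF_8)) both yield the same text, so the parser would not need to know which form it was given.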


  • Write test cases for preprocessing phase
  • Benchmark preprocessing of OSM format, both from file and from API
  • Start writing the UI component

Weekly Report 4 (June 18th)

This is a bit of a late report, due to problems that I encountered and attempted, without success, to resolve.

The background is that I intended to have a working preprocessing component by today (i.e. meet the milestone). This unfortunately did not happen; the details are in this status report. As I am currently short of ideas about where the problem lies, I figure the better solution is to document where I am at currently, with the intent of finding and fixing the problem over the next few days. This encroaches upon the time period for the UI component, but perhaps I can start working on that component while figuring out how to fix this issue.


In the previous status report, I noted that I had been wrong in thinking that the Geotools Java files were generated using JAXB, and that they were instead generated using xmlcodegen. The truth is that the Java files are generated using both xmlcodegen AND JAXB.

Before I continue on with this discussion, some questions need to be answered as to the motivation of my approach.

Why are you putting all this work in trying to create XML-binding Java files?

Two reasons: (a) I initially thought this approach would be easier (turns out it is not), and (b) I figured if I created a schema and bindings that directly interfaced with Geotools, in the long run this would be easier to maintain should the OSM specification change, instead of a big mess of spaghetti code handwritten to manually parse the OSM schema.

Okay, so why not just regular JAXB? This is better than handwritten XML parsing code, and a simpler solution. Even osm2nds uses it.

As mentioned in the previous answer, the key component is the bindings that interface/connect with Geotools. What I had hoped to achieve was that in the OSMParser class, I could simply create a custom variant of the Configuration class (like GMLConfiguration) called OSMConfiguration, pass this to the Geotools Parser, and very trivially we'd have something that parses OSM data into a FeatureCollections (which then gets passed into GTVectorDataBinding, returned as IData, etc.). This would mean the core of the conversion work would be complete once you just loaded the OSM data (either through the Overpass API or xml files).

Isn't there an ApplicationSchemaConfiguration that Geotools provides for parsing custom XML provided you have a schema available?

Yes, and no. ApplicationSchemaConfiguration works for GML derivatives, i.e. user-defined custom extensions to the GML specification. OSM is not based on GML, therefore would not work very well using ApplicationSchemaConfiguration.

Okay, so what did you really achieve then this week?

I had been focusing on using this article for a while, and was confused why xmlcodegen would continually fail on my schema. Then I realized I needed to create specific Java classes representing the different aspects of my schema (done by using JAXB) before running xmlcodegen, as detailed here. Additionally, there is this page which talks in more detail about the Configuration class in Geotools, which I will have to read more closely to diagnose where I'm going wrong. A small maven project that compiles Java classes binding to the OSM schema is available in SVN, separate from the osmtransform project, under osmtransform/trunk/xsd.

It is also important to note that OSM is a very different format than GML. Geotools bases its FeatureCollections class almost directly on the model of a collection of Features as defined in the GML specification. OSM however, does not. Therefore mappings between the primitives of both formats need to be created.

Luckily the OSM format is much much simpler than GML, so some basic mappings can be formed:
  • node maps to a Point, which is a geometry object
  • way maps to a LineString (probably), which is also a geometry object
  • relation maps to some complex geometry object, as it can be referencing a way, or a node, or other relations
  • tag maps to meta-features, as a tag contains a key (which denotes the feature type), and a value (which denotes the feature name)
  • all other attributes that are not related to geometry or id referencing map to attribute objects; some common attributes would be user, uid, timestamp, visible, version, changeset; not all of these attributes will be useful in GML
There should be a detailed diagram later that explains all of these relationships, but that will be a TODO, perhaps for documentation.
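Until that diagram exists, the two simplest mappings above (node to Point, way to LineString) can be sketched in plain Java by emitting WKT strings. The Node class here is a stand-in I made up for illustration, not the generated binding class, and relations and tags are left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy mapping from OSM primitives to WKT geometry text. */
final class OsmToWktSketch {

    /** Stand-in for an OSM node; not the class generated from the schema. */
    static final class Node {
        final long id;
        final double lat;
        final double lon;

        Node(long id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** node -> Point; WKT puts x (longitude) before y (latitude). */
    static String nodeToWkt(Node node) {
        return "POINT (" + node.lon + " " + node.lat + ")";
    }

    /** way -> LineString, resolving each node reference in order. */
    static String wayToWkt(List<Long> nodeRefs, Map<Long, Node> nodesById) {
        if (nodeRefs.size() < 2) {
            throw new IllegalArgumentException("A LineString needs at least two nodes");
        }
        StringBuilder wkt = new StringBuilder("LINESTRING (");
        for (int i = 0; i < nodeRefs.size(); i++) {
            Node node = nodesById.get(nodeRefs.get(i));
            if (node == null) {
                throw new IllegalArgumentException("Unresolved node reference: " + nodeRefs.get(i));
            }
            if (i > 0) {
                wkt.append(", ");
            }
            wkt.append(node.lon).append(' ').append(node.lat);
        }
        return wkt.append(')').toString();
    }

    public static void main(String[] args) {
        Map<Long, Node> nodes = new LinkedHashMap<>();
        nodes.put(1L, new Node(1L, 48.4, -123.3));
        nodes.put(2L, new Node(2L, 48.5, -123.2));
        System.out.println(nodeToWkt(nodes.get(1L)));               // POINT (-123.3 48.4)
        System.out.println(wayToWkt(Arrays.asList(1L, 2L), nodes)); // LINESTRING (-123.3 48.4, -123.2 48.5)
    }
}
```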

I should also add that the osm schema I have written as it currently stands seems to be valid XSD, and at least validates the simple sample1.xml file I have. I should test it on other OSM files though, that are much larger and more complex. The current tools available for this fail for the especially large files, so my tests currently are limited to whether my OSM parser can parse it or not.


So I get this wonderful error now when I try to use my OSMConfiguration class:
Exception in thread "main" org.picocontainer.defaults.UnsatisfiableDependenciesException: org.geotools.osm.bindings.OsmTypeBinding has unsatisfied dependency: class org.geotools.osm.ObjectFactory among unsatisfiable dependencies: [[class org.geotools.osm.ObjectFactory]] where org.picocontainer.defaults.DefaultPicoContainer@7ed75415 was the leaf container being asked for dependencies.
   at org.picocontainer.defaults.ConstructorInjectionComponentAdapter.getGreediestSatisfiableConstructor(ConstructorInjectionComponentAdapter.java:191)
   at org.picocontainer.defaults.ConstructorInjectionComponentAdapter$1.run(ConstructorInjectionComponentAdapter.java:210)
   at org.picocontainer.defaults.ThreadLocalCyclicDependencyGuard.observe(ThreadLocalCyclicDependencyGuard.java:53)
   at org.picocontainer.defaults.ConstructorInjectionComponentAdapter.getComponentInstance(ConstructorInjectionComponentAdapter.java:248)
   at org.geotools.xml.impl.BindingLoader.loadBinding(BindingLoader.java:80)
   at org.geotools.xml.impl.BindingLoader.loadBinding(BindingLoader.java:60)
   at org.geotools.xml.impl.BindingWalker.visit(BindingWalker.java:130)
   at org.geotools.xml.impl.TypeWalker.walk(TypeWalker.java:41)
   at org.geotools.xml.impl.BindingWalker.walk(BindingWalker.java:156)
   at org.geotools.xml.impl.ElementHandlerImpl.startChildHandler(ElementHandlerImpl.java:317)
   at org.geotools.xml.impl.ElementHandlerImpl.startElement(ElementHandlerImpl.java:196)
   at org.geotools.xml.impl.ParserHandler.startElement(ParserHandler.java:614)
   at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
   at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
   at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
   at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
   at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
   at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
   at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
   at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
   at org.geotools.xml.Parser.parse(Parser.java:238)
   at org.geotools.xml.Parser.parse(Parser.java:166)
   at org.n52.wps.osmtransform.io.OSMParser.parse(OSMParser.java:46)
   at org.n52.wps.osmtransform.ScratchTest.main(ScratchTest.java:28)

This error message is not terribly descriptive, but I feel that perhaps the current generated version of my XML-binding Java classes might have some unsatisfiable dependencies. I wonder if this has to do with the fact my schema is split up into several files, and it is not resolving correctly. I'll have to dig further into how Geotools currently creates custom Configuration classes in order to resolve this issue. Currently this does not occur with ApplicationSchemaConfiguration, so it is definitely a problem with my code generation.

I also currently test my code without it being in the full WPS. This is not a good approach for the final stages of this phase, so, at some point I will need to write proper tests that test it while it is part of WPS and while WPS is running.


  • Fix this problem
  • Start work on UI component; talk to Benjamin
  • Deploy pre-processing component
  • Look into tests that combine pre-processing component with WPS

Weekly Report 3 (June 11th)


So apparently I was wrong in thinking that the Geotools Java files were generated with JAXB from XML schemas; instead they are generated using xmlcodegen which is part of the Geotools package. As such, I've been trying to track down a well-formed schema of the OSM XML and the Overpass API such that they can integrate seamlessly with Geotools.

This is proving to be a hard task, as the current schema is incomplete and inconsistent. As a result, I've been taking other sample XML output for OSM data and attempting to infer the schema using tools such as inst2xsd, but these results are also inconsistent. So now I'm working to compile a sane schema from these schema inferences. However, I think once this problem is solved, the rest will fall into place.

This is perhaps the greatest stumbling block for writing the Overpass API wrapper, which is therefore still a work in progress from last week.


  • OSM XML schemas are inconsistent
  • Inherited problems from last week


  • Update Weekly Report 2 with the outcome of the UI
  • Finish the XML schema definition hopefully by the end of this Monday
  • Make the deadline for finishing the pre-processing component which is next Monday
Some notes: UI programming won't be started until after pre-processing component phase.

Weekly Report 2 (June 4th)


Talked to Benjamin over the weekend about which OSM data source to use. We decided that I should aim primarily for interfacing with OSM3S via the Overpass API. In addition, it might be worth supporting OSM/PBF data for small files (where small is defined as uploadable).

While there exist public OSM3S servers, I'm also looking to set up my own local one to mostly see how well it runs.

Some investigation into Geotools shows that the GML and Shapefile parsers are both XML-based, and largely use files generated by xjc (part of JAXB) from an XSD schema to parse the file into something usable by FeatureCollections. Thus the implementation of the OSM(3S) parsers will mirror this approach. The class files for this will be in a subpackage, because this is more geotools-like functionality than WPS-like functionality.

Concise version of status:
  • Decided primary datasource should be: OSM3S, and OSM/PBF for small uploadable files.
  • In process of setting up local OSM3S server
  • Better defined organization of osmtransform:
    • Collection of classes that implement Overpass API as AlgorithmRepository
    • Collection of generated classes that translate the OSM file into FeatureCollection based on XSD/DTD schema. Parsers for both GenericFileDataBinding and GTVectorDataBinding intermediate outputs
EDIT: Resolution of the UI problem was thus:
  • Standalone web client for now (probably not directly in WPS)
  • Features include:
    • Upload custom OSM XML or PBF file
    • Some basic filters, i.e. extract just buildings or cycle roads
    • Basic bounding box filter/selection
May or may not integrate/use OpenLayers, and provide additional functionality that is OSM/WPS-specific.
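The "basic filters" feature listed above could be as simple as selecting elements by tag. Here is a minimal sketch under my own assumptions: elements are modelled as plain tag maps, the class and method names are hypothetical, and I am assuming building=* for buildings and highway=cycleway for cycle roads as the tags to match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy tag filter over elements modelled as key/value tag maps. */
final class TagFilterSketch {

    /** Keeps elements that carry the key; if value is non-null the tag value must match too. */
    static List<Map<String, String>> withTag(List<Map<String, String>> elements, String key, String value) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> tags : elements) {
            String actual = tags.get(key);
            if (actual != null && (value == null || value.equals(actual))) {
                matches.add(tags);
            }
        }
        return matches;
    }

    /** Builds a tag map from alternating key/value arguments. */
    static Map<String, String> tags(String... keyValues) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        List<Map<String, String>> elements = new ArrayList<>();
        elements.add(tags("building", "yes", "name", "Town Hall"));
        elements.add(tags("highway", "cycleway"));
        elements.add(tags("highway", "residential"));
        System.out.println(withTag(elements, "building", null).size());      // 1
        System.out.println(withTag(elements, "highway", "cycleway").size()); // 1
    }
}
```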


I am uncertain how the IParser uses the mimeType and schema parameters. If this can be used to tell whether the stream is bzipped or not, this will make autodetection that much easier. I am also uncertain whether implementing the Overpass API wrapper as an AlgorithmRepository/Parser combination is necessarily a good idea... this will probably be further investigated this week. The issue is mostly that the OSM3S server has some custom parameters before it will return usable XML, so how the parameters are passed is a rather interesting problem.


Write the core Overpass API wrapper. Investigate problem mentioned above.

Weekly Report 1 (May 28th)


Basic project structure is set up.

I have been looking at the OSM data further and thinking in the back of my mind about how to work with the large amount of data with the WPS. The main problem is that the planet.osm data is fairly large, thus will take time to download, and then process. This can be further broken down into two main subissues, namely:
  • how to download/store the data such that WPS has access to it immediately
  • how to modify the datastore quickly such that the WPS provides the most up-to-date information from OSM
The latter issue is easy to solve, as planet.osm provides incremental daily or weekly updates of roughly 300MB in size. This small amount of data (compared to the full planet.osm data at 21GB compressed, 303GB uncompressed) will be easy and fast to process given average server resources/speed (need to provide benchmarks for this). Integrating this is potentially a problem.

The former issue is more difficult as this is quite clearly the initialization step. In terms of storage, there are several options:
  • Use a db-based backend, e.g. PostgreSQL database with PostGIS extensions installed; There exists osm2pgsql, but it does lossy conversion. There is also osmosis.
  • Use a OSM-native backend. Examples of this are using the Overpass API or other APIs
  • Roll our own datastore/backend within WPS. This is easily feasible for small osm data, but not so much for large osm data. This would probably work with GeoTools as this is what WPS primarily uses for conversion to different formats. This might be slow, however.
If we go with the last option, this would probably involve rolling out our own class extending the AbstractFileDataStore class in GeoTools or that of IData in WPS. How this would work depends on what sort of defaults we'd want: perhaps store general features, and create some sort of hierarchy from general to specific for particular regions? This depends on how much data is available for particular regions, and would require a significant preprocessing stage. Currently it is unknown whether the planet.osm data is sorted.
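One way to picture a "general to specific" hierarchy of regions is a quadrant key: the world bounding box is split into four, then each quadrant into four again, so that a shorter key prefix always names a larger enclosing region. This is only an illustrative sketch of that idea; the class RegionKey and its key scheme are assumptions and have nothing to do with the GeoTools or WPS APIs.

```java
public class RegionKey {
    /**
     * Returns a key of `depth` digits, each in 0-3. Every digit selects one
     * quadrant of the current bounding box (bit 0: eastern half, bit 1: northern
     * half), so a key at depth n is a prefix of the key at any greater depth.
     * Hypothetical scheme, using plain lat/lon subdivision.
     */
    public static String of(double lat, double lon, int depth) {
        double south = -90, north = 90, west = -180, east = 180;
        StringBuilder key = new StringBuilder(depth);
        for (int i = 0; i < depth; i++) {
            double midLat = (south + north) / 2;
            double midLon = (west + east) / 2;
            int digit = 0;
            if (lon >= midLon) { digit |= 1; west = midLon; } else { east = midLon; }
            if (lat >= midLat) { digit |= 2; south = midLat; } else { north = midLat; }
            key.append(digit);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        // Coarse and fine keys for the same point share a prefix.
        System.out.println(RegionKey.of(43.47, -80.54, 3));
        System.out.println(RegionKey.of(43.47, -80.54, 8));
    }
}
```

Features bucketed by such a key could be stored per region, with coarse keys answering general requests and longer keys answering requests for a specific area. Whether this pays off depends on the preprocessing cost mentioned above.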

It also turns out that the osmosis tool is written in Java, and has some useful benchmarks, so it might be worth looking at some of the classes within osmosis on github to see how fast processing of osm data is done. Despite the large amount of data, it appears that it is possible to process them rather fast without a significant memory footprint.
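To illustrate why large OSM files can be processed without a large memory footprint, here is a small sketch of streaming (pull-based) XML parsing with the StAX API from the Java standard library. It is not taken from osmosis; the class OsmElementCounter is hypothetical and merely counts the top-level OSM element types (node, way, relation) while holding only the current XML event in memory.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class OsmElementCounter {
    /**
     * Counts node, way and relation elements in an OSM XML stream.
     * The document is never loaded as a whole; the reader advances one
     * event at a time, so memory use does not grow with file size.
     */
    public static Map<String, Long> count(InputStream osmXml) throws XMLStreamException {
        Map<String, Long> counts = new HashMap<String, Long>();
        counts.put("node", 0L);
        counts.put("way", 0L);
        counts.put("relation", 0L);

        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(osmXml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Child elements such as tag, nd and member are simply skipped.
                    Long seen = counts.get(reader.getLocalName());
                    if (seen != null) {
                        counts.put(reader.getLocalName(), seen + 1);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }
}
```

The same loop structure would apply if the elements were converted or filtered instead of counted.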

It appears more and more likely that this will have to be a fork of trunk at some point. I can think of one way to organize the different components of osmtransform in the project:
  • the preprocessing part of osmtransform to be part of the IO module, namely in the parsers package. Depending on the user needs, it may be worth setting up a separate backend/process to handle this. Basically:
    • Two possible parsers: one for file-based OSM data, one for API-based OSM data.
      • The file-based OSM data will need some preprocessing based on the size of the OSM data. This is converted into some kind of intermediary format, with some preprocessing to make requests faster. Probably some hierarchical sorting (i.e. by blocks, or by region) will be useful.
      • The API-based OSM data probably will need minimal preprocessing, and instead serve more as a passthrough interface to the OSM process
    • Two possible processes/AlgorithmRepositories: one for file-based OSM data, one for API-based OSM data
      • The file-based OSM data will require an AlgorithmRepository that can handle the internal converted format. Which capabilities it supports is still an open question (????). There will exist options for which mirror to pull the OSM data from, and how often it updates
      • The API-based OSM data will require an AlgorithmRepository that handles the OSM data directly as queried by the OSM provider. There will exist options for the source of the API-based OSM data, and whether it has write access (and what API format it uses)
  • the ui part of osmtransform would likely be a separate web page that performs some basic operations on OSM data, that are forwarded to the WPS itself.
Diagram of workflow would look like this:

[workflow diagram; image not reproduced]

Some details of the UI may not be quite as clear-cut as they initially seem. As mentioned before, there was discussion as to whether the UI component would be Java Swing-based or web-based. With the former option, it would act like one of the existing clients that interact with the WPS. With the latter option, it would probably be some additional webpage for the WPS. Assuming we go with the latter option, it is quite possible that certain kinds of requests could be served over a web interface, e.g. by integrating OpenLayers into the WPS. This might facilitate easy testing, and allow for uploads of smaller osm data for conversion.


Start writing the Parser classes for planet.osm data and OSM3S data. Look at the osmosis code and get hints for possible speedups. Document all the facilities that OSM3S provides, and start composing the AlgorithmRepository subclass that provides the capabilities of OSM3S servers.

Weekly Report 0 (May 21st)


We had a phone call with Benjamin on Friday to discuss where I was with respect to the schedule, and to discuss the preliminaries of the UI mockup. We are slightly behind schedule in that the UI mockup should probably have been completed last week, so hopefully this will be sorted out by the middle of this week. The question posed to Benjamin was whether the UI would be Java Swing-based or web-based, the consensus being that since this is an addition to WPS, it would probably have to be web-based.


One problem that prevented work from getting done this weekend was sporadic power outages in the Waterloo region. Some work was lost as various parts of campus and residences either cycled power at various points of the day, or lost power for several hours. This was extremely unusual, and also highly frustrating. It is also a holiday today (Victoria Day).


As per the schedule, the TODO for this week would be to flesh out the preliminaries of that UI mockup (and get feedback from Benjamin and/or other parties). This will probably be put on the wiki somewhere. Also the other TODO for this week is to start committing code to the repository, basically as a skeleton structure for the different components of the project.


  • Topic created by: SarahHarvey
  • Topic created on: 2012-05-21

Topic attachments
  • gthelper.diff (1 K, 14 Aug 2012 - 01:45, UnknownUser): diff for GTHelper.java
Topic revision: 22 Aug 2012, SarahHarvey