DataCove Ghosts - Data Provision

  • Author
    Katharina SCHLEIDT

    At DataCove, we have a great deal of experience with all aspects of the environmental data provision process. We are thus fully aware of the possibilities that modern technologies provide, and duly frustrated with how slowly they are being implemented. When we ask why existing data cannot be made available faster, the usual response is that nobody is using it anyway, so why even bother putting it online at all.

    To overcome this chicken-and-egg problem, in which data is not made available because there are no users or tools, while users and tools cannot emerge until data becomes available in a standardized and reusable format, we have developed the concept of Data Ghosting. A great deal of data is already available online in some form. The catch is the form: either large static blobs (shapefiles, databases) or non-harmonized services (OpenGovernmentData), neither of which is easy to use in new applications that access data from multiple sources. In our Data Ghosting process, we access this available data, transform it to conform to the harmonized data models, and then make it available via standardized web services.

    At present, we are focusing on the Central European Region, ghosting data pertaining to various INSPIRE Data Themes from national, European and international sources. With these harmonized services in place, we are free to start work on tools that use these standardized data sources. These range from simple visualizations that let the user explore the available data to mobile apps that support not only data discovery and visualization, but also let the user extend the data holdings by adding information that is persisted together with the original data.

    For the DataCove Statistical Viewer, we provide data pertaining to the INSPIRE Themes Statistical Units and Population Distribution. Relevant data was identified at Eurostat:

    All data is stored in a PostGIS database, and GeoServer is used to provide the download services. Configuration was performed manually, as this gives us the most control over the configuration while making it easier to identify bugs in the underlying systems. First, database tables were set up in accordance with the requirements of the INSPIRE Data Specifications; then we created a GeoServer App Schema configuration for data provision (using that all-time favorite tool Notepad++). The Extract, Transform and Load process was performed using our own Java utilities, as these give us the greatest flexibility with regard to input formats.
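
    The transform step of such an Extract, Transform and Load process can be sketched as follows. This is only an illustration in Python, not our Java utilities: the input columns (geo, time, value), the PopulationRow fields and the sample values are all assumptions made for the example, and the real target columns follow the INSPIRE Data Specifications.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class PopulationRow:
    """Hypothetical target row; not the actual table layout."""
    statistical_unit_id: str
    reference_year: int
    population: Optional[int]


def transform(source: Iterable[str]) -> Iterator[PopulationRow]:
    """Map source CSV records (assumed columns: geo, time, value) to target rows."""
    for record in csv.DictReader(source):
        raw = record["value"].strip()
        # Assumption: an empty cell or ":" marks a missing value in the source.
        population = None if raw in ("", ":") else int(raw)
        yield PopulationRow(record["geo"].strip(), int(record["time"]), population)


# Placeholder input; the values are made up for illustration only.
SAMPLE = "geo,time,value\nAT11,2015,1000\nAT12,2015,:\n"
rows = list(transform(io.StringIO(SAMPLE)))
```

    Keeping the transform as a pure function over records makes it easy to add further input formats in front of it. The load step would then write the resulting rows to the PostGIS tables through a database driver.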

    The following difficulties were encountered in the process of setting up these services:

    • Namespace-specific endpoints: In the GeoServer version used for these services (2.10), namespace-specific endpoints are not possible, as the resulting XML does not correctly encode the element namespaces. The recent fix has yet to be tested.
      Workaround: access the namespace agnostic endpoint for all feature types
    • WMS doesn't work on geometry type gml:MultiSurface
      Workaround: set up simple features and create a WMS for these
    • PostGIS connections tend to hang: when you restart GeoServer, the old DB connections are retained; if you do this a few times, all DB connections are taken and data retrieval becomes erratic
      Workaround: don't restart GeoServer too often (now possible to reload App Schema from the web GUI)
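
    As an illustration of the first workaround, a request against the namespace agnostic endpoint can be built as sketched below. The host name and the feature type name are placeholders assumed for the example, not our actual service address. The global endpoint is taken to be <base>/wfs, as opposed to the <base>/<workspace>/wfs form of a namespace specific one.

```python
from urllib.parse import urlencode


def get_feature_url(base_url: str, type_name: str, count: int = 10) -> str:
    """Build a WFS 2.0.0 GetFeature request against the global endpoint."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,  # prefixed name, e.g. "prefix:FeatureType"
        "count": count,          # limit the number of returned features
    }
    # Global endpoint: <base>/wfs rather than <base>/<workspace>/wfs
    return f"{base_url.rstrip('/')}/wfs?{urlencode(params)}"


# Placeholder host and feature type, assumed for illustration.
url = get_feature_url(
    "https://example.org/geoserver", "su-vector:VectorStatisticalUnit"
)
```

    Because every feature type is requested through the same endpoint, the client only has to vary the typeNames parameter.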