GeoNetwork and spatial metadata cataloguing

Parochialism is a common phenomenon among specialists. Ask any specialist why they’re interested in their area and they will no doubt answer with a sweeping string of superlatives and generalisations as to why their chosen specialty is absolutely fundamental to our understanding of humanity.

The geospatial industry is guilty of this too—within the world of data we push the value of location to the very forefront of our thinking.

One case in which the location of your data is very useful is in cataloguing and discovery. It is rare that a dataset has no location component—even if the data itself is aspatial, it is usually bounded with geographic classifiers, e.g. the country in which it was collected. For a data custodian, this information is incredibly relevant for taxonomy and classification. All of a sudden, we have a new axis upon which we can classify and index our data: not just what it is, but where it is. Similarly, people searching for a dataset can narrow their search based on the location of the data. This can be an extremely efficient way of searching as it allows you to quickly determine what does and doesn’t fall within your area-of-interest.
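To make the “where it is” axis concrete, here is a minimal sketch of the simplest kind of spatial discovery: keeping only the records whose bounding box overlaps an area of interest. It is plain Python, not anything taken from GeoNetwork. The record titles and extents are made up, boxes are assumed to be in WGS 84 decimal degrees, and boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in decimal degrees (WGS 84 assumed)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.east < other.west
            or self.west > other.east
            or self.north < other.south
            or self.south > other.north
        )


def filter_by_area(records: dict[str, BBox], aoi: BBox) -> list[str]:
    """Return the titles of records whose extent overlaps the area of interest."""
    return sorted(title for title, extent in records.items() if extent.intersects(aoi))


# Hypothetical catalogue entries with approximate extents.
catalogue = {
    "Tasmania land cover": BBox(143.8, -43.7, 148.5, -39.5),
    "Perth aerial imagery": BBox(115.6, -32.3, 116.2, -31.6),
    "Hobart building footprints": BBox(147.1, -43.0, 147.5, -42.7),
}

# An area of interest drawn roughly around southern Tasmania.
aoi = BBox(146.0, -44.0, 148.0, -42.0)
print(filter_by_area(catalogue, aoi))
```

With this area of interest, the two Tasmanian records are kept and the Perth record is ruled out without anyone having to open it.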

So that’s the pitch: a metadata catalogue that allows you to categorise and manage your datasets based on their spatial properties.

There are a handful of open source metadata platforms, some of which have at least some spatial features (CKAN, for example, which is used throughout the Australian government). GeoNetwork opensource, however, is an open source metadata catalogue project, sponsored by the OSGeo Foundation, that presents itself as a spatially focussed system. Over the last year or so I’ve had quite a bit of exposure to the GeoNetwork software and its capabilities through our work, so this seems like a good time to reflect on it.

The project I have been working on is not yet public, so I will draw my content from the current build of the software running on my local machine. You can try setting it up yourself, but be warned that the environment can be fiddly to get just right!

Download: https://geonetwork-opensource.org/downloads.html

Homepage: https://geonetwork-opensource.org/

Repository: https://github.com/geonetwork/core-geonetwork

Interface features

The first interface feature is the ability to view and manage the “spatial metadata” of a record: the set of fields that describe the spatial representation of your data, such as the bounding box that contains the dataset, its spatial reference system, or the accuracy of its coordinates. GeoNetwork lets you create and manage all of this through its editing interface.
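As an illustration of what these fields look like underneath, the sketch below reads a bounding box and a reference system out of a record encoded in ISO 19139 XML, one of the standard encodings GeoNetwork works with. The record fragment is hypothetical and heavily trimmed (it is not schema-valid), and the function name `read_spatial_metadata` is made up for this example; only the element names and namespaces follow the standard.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the ISO 19139 XML encoding of ISO 19115 metadata.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, trimmed record fragment; a real record carries many more elements.
RECORD = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:referenceSystemInfo>
    <gmd:MD_ReferenceSystem>
      <gmd:referenceSystemIdentifier>
        <gmd:RS_Identifier>
          <gmd:code><gco:CharacterString>EPSG:4326</gco:CharacterString></gmd:code>
        </gmd:RS_Identifier>
      </gmd:referenceSystemIdentifier>
    </gmd:MD_ReferenceSystem>
  </gmd:referenceSystemInfo>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:extent>
        <gmd:EX_Extent>
          <gmd:geographicElement>
            <gmd:EX_GeographicBoundingBox>
              <gmd:westBoundLongitude><gco:Decimal>147.1</gco:Decimal></gmd:westBoundLongitude>
              <gmd:eastBoundLongitude><gco:Decimal>147.5</gco:Decimal></gmd:eastBoundLongitude>
              <gmd:southBoundLatitude><gco:Decimal>-43.0</gco:Decimal></gmd:southBoundLatitude>
              <gmd:northBoundLatitude><gco:Decimal>-42.7</gco:Decimal></gmd:northBoundLatitude>
            </gmd:EX_GeographicBoundingBox>
          </gmd:geographicElement>
        </gmd:EX_Extent>
      </gmd:extent>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def read_spatial_metadata(xml_text: str) -> dict:
    """Extract the first geographic bounding box and the CRS code from a record."""
    root = ET.fromstring(xml_text.strip())
    # Assumes the record contains at least one geographic bounding box.
    box = root.find(".//gmd:EX_GeographicBoundingBox", NS)
    bounds = {
        side: float(box.find(f"gmd:{side}/gco:Decimal", NS).text)
        for side in ("westBoundLongitude", "eastBoundLongitude",
                     "southBoundLatitude", "northBoundLatitude")
    }
    crs = root.find(".//gmd:RS_Identifier/gmd:code/gco:CharacterString", NS)
    return {"bbox": bounds, "crs": crs.text if crs is not None else None}


result = read_spatial_metadata(RECORD)
print(result)
```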

Stemming from this capability is the real-time extent preview for search results: the footprints (where available) of all your current search results are drawn on a small map, by default in the bottom right. Hovering over a record even highlights its bounding box on the map (though, sadly, not vice versa).

This feature gives a nice preview, but from an interface perspective I think the map is squeezed into the corner and gets less use than it could. You can also filter live by bounding box on this map, by selecting and dragging with the pen tool, but the map is so small that it offers little control over the selection, which makes for a poor user experience.
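Some rough arithmetic shows why a small map makes for imprecise selection. The sketch below estimates how much ground a single pixel covers when the whole world is shown in a map of a given width. The map widths are hypothetical, not measurements of GeoNetwork’s interface, and the calculation assumes an equirectangular view (at the equator, a whole-world Web Mercator map gives the same figure).

```python
import math

EARTH_CIRCUMFERENCE_KM = 40_075.0  # approximate equatorial circumference


def km_per_pixel(map_width_px: int, lon_span_deg: float, latitude_deg: float = 0.0) -> float:
    """Approximate ground distance covered by one horizontal pixel.

    Assumes the map's width spans `lon_span_deg` degrees of longitude evenly,
    and that a degree of longitude shrinks with the cosine of latitude.
    """
    deg_per_px = lon_span_deg / map_width_px
    km_per_deg = EARTH_CIRCUMFERENCE_KM / 360.0 * math.cos(math.radians(latitude_deg))
    return deg_per_px * km_per_deg


# Hypothetical widths: a small corner map versus a wide map, both showing the whole world.
for width in (300, 1200):
    print(f"{width:>5} px wide: ~{km_per_pixel(width, 360.0):.0f} km per pixel at the equator")
```

On these assumptions a 300-pixel map covers roughly 134 km per pixel at the equator, against about 33 km for a 1200-pixel map, so a slip of a few pixels with the pen tool moves the edge of the selection by hundreds of kilometres on the small map.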

The second is support for previewing spatial resources, particularly OGC services. If a metadata record has associated OGC resources (such as a WMS or WFS), you can load them onto a map: clicking the “view” button on the record loads the layer into the “map” view.

Confusingly, this “map” view is not the map from the search screen with the bounding box previews, but the map module from the top menu, and the two elements are not connected. By “not connected”, I mean that the only way to do spatial bounding box filtering and view footprints is in the tiny map, and the only way to preview OGC services is in the big map. You can “filter” the big map, but this is more about filtering the features that are coming in from the OGC service, not a spatial search.
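For a sense of what loading a WMS layer onto a map involves, here is a sketch that builds a standard WMS 1.3.0 GetMap request. The parameters follow the OGC WMS 1.3.0 specification, but the server URL and layer name are hypothetical, and this is not a claim about the exact request GeoNetwork’s map module sends.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url: str, layer: str,
                   bbox_lonlat: tuple[float, float, float, float],
                   width: int = 800, height: int = 600) -> str:
    """Build a WMS 1.3.0 GetMap request for a single layer in EPSG:4326.

    bbox_lonlat is (west, south, east, north). WMS 1.3.0 follows the axis
    order defined by the CRS, and EPSG:4326 is latitude-first, so BBOX is
    written as south,west,north,east.
    """
    west, south, east, north = bbox_lonlat
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "EPSG:4326",
        "BBOX": f"{south},{west},{north},{east}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = wms_getmap_url("https://example.org/geoserver/wms", "imagery:hobart_2019",
                     (147.1, -43.0, 147.5, -42.7))
print(url)
```

The axis-order swap is the usual stumbling block: under WMS 1.3.0, EPSG:4326 expects latitude first, whereas the older 1.1.1 version (which uses an SRS parameter instead of CRS) expects longitude first.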

Back-end features

Being open source, GeoNetwork is well placed for integration: it allows you to federate your spatial metadata sources, whether they come from other catalogues or from different departments within an organisation. A whole slew of APIs is included in the toolkit, among them a set of REST endpoints, so you can build integrations between GeoNetwork and your existing systems, or plan around new ones. Judging from how often it is mentioned in the GeoNetwork project’s own material, “support for combining multiple catalogues” appears to be regarded as one of GeoNetwork’s stronger use cases. I do not think this will be useful to every organisation, but groups with multiple departments or stakeholders (government data portals drawing on several jurisdictions, for example) may well prioritise this capability.
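At its simplest, federation means pulling records from several sources into one view without duplicating datasets they share. The sketch below models that in plain Python; it does not use GeoNetwork’s harvesters or REST API. The `Record` shape, the assumption that a shared dataset carries the same uuid in every catalogue, and the “newest copy wins” rule are all illustrative choices.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    uuid: str       # metadata identifier, assumed stable across catalogues
    title: str
    source: str     # which catalogue the record was harvested from
    modified: date  # last update of the metadata record


def federate(*catalogues: list[Record]) -> dict[str, Record]:
    """Merge records from several catalogues, keeping the newest copy of each uuid."""
    merged: dict[str, Record] = {}
    for catalogue in catalogues:
        for record in catalogue:
            current = merged.get(record.uuid)
            if current is None or record.modified > current.modified:
                merged[record.uuid] = record
    return merged


# Hypothetical harvests from two departments that share one dataset.
lands = [
    Record("a1", "Cadastre", "lands", date(2019, 3, 1)),
    Record("b2", "Road centrelines", "lands", date(2018, 11, 5)),
]
transport = [
    Record("b2", "Road centrelines", "transport", date(2019, 6, 20)),
    Record("c3", "Bus stops", "transport", date(2019, 1, 15)),
]

merged = federate(lands, transport)
for uuid, record in sorted(merged.items()):
    print(uuid, record.title, "from", record.source)
```

Here the road centrelines record appears in both harvests, and the more recently modified copy from the transport catalogue is the one kept.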

Of course, GeoNetwork also performs all the other core metadata functions very well. It is built on the Lucene search engine, with a highly configurable index for customising your searches. It also offers directory management for roles and contacts, record import and export, multilingual support, and a hands-on but powerful reporting capability via the API and XSL templates, among other things.
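The idea behind a configurable index is that you decide which fields are searchable. The toy inverted index below illustrates only that idea: it is not Lucene and not GeoNetwork’s index configuration, and the class name `FieldIndex` and the field names are hypothetical.

```python
from collections import defaultdict


class FieldIndex:
    """Toy inverted index: only the configured fields are tokenised and searchable."""

    def __init__(self, indexed_fields: list[str]):
        self.indexed_fields = indexed_fields
        # field -> token -> set of document ids containing that token
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id: str, record: dict[str, str]) -> None:
        for field in self.indexed_fields:
            # Naive tokenisation: lowercase and split on whitespace.
            for token in record.get(field, "").lower().split():
                self.postings[field][token].add(doc_id)

    def search(self, field: str, term: str) -> set[str]:
        return set(self.postings.get(field, {}).get(term.lower(), set()))


index = FieldIndex(["title", "keywords"])  # "abstract" deliberately left out
index.add("r1", {"title": "Hobart aerial imagery",
                 "keywords": "imagery orthophoto",
                 "abstract": "Flown in summer"})
index.add("r2", {"title": "Road centrelines",
                 "keywords": "transport",
                 "abstract": "Includes imagery-derived edits"})

print(index.search("keywords", "imagery"))
print(index.search("abstract", "summer"))
```

Because `abstract` is not in the configured field list, a search on it finds nothing, even though the word appears in a record.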

One final feature is an open, extensible schema plugin infrastructure, which lets you use and combine whichever metadata standards you wish, and even create your own (if you have a masochistic streak). Once again, this is useful for organisations with unique metadata needs. For example, our client who is implementing GeoNetwork captures aerial imagery, and they have added their own metadata fields describing the specifications of the physical sensor used to collect an aerial run.
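GeoNetwork schema plugins are built largely from XML Schema and XSLT files rather than Python, so the following is only a conceptual model of what extending a base record with custom fields means. The sensor fields, the validation rules and the class names are hypothetical stand-ins; the client’s actual fields are not public.

```python
from dataclasses import dataclass


@dataclass
class BaseRecord:
    uuid: str
    title: str
    crs: str


@dataclass
class AerialImageryRecord(BaseRecord):
    # Hypothetical custom fields describing the capture sensor.
    sensor_model: str = ""
    focal_length_mm: float = 0.0
    ground_sample_distance_cm: float = 0.0

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the record passes."""
        problems = []
        if not self.sensor_model:
            problems.append("sensor_model is required")
        if self.focal_length_mm <= 0:
            problems.append("focal_length_mm must be positive")
        if self.ground_sample_distance_cm <= 0:
            problems.append("ground_sample_distance_cm must be positive")
        return problems


# A hypothetical aerial run recorded in GDA94 / MGA zone 55, missing its sensor model.
record = AerialImageryRecord("d4", "Hobart 2019 aerial run", "EPSG:28355",
                             focal_length_mm=100.5, ground_sample_distance_cm=7.5)
print(record.validate())
```

The base fields stay common to every record, while the extension adds domain-specific fields and rules on top; that separation is the point of a schema extension, whatever technology implements it.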

So who should use it?

GeoNetwork open source is a solid piece of software, built with a huge array of open source components. It can sit inside your data environment and act as a connector between disparate sources and data types. You can preview your spatial datasets from within the catalogue and even perform simple spatial discovery.

It also exists very much in the world of standards and interoperability. It is built with the classic international metadata schemas (such as ISO 19115/19139) as its backbone, and operates under the assumption that these standards should underpin your data catalogue. To be honest, I was surprised at first that there were not more tools for discovering datasets spatially. The first draft of this post lamented how few features were geared towards spatial filtering or browsing, which seemed to me the primary point of a “spatial” metadata catalogue.

As I looked further into it, I realised that the spatial aspects of GeoNetwork are shaped more by this “world of standards”. Spatial metadata has its own set of special properties that other metadata standards do not cater for, and GeoNetwork lets you view and manage those properties in a structured and programmatically accessible way.

To me, it feels like the GeoNetwork team has emphasised the management and maintenance aspects of spatial metadata. The spatial data custodian is the one who’s having their job enriched and streamlined, and while there are a few other bits and pieces that help (like previewing OGC services), I feel that this is the core audience for GeoNetwork.

In short, GeoNetwork is good for people or organisations with:

  • a business need to store metadata in an internationally standardised way,
  • lots of spatial datasets,
  • specifically, spatial datasets to which spatial metadata properties are very important,
  • or, spatial datasets that require custom fields,
  • a requirement for federation of catalogues or inclusion of external metadata sources (i.e. combining multiple sources of metadata),
  • a need for a metadata portal that can integrate with other systems (either internal or external), or
  • a requirement for previewing multiple spatial datasets.

If any of the above points apply to you, I’d suggest you take a look at GeoNetwork for your organisation.

If not, GeoNetwork might not be for you. And don’t be fooled by its name: simply having some spatial datasets is not enough. You need a lot of them, preferably in different places, for it to really be worth it. Don’t build an aeroplane hangar when you only need a garage!

As a final suggestion, if you are in the market for a less formal, more “social” metadata system, have a look at GeoNode.

Homepage: http://geonode.org/

Example: http://aware.cirad.fr/

It is a sister project to GeoNetwork that calls itself a “Geospatial Content Management System”. In web jargon this means it should be more like a blog than a catalogue, designed to be a data publishing platform for “non-specialised users”. I have not used GeoNode enough to do a fair comparison at this point, but I feel that it is worth mentioning!

* It should be mentioned that, depending on your use case, there is a whole collection of pros and cons that stem from GeoNetwork being open source; these have been discussed many times elsewhere.

Tom Hollands