If you’re a data consumer, saying the word “metadata” conjures images of pages and pages of CKAN search results (data.gov.au anyone?), field after field of inscrutable information, and you’re just trying to find that download link or an email contact buried somewhere in the mess of information. Maybe you’ve just started a new job and you need access to some corporate dataset, and you’ve heard through word-of-mouth that Bob Scientist has the data you’re looking for, but oh no! Bob Scientist is on leave for a month – guess you can’t do your work.
If you’re a data creator, on the other hand, the images are of massive, complex XML documents, constant validation and verification, and painfully specific standards compliance. Like many who work with data, I have been both of these people.
Does my organisation need metadata?
So, if metadata is such a pain, how does your organisation deal with it? How is it stored and shared?
The by-the-book utopian enterprise consultant answer is emphatically YES, your organisation NEEDS metadata. Any upstanding data consultant worth their salt will tell you that. And the rationale is sound – having an accessible repository of good quality metadata is a great way to counteract the issues that organically appear over time in big organisations around discoverability, accessibility and understanding of business data. Without metadata and good governance, “pillars of knowledge” are created, such that knowledge of data becomes localised to one single staff member. When the data was collected, how accurate it is, what each field means, even where the data is physically stored, is a mystery to your organisation apart from your one data martyr. What happens when that staff member retires or finally books that one-way ticket to Europe they’ve been threatening to for years? The pillar crumbles, and your organisation is left with all that data and no idea where or even if it exists. At best, you’re left with a breadcrumb trail. At worst, the knowledge just disappears.
The problem remains, though, that creating and maintaining metadata is scientifically proven to be a complete and total drag. It is not fun. People do not like doing it, and it is basically never budgeted for in project plans. It comes as an afterthought, if at all. Even organisations with detailed data governance policies and mandated metadata procedures find themselves in these holes simply because people do not like doing metadata.
So, what is the answer? How do you make sure your organisation is doing the right thing by itself, storing information about its data in a useful and accessible way when people are (probably justifiably) so resistant to putting it in place?
A holistic approach to metadata
The best advice in this area, I believe, is to look at the problem more holistically. Forget about standards and compliance in the short-term and focus on what metadata actually DOES for your organisation: it shares knowledge. The point of putting these processes in place is to ensure that information is transferred between staff members, both existing and future.
If you think that a company-wide ISO 19115- or ANZLIC-compliant metadata system is not going to be accepted or practised by your staff (and let’s face it, it’s probably not), try a softer approach. Begin with a knowledge base, like any one of the open-source wiki projects out there (check out XWiki for example). Or try a commercial product like Confluence, if you want tech support.
If you collect data for projects, start small by creating a page for each project. Make a list of datasets created for the project. Whenever a project is kicked off, make it part of the inception process to create one of these wiki pages. Page creators will end up being your data custodians, and the information can be transferred with minimal hassle. Get people used to looking for data in your knowledge base. If a project comes up in the same business area, make sure you are leveraging your existing data resources by making information about those data sources discoverable. Check out the OpenStreetMap public wiki for some inspiration – this page lists metadata for associated tile providers; or this one, which provides a table description of map feature types.
Having a tag and archive system in place makes retrieving metadata as easy as looking up a book in a library
Building from there, create unique pages for each business dataset, and tag them. Tags are to the modern internet what meta keywords were to the early World Wide Web and the days of the original search engines: they can be used to clearly provide the bare minimum of invaluable information for your data. Make tags that are rich with information and that can easily be located in searches. Tag spatial data with the physical scope of the data. Include tags that describe the content of your data.
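To make the idea concrete, here is a minimal sketch in Python of what a tagged dataset page and a tag search might look like. It is not how XWiki, Confluence or any other product stores its pages; the DatasetPage fields, the find_by_tags helper and the sample records are all hypothetical, chosen only to show how little information is needed before a dataset becomes findable.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetPage:
    """Hypothetical stand-in for one knowledge-base page about a dataset."""
    title: str
    custodian: str   # the person who created the page
    location: str    # where the data is physically stored
    tags: set = field(default_factory=set)


def normalise(tag):
    """Compare tags case-insensitively and ignore stray whitespace."""
    return tag.strip().lower()


def find_by_tags(pages, wanted):
    """Return the pages that carry every one of the wanted tags."""
    wanted_set = {normalise(t) for t in wanted}
    return [
        page for page in pages
        if wanted_set <= {normalise(t) for t in page.tags}
    ]


# Illustrative records only: the names, paths and tags are made up.
pages = [
    DatasetPage("Road centrelines 2019", "Bob Scientist",
                "//fileserver/gis/roads", {"Queensland", "roads", "vector"}),
    DatasetPage("Rainfall gauges", "Alice Analyst",
                "warehouse.rainfall", {"queensland", "rainfall", "time-series"}),
    DatasetPage("Coastal erosion survey", "Bob Scientist",
                "//fileserver/gis/coast", {"New South Wales", "erosion", "vector"}),
]

# Spatial scope plus content type narrows the search to one page.
matches = find_by_tags(pages, ["queensland", "Vector"])
for page in matches:
    print(page.title, "|", page.custodian, "|", page.location)
```

Even with Bob Scientist on leave, a search on the spatial scope and the data type turns up the dataset, who looks after it and where it lives. Most wiki products give you this kind of tag search out of the box, so the real work is agreeing on the tags, not writing code.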
Collaborative interfaces like this are now very familiar to users and the teething process is gentler, which allows the cultural practice to grow more naturally. Gradual change may be difficult to measure and hard to stomach when you want speedy reform, but people don’t like being thrown in the deep end, and metadata is one of the deepest ends out there.
At the end of the day, even if you never transfer to a fully standards-compliant metadata procedure, by implementing some kind of knowledge management there is at least something in place that performs the holistic role of metadata. The key is to build the consideration for metadata into business-as-usual. Get your mid-level staff onside (think project managers, business area leaders) and get them to extol the virtues of knowledge sharing.
Barely anybody does metadata well, let alone perfectly. If you don’t think your organisation can successfully go down the metadata rabbit hole, think laterally and try to address the problem with an alternative solution. Because (begrudgingly) metadata does matter, and if you do not have some way of transferring information about your data, your organisation could be burning serious staff hours and, as a result, money.