Don’t Bite Off Too Much At Once – “Crawl, Walk & Run”

Key Enablers of Successful Metadata Product Development

Posted by Madhuri Gururaj on September 18, 2017

In the data-driven world that we live in today, it is common to find ourselves spending hours looking for information about our own data: what data we have (data catalog), its definition (both business and technical), its flow and footprint (source of truth), how much data we have (volumes and metrics), and so on. Knowledge workers spend 15-35% of their time searching for information [1]. To get a good 360-degree view of our data assets, it is critical that we have a systematic approach to collecting and managing information about our data (Metadata). Companies are investing in Metadata and related practices to reuse data, gain visibility into data lineage, support compliance and extend the life of their data assets.

As Monsanto transforms its business across the globe to become more digital, it has become fundamental for our customers, business partners and technical resources to be able to efficiently access well-organized and classified information, so they spend less time preparing and managing datasets in files and more time drawing deeper insights and driving informed business decisions. A core group of data gurus at Monsanto representing R&D, Global Supply Chain and Commercial deliberated for months to identify our top 10 enterprise Metadata capabilities. After evaluating and scoring many off-the-shelf products against these ranked capabilities, the core team found that none of these products both aligned with these capabilities and had a reasonable price tag.

The decision was made to build a newer version of “FAKS” (an existing internal, US-only metadata solution) using MediaWiki, the Open Source engine that drives Wikipedia. It was quickly rolled out with a basic set of features and templates to gather and manage business and technical definitions of our datasets. From then on, the focus was on championing the product while rolling out new capabilities that added value.

The key enablers that helped bring visibility to the platform ranged from running many road shows, to listening to our customers for new feature ideas, to building POCs to enable those capabilities.

Leverage New Technologies

Utilizing Open Source toolsets, vendor-provided APIs and AI accelerated the development process by letting us concentrate on Metadata-specific functionality instead of re-inventing the wheel for capabilities like usage metrics, querying, and content management that allows multiple users to contribute (wikis).

Branding Helps Unleash Your Identity

Because we re-used the look and feel of Wikipedia, we began receiving feedback that the UI felt outdated and was not user friendly (largely due to Wikipedia not having a major UI change in nearly 20 years). Partnering with the UX (user experience) team to redesign the UI and brainstorm branding strategies was one of the most impactful decisions we made. The redesign completely changed the end user experience, and branding the product as “Haystack” gave it a new identity and generated growing excitement among the end user base.


Your Customer is No.1

Today’s users constantly interact with an ecosystem of diverse tools and windows. Enabling cross-platform integration allows us to bring the best features of different applications into one common user experience. Haystack has an AI Slack bot, which we call “Alfred”, that queries the MediaWiki API, retrieves the data from the requested page and responds with the content in a Slack chat session. It can be used both in a public Slack channel and in a direct message to the bot. Interest in this feature has grown to the point where several teams have asked us to expand the bot to include functionality from other platforms. Similarly, another capability allows searches across Haystack and the CKAN data catalog in one request. We are not far from being able to enhance Alfred into an intelligent personal assistant, similar to Amazon’s Alexa or Apple’s Siri, that answers questions like “Alfred…tell me what is the source of truth for Corn’s Relative Maturity?”

Slack: #haystack
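To make the Alfred flow more concrete, here is a minimal Python sketch of the lookup step: build a MediaWiki Action API request for a page title, pull the page text out of the JSON response and shape it into a short chat reply. This is an illustration only, not Alfred’s actual code. The endpoint URL, the function names and the reply format are assumptions, and the response parsing assumes a MediaWiki version that supports `rvslots` and `formatversion=2`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; the post does not give Haystack's real URL.
API_URL = "https://haystack.example.com/api.php"


def build_query_url(title, api_url=API_URL):
    """Build a MediaWiki Action API URL that asks for a page's wikitext."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urlencode(params)


def extract_wikitext(payload):
    """Return the page's wikitext, or None if the page is missing."""
    pages = payload.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    revisions = pages[0].get("revisions", [])
    if not revisions:
        return None
    return revisions[0]["slots"]["main"]["content"]


def answer(title, fetch=None, limit=300):
    """Turn a page title into a short chat reply (reply format is assumed)."""
    if fetch is None:
        # Default fetcher performs a real HTTP GET and decodes the JSON body.
        def fetch(url):
            with urlopen(url) as resp:
                return json.load(resp)
    text = extract_wikitext(fetch(build_query_url(title)))
    if text is None:
        return f"Sorry, I could not find a page called '{title}'."
    snippet = text if len(text) <= limit else text[:limit].rstrip() + "..."
    return f"*{title}*: {snippet}"
```

The `fetch` parameter is there so the lookup can be exercised without a live wiki; a real bot would pass the returned string to whatever Slack client it uses to post the reply.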


It’s All About The Numbers!

Usage and adoption metrics for the new platform are collected and made available via Piwik (a self-service Open Source UI), which allows users to view and customize dashboards. Haystack’s AI bot also calls the Piwik APIs and pushes the results to the #Haystack Slack channel.
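As a rough illustration of that metrics push, the Python sketch below requests a visit summary from the Piwik Reporting API (`VisitsSummary.get`) and posts a one-line message to Slack through an incoming webhook. It is a sketch under assumptions rather than the bot’s actual code: the Piwik URL, the webhook URL, the function names and the message wording are all placeholders, and the post does not say which Piwik reports or Slack mechanism the team really uses.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Both URLs are placeholders, not values from the post.
PIWIK_URL = "https://piwik.example.com/index.php"
SLACK_WEBHOOK = "https://hooks.slack.com/services/PLACEHOLDER"


def build_report_url(site_id, token, period="week", date="today"):
    """Build a Piwik Reporting API URL for the visit summary report."""
    params = {
        "module": "API",
        "method": "VisitsSummary.get",
        "idSite": site_id,
        "period": period,
        "date": date,
        "format": "JSON",
        "token_auth": token,
    }
    return PIWIK_URL + "?" + urlencode(params)


def format_summary(summary, period="week"):
    """Turn the report's visit and action counts into a chat message."""
    visits = summary.get("nb_visits", 0)
    actions = summary.get("nb_actions", 0)
    return f"Haystack usage this {period}: {visits} visits, {actions} page actions."


def build_slack_request(text, webhook=SLACK_WEBHOOK):
    """Build the POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(webhook, data=body, headers={"Content-Type": "application/json"})


def post_usage(site_id, token):
    """Fetch the summary from Piwik and push it to Slack (needs network access)."""
    with urlopen(build_report_url(site_id, token)) as resp:
        summary = json.load(resp)
    with urlopen(build_slack_request(format_summary(summary))) as resp:
        return resp.status
```

Keeping the URL building and message formatting separate from the network calls means the message wording can be adjusted and checked without touching either service; `post_usage` could then be run on a schedule.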


Agile Mindset

To keep users excited and engaged with Haystack, we constantly collaborated with users, data stewards, engineers and leads to review and solicit feedback on a prioritized list of key capabilities and gaps. This enabled us to effectively and continuously deliver small, high-value features, some of which were not only challenging but also required creativity.

Keep An Evergreen Product Roadmap

In phase 1, our goal was to lay the foundation for the Metadata platform and acquire a 360-degree view of Monsanto’s metadata. We focused on Business Metadata (Glossaries) and Technical Metadata (Relational Data Catalog) and how to tie them together. Although this combination is already very powerful, we are still missing a large piece of the puzzle: Data Lineage. As we continue to enhance the platform’s capabilities in phase 2, including work on Data Lineage, we will conduct workshops to roll out a structured process of content management and formalize Haystack governance.

Special thanks to John Cooper for his help with the technical content in this post.
