LinkedIn today open-sourced WhereHows, a metadata-centric tool the company has long used internally to make it easier for its employees to discover data the company generates and to track the lineage of its datasets as they move around its various internal tools and services.
Now that almost every modern business creates massive amounts of data, simply managing how all this information flows across an organization has become virtually impossible. Sure, you can store it in a data warehouse, but at the end of the day, you end up with a large number of datasets that are very similar, or are different versions of an original dataset, or hold information that has been transformed so it can be used by different tools. The exact same data also often ends up in multiple systems, just under different names or version numbers. In the end, how do you know which dataset you should work with when you are building a new product (or maybe just an executive report)?