Most companies are undergoing a transformation into “a data company that does [insert their business] better than anyone else”. Modern companies are not only data-, digital-, and cloud-native; they also find ways to differentiate through their data and monetize it as an additional revenue stream. Further, keeping pace with the rapid evolution of AI and machine learning (ML) will require strategic investment in stabilizing the underlying data infrastructure. But what happens when the immense amount of data held today isn’t properly managed?
Imagine trying to find a specific book in a library without knowing its location, its title, or even its author. Oh, and there is no catalog or librarian to consult, so you wander the aisles asking other patrons for help, hoping they point you in the right direction or simply hand you a book. Unmanaged data behaves the same way: it buries itself in a dark corner of the ‘library’, and in most cases it no longer resembles the book it once was, and the author is unknown. This often happens through data silos, redundant or duplicate platform services, conflicting data stores and definitions, and more, all of which drive up unnecessary cost and complexity.
While the ideal scenario would be to ensure that all data assets are discoverable in the first place, there are ways of untangling the mess once it has happened. But this is something every enterprise struggles with. Individual teams often provision their own access to infrastructure services, and not every data event in those platforms - sharing, copying, exporting, and enriching - is monitored at the enterprise level. Consequently, the challenge persists and expands, and the library of data continues to grow without consistent governance or control.
Read the full article via TechRadar.