"Data-driven" is the modern mantra of business management, but enabling a data-driven organization is complex and challenging. Abundant data sources and multiple use cases result in many data pipelines, possibly as many as one for each use case. The capabilities to find the right data, manage data flow and workflow, and deliver the right data in the right forms for analysis are essential for every organization that seeks to become data-driven.
Multiple, complex data pipelines can quickly become chaotic under pressure from agile development, democratization, self-service, and organizational pockets of analytics. The resulting governance difficulties and uncertainty about data usage are only the beginning of the troubles. Data pipeline management must therefore ensure that data analysis results are traceable, reproducible, and of production strength, whether they come from enterprise-level or self-service analytics. Robust pipeline management works across a variety of platforms, from relational databases to Hadoop, and recognizes today’s bidirectional data flows, in which any data store may function in both source and target roles.