Three incremental, manageable steps to building a “data first” data lake
Applications have always dictated the data. That made sense historically and, to some extent, still does. But an “applications first” approach creates data silos that cause operational problems and prevent organizations from getting the full value from their business intelligence initiatives.
Azure SQL Data Warehouse: Introduction
Azure SQL Data Warehouse is a fully managed, scalable cloud service.
The Informed Data Lake: Beyond Metadata
Historically, the volume and extent of data that an enterprise wanted to store, assemble, analyze, and act upon exceeded the capacity of its computing resources or was too expensive to handle. The solution was to extract a portion of the available data into a data model or schema, presupposing what was “important,” and then fit the incoming data into that structure.
Real-Time Streaming with Spring XD, Apache Geode (GemFire), and Greenplum
Spring XD is a unified, distributed, and extensible service for data ingestion, real-time analytics, batch processing, and data export.
Data Orchestration using Hortonworks DataFlow (HDF)
Hortonworks DataFlow (HDF), powered by Apache NiFi, is the first integrated platform that solves the real-time complexity and challenges of collecting and transporting data from a multitude of sources, be they big or small, fast or slow, always connected or intermittently available.