Posts Tagged real-time big data

Big Data Ingestion: Flume, Kafka and NiFi

Preliminaries: When building Big Data pipelines, we need to think about how to ingest the Volume, Variety and Velocity of data showing up at the gates of what would typically be a Hadoop ecosystem. Preliminary considerations such as scalability, reliability, adaptability and cost in terms of development time will all come into play when deciding […]



Streaming Big Data: Storm, Spark and Samza

There are a number of distributed computation systems that can process Big Data in real time or near-real time. This article will start with a short description of three Apache frameworks, and attempt to provide a quick, high-level overview of some of their similarities and differences. Apache Storm: In Storm, you design a graph of real-time computation called a topology, […]



Lambda Architecture for Big Data

An increasing number of systems are being built to handle the Volume, Velocity and Variety of Big Data, in the hope of gaining new insights and making better business decisions. Here, we will look at ways to deal with Big Data’s Volume and Velocity simultaneously, within a single architecture. Volume + Velocity: Apache Hadoop provides both reliable storage (HDFS) and a […]

