Building a Real-time Streaming Pipeline with Spark, Kafka, and Cassandra: A Comprehensive Guide
This tutorial walks through a real-time data pipeline built on Spark, Kafka, and Cassandra: Kafka ingests the data, Spark Streaming processes it, and Cassandra stores the result. We cover the setup of each component, the key commands and configuration, and the Spark shell session that ties them together. By the end, you should have a working local pipeline to use as a starting point for your own streaming applications.
Versions
spark-3.5.1-bin-hadoop3
apache-cassandra-5.0
kafka-3.6.0
1. Start Kafka
Zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
Kafka
bin/kafka-server-start.sh config/server.properties
Create topic
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic exampletopic
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
Producer
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic exampletopic
Consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic exampletopic --from-beginning
2. Start Spark
sbin/start-all.sh
3. Start Cassandra
bin/cassandra -f
Create keyspace and table (run these in bin/cqlsh)
CREATE KEYSPACE sparkdata WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE sparkdata;
CREATE TABLE cust_data (fname text, lname text, url text, product text, cnt counter, PRIMARY KEY (fname, lname, url, product));
SELECT * FROM cust_data;
Spark Kafka Cassandra Streaming Code
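Start the Spark shell with the command below. Spark 3.5.1 is built with Scala 2.12, so the Cassandra connector and the Kafka integration must be the _2.12 builds for Spark 3.x:
bin/spark-shell --packages "com.datastax.spark:spark-cassandra-connector_2.12:3.5.0,org.apache.spark:spark-streaming-kafka-0-10_2.12:3.5.1"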
Run this code in the spark shell
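As a minimal sketch of what this step could look like (not necessarily the author's original code), the session below reads exampletopic with the Kafka 0-10 direct stream API and increments the cnt counter in sparkdata.cust_data once per record. It assumes the shell was started with the Scala 2.12 builds of spark-streaming-kafka-0-10 and spark-cassandra-connector 3.x, that Kafka and Cassandra run on localhost with their default ports, and that each message is a comma-separated line of the form fname,lname,url,product. The message format, the group id, the 5-second batch interval, and the parseLine helper are illustrative assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

// Hypothetical helper: turns "fname,lname,url,product" into one row with an
// increment of 1 for the counter column; malformed lines yield no row.
// It is a self-contained function value (it references nothing else defined
// in the shell), which keeps the closure shipped to executors small.
val parseLine: String => Seq[(String, String, String, String, Long)] = { line =>
  line.split(",").map(_.trim) match {
    case Array(fname, lname, url, product) => Seq((fname, lname, url, product, 1L))
    case _                                 => Seq.empty
  }
}

// Reuse the shell's SparkContext (sc); the 5-second batch interval is an assumption.
val ssc = new StreamingContext(sc, Seconds(5))

// Consumer settings; the group id is an arbitrary illustrative name.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "spark-cassandra-demo",
  "auto.offset.reset"  -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("exampletopic"), kafkaParams)
)

// Take each message value, parse it, and write it to the counter table.
// Because cnt is a counter column, the value 1 is added to the stored count.
stream.map(_.value).flatMap(parseLine).saveToCassandra(
  "sparkdata", "cust_data",
  SomeColumns("fname", "lname", "url", "product", "cnt"))

ssc.start()
```

To try it, type a line such as John,Doe,http://example.com,laptop into the console producer, wait for a batch to complete, and rerun the select on cust_data in cqlsh. Sending the same line again should raise cnt for that row rather than add a new one. Calling ssc.stop(stopSparkContext = false) ends the stream while keeping the shell usable. Fields that themselves contain commas would need a more careful parser than the split used here.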