The Apache Spark Basics course gives participants a solid understanding of Apache Spark and its fundamental concepts. By the end of the course, participants will understand the challenges of big data processing and the advantages Spark offers. They will learn Spark's architecture and its components, such as the driver, executors, and cluster manager, and how to work with Resilient Distributed Datasets (RDDs) by applying transformations and actions. They will also learn Spark Streaming for real-time data processing and how to integrate Spark with other technologies such as Flume, Kafka, and Cassandra. Through hands-on exercises using PySpark, participants develop practical skills and the confidence to use Apache Spark effectively for big data processing and analytics tasks.
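To preview the hands-on PySpark portion, here is a minimal sketch of RDD transformations and actions; the local master setting and the sample word list are illustrative assumptions, not course material.

    # Minimal RDD sketch: transformations are lazy, actions trigger execution.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "rdd-basics")  # assumed local setup

    words = sc.parallelize(["spark", "streaming", "spark", "rdd"])
    counts = (words
              .map(lambda w: (w, 1))             # transformation: (word, 1) pairs
              .reduceByKey(lambda a, b: a + b))  # transformation: sum counts per word

    print(counts.collect())  # action: runs the job, e.g. [('spark', 2), ('rdd', 1), ...]
    sc.stop()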
DStream (Discretized Stream) operations in Spark Streaming
Windowed operations
Stateful processing using updateStateByKey() (both are illustrated in the sketch below)
Handling data sources (Flume, Kafka) and sinks (HDFS, Cassandra) in Spark Streaming
Hands-on exercises with Spark Streaming
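As a taste of the Spark Streaming topics above, the following minimal DStream sketch combines a windowed word count with stateful totals via updateStateByKey(); the socket source on localhost:9999 and the checkpoint path are assumptions made for illustration.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "dstream-basics")  # 2 threads: receiver + processing
    ssc = StreamingContext(sc, 5)                    # 5-second micro-batches
    ssc.checkpoint("/tmp/spark-checkpoint")          # required for stateful/windowed ops

    # Assumed source: a text stream on localhost:9999 (e.g. started with `nc -lk 9999`)
    pairs = (ssc.socketTextStream("localhost", 9999)
             .flatMap(lambda line: line.split())
             .map(lambda w: (w, 1)))

    # Windowed operation: counts over the last 30 seconds, recomputed every 10
    windowed = pairs.reduceByKeyAndWindow(lambda a, b: a + b,  # add entering batches
                                          lambda a, b: a - b,  # subtract leaving batches
                                          30, 10)

    # Stateful processing: running total per word across all batches
    def update_total(new_values, running_total):
        return sum(new_values) + (running_total or 0)

    totals = pairs.updateStateByKey(update_total)

    windowed.pprint()
    totals.pprint()
    ssc.start()
    ssc.awaitTermination()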
Integration with Flume, Kafka, and Cassandra
Introduction to Apache Flume and its integration with Spark
Overview of Flume's event-based data ingestion
Setting up Flume agents and Spark integration
Integration of Apache Kafka with Spark Streaming
Overview of Kafka's distributed publish-subscribe messaging system
Configuring Kafka and Spark integration for real-time data processing (see the Kafka-to-Cassandra sketch after this list)
Introduction to Apache Cassandra and its integration with Spark
Overview of Cassandra's distributed NoSQL database
Connecting Spark to Cassandra for data storage and retrieval
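The following sketch shows one way to wire Kafka and Cassandra to Spark, using the Structured Streaming Kafka source and the DataStax Spark Cassandra Connector rather than the DStream-based receivers covered in the course; the broker address, topic, keyspace, and table names are hypothetical, and the job is assumed to be submitted with the matching --packages for both connectors.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("kafka-to-cassandra")
             .config("spark.cassandra.connection.host", "localhost")  # assumed host
             .getOrCreate())

    # Assumed Kafka setup: broker at localhost:9092, topic "events"
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "events")
              .load()
              .selectExpr("CAST(key AS STRING) AS key",
                          "CAST(value AS STRING) AS value"))

    # Write each micro-batch to a hypothetical Cassandra table demo.events
    # (text columns key and value) via the connector's DataFrame format.
    def write_to_cassandra(batch_df, batch_id):
        (batch_df.write
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="demo", table="events")
         .mode("append")
         .save())

    query = events.writeStream.foreachBatch(write_to_cassandra).start()
    query.awaitTermination()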
Duration/schedule:
2 days
Target audience:
Data Engineers: Engineers responsible for processing and analyzing large datasets can benefit from Apache Spark's distributed computing capabilities.
Data Scientists: Scientists who want to work with big data and perform advanced analytics can extend their skills with Apache Spark and its machine learning library, MLlib.
Software Developers: Developers interested in distributed computing and big data can broaden their skill set with Apache Spark and PySpark.
Data Analysts: Analysts who want to process and analyze large datasets efficiently can use Apache Spark to improve their data processing workflows.
IT Professionals: Professionals who manage big data infrastructure and processing benefit from understanding Apache Spark's architecture and its integration with other technologies.