Course Batch Starts, Timing, Price & Enroll

Program Duration   Batch Starts   Time            Price #
32 Hrs             Weekend        Morning-Batch   USD 400 / INR 20000
32 Hrs             Weekend        Evening-Batch   USD 400 / INR 20000
32 Hrs             Weekdays       Morning-Batch   USD 400 / INR 20000
32 Hrs             Weekdays       Evening-Batch   USD 400 / INR 20000

# Cloud lab charges will be extra. Our technical consultant will share actual lab charges with you.

About Course

This course is intended for:
Software Engineers
ETL Developers
Data Scientists
Analytics Professionals
Professionals looking for a career in Big Data
To become an expert in the Big Data Hadoop ecosystem, you need an in-depth understanding of Spark applications written in Scala. This course is designed to help you understand the core concepts of Apache Spark, such as Spark Streaming, RDDs, Spark SQL, DataFrames, Datasets, Spark MLlib, Spark GraphX and the Spark shell. You will also learn how to customize Spark applications using Scala.
After completing this course, you will be able to:

Understand the core concepts of Apache Spark
Use Scala to write programs
Work with Spark on a cluster
Understand different features of Spark, such as Spark Streaming, RDDs and Spark SQL
Program with Spark MLlib and Spark GraphX
There is no formal prerequisite for this course, but a basic knowledge of any programming language, databases, SQL queries and Linux will help you progress through it more quickly.


Apache Spark and Scala

  • 1.1 Introduction to Scala
  • 1.2 Install and configure Scala
  • 1.3 First program using Scala
  • 1.4 Different operators in Scala
  • 1.5 Functions and Loops
  • 1.6 Arrays, Maps, Lists, Tuples
  • 1.7 Collections
  • 1.8 OOP concepts and their use
  • 1.9 Traits as Interfaces
  • 2.1 Interactive Analysis with the Spark Shell
  • 2.2 RDD Operations
  • 2.3 Caching
  • 3.1 Linking with Spark
  • 3.2 Initializing Spark
  • 3.3 Resilient Distributed Datasets (RDDs)
  • 3.4 Parallelized Collections
  • 3.5 External Datasets
  • 3.6 RDD Operations
  • 3.7 Working with Key-Value Pairs
  • 3.8 Transformations
  • 3.9 Actions
  • 3.10 Shuffle operations
  • 3.11 RDD Persistence
  • 3.12 Shared Variables
  • 3.13 Deploying to a Cluster
  • 3.14 Unit Testing
  • 4.1 Linking
  • 4.2 Initializing StreamingContext
  • 4.3 Discretized Streams (DStreams)
  • 4.4 Input DStreams and Receivers
  • 4.5 Transformations on DStreams
  • 4.6 Output Operations on DStreams
  • 4.7 Accumulators and Broadcast Variables
  • 4.8 DataFrame and SQL Operations
  • 4.9 MLlib Operations
  • 4.10 Caching / Persistence
  • 4.11 Checkpointing
  • 4.12 Deploying Applications
  • 4.13 Monitoring Applications
  • 4.14 Reducing the Batch Processing Times
  • 4.15 Setting the Right Batch Interval
  • 4.16 Memory Tuning
  • 4.17 Fault-tolerance Semantics
  • 5.1 SQL
  • 5.2 Datasets and DataFrames
  • 5.3 Starting Point: SparkSession
  • 5.4 Creating DataFrames
  • 5.5 Running SQL Queries Programmatically
  • 5.6 Creating Datasets
  • 5.7 Data Sources
  • 5.8 Generic Load/Save Functions
  • 5.9 Parquet Files
  • 5.10 JSON Datasets
  • 5.11 Hive Tables
  • 5.12 JDBC To Other Databases
  • 5.13 Troubleshooting
  • 5.14 Performance Tuning
  • 5.15 Distributed SQL Engine
  • 6.1 Data types
  • 6.2 Basic statistics
  • 6.3 Classification and regression
  • 6.4 Collaborative filtering
  • 6.5 Clustering
  • 6.6 Dimensionality reduction
  • 6.7 Feature extraction and transformation
  • 6.8 Frequent pattern mining
  • 6.9 Evaluation metrics
  • 6.10 PMML model export
  • 7.1 The Property Graph
  • 7.2 Graph Operators
  • 7.3 Pregel API
  • 7.4 Graph Builders
  • 7.5 Vertex and Edge RDDs
  • 7.6 Optimized Representation
  • 7.7 Graph Algorithms - PageRank, Connected Components and Triangle Counting
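
To give a flavour of the Scala fundamentals listed in topics 1.5 to 1.9, here is a minimal sketch in plain Scala with no Spark dependency. The names SyllabusSketch, Shape, Rectangle, Circle and wordCount are hypothetical and chosen only for illustration; they are not taken from the course material.

```scala
// Illustrative sketch only: every name here is hypothetical.
// Uses the Scala standard library; no Spark dependency is assumed.
object SyllabusSketch {

  // Traits as interfaces: a trait declares behaviour that classes implement.
  trait Shape {
    def name: String
    def area: Double
    // A concrete method defined once in the trait and shared by implementers.
    def describe: String = s"$name with area $area"
  }

  final case class Rectangle(width: Double, height: Double) extends Shape {
    val name: String = "Rectangle"
    def area: Double = width * height
  }

  final case class Circle(radius: Double) extends Shape {
    val name: String = "Circle"
    def area: Double = math.Pi * radius * radius
  }

  // Functions and collections: count case-insensitive word occurrences.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+").toSeq) // split each line into words
      .filter(_.nonEmpty)                         // drop empty tokens
      .groupBy(identity)                          // word -> all its occurrences
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // A List of values sharing the Shape trait.
    val shapes: List[Shape] = List(Rectangle(2.0, 3.0), Circle(1.0))
    shapes.foreach(shape => println(shape.describe))

    // Tuples: each Map entry is a (word, count) pair.
    val counts = wordCount(Seq("spark and scala", "Scala and Spark Streaming"))
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
  }
}
```

The flatMap-then-group shape of wordCount resembles the way word counts are commonly expressed with RDD transformations, although the sketch above runs on ordinary Scala collections rather than on a cluster.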

Exam & Certification

Cloudera offers a certification exam, “CCA Spark and Hadoop Developer”, which demonstrates a candidate’s knowledge of Spark and Big Data concepts.
Exam Name: CCA Spark and Hadoop Developer
Exam Code: CCA175
Number of Questions: 10–12 performance-based tasks on a CDH5 cluster
Time Limit: 120 minutes
Passing Score: 70%
Language: English, Japanese

Select Trainer for Demo

Archana Jaiswal
Certification: Cloudera Certified Developer - Hadoop, Hortonworks Certified Developer (HDPCD)


Big Data, Cassandra, Hadoop, MongoDB, Apache Spark, Hortonworks Certified Developer (HDPCD), Cloudera Certified Developer - Hadoop

Archana Jaiswal is a freelance corporate trainer, blogger and consultant with international experience. She has more than 17 years of experience in IT-related training and executive coaching, which helps her approach training with seriousness and diligence. Archana currently conducts Hadoop Developer, Spark, HBase, MongoDB and Cassandra training programs and executive coaching for various large organizations. She is a Cloudera Certified Trainer for Hadoop Developer and Hadoop Spark as well as a Hortonworks Certified Trainer for Data Analyst, and has successfully completed the Cloudera Hadoop Developer certification (CCDH) and the Hortonworks Developer Certification. Archana completed her Master’s in Computer Applications at Sikkim Manipal University and graduated from the University of Delhi in Human Psychology. Organizations where she has conducted trainings include: TCS, Delhi, OFSS, Oracle Pune, Symphony, Bangalore, Kale Consultancy, Mumbai, Schneider, Pune, Cognizant (CTS), Pune, Chennai, ATOS, Pune, Accenture, KPIT, Pune, Geometric, Pune, eValueServe, Gurgaon, SDLC, Bangalore, Exilant, Bhuvneshwar, MPhasis, Mangalore, L&T Infotech, Chennai, Food Corporation of India, New Delhi, DRDO, New Delhi.
Narendra Tripathi
Good knowledge of Hadoop
Jason Mendes
The course was conducted in a thorough manner. In spite of my sparse background in coding, she was able to explain and clarify all doubts and queries with relative ease.
Dipti Parkar
Very nice
Tushar Maruti Salunkhe
It was nice training
Swapnil Shinde
Very good
Diya Sengupta
Trainer was very good, with an extensive knowledge about Hadoop/Spark. Concepts and logic were explained in a good manner.
Vighneshwar Mishra
Excellent trainer and quality training.
Manojit Basak
We were definitely fortunate to have her as our trainer
Rahul Namdeo
Time was a constraint. Should have more time to cover all the topics in-depth. Trainer was knowledgeable.
Shiva Reddy
Certification: Big Data, IBM Big Data & Analytics, IBM Spark


Hadoop, Qlik Sense, QlikView, Talend Open Studio, Apache Spark

He has 6+ years of experience.
Certification: IBM DataScience Foundations, Scala

Master of Computer Applications

Apache Sqoop, Big Data, Hadoop, Hibernate, Java, Java EE, SOAP, Spring AOP, MVC, Apache Hadoop MapReduce, Apache Hadoop YARN, Apache Hive, Apache Pig, Apache Spark, Java EE Web Services

Hadoop / Java - Continuous learner! Passionate about sharing knowledge!

** The above course information is taken from The Apache Software Foundation

* Money-back guarantee until the demo and first class of the course.

* All trademarks and logos appearing on this website are the property of their respective owners.

Copyright ©2015, All Rights Reserved. Hub4Tech™ is registered trademark of Hub4tech Portal Services Pvt. Ltd.