Apache Spark and Scala Certification Training in Pune
Flexible Schedule
32 Hrs Project Work & Exercises
24 Hrs Instructor-Led Online Training
22 Hrs Self-paced Videos
24X7 Support
Certification and Job Assistance
Overview
- Basic concepts of Spark and Scala
- Difference between Spark and Hadoop
- Writing applications on Spark using Java, Python and Scala
- Spark RDDs and Dataframes
- Scala-Java Interoperability
- Developing code in Scala
- Data aggregation and performance improvement in Spark
Who should attend?
- Software Engineers looking to upgrade Big Data skills
- Data Engineers and ETL Developers
- Data Scientists and Analytics Professionals
- Graduates looking to make a career in Big Data
What will you gain?
- 1 to 1 Live Online Training
- Dedicated 24 x 7 Support
- Flexible Class Timing
- Training Completion Certification
- Direct Access to the Trainer
- Lifetime Access to the LMS
- Real-time Projects
- Dedicated Placement Support
Fees
Self Paced Training
9,405
Online Live One to One Training
15,048
Course Content
- 1.1 Introducing Scala
- 1.2 Deployment of Scala for Big Data applications and Apache Spark analytics
- 1.3 Scala REPL, lazy values, and control structures in Scala
- 1.4 Directed Acyclic Graph (DAG)
- 1.5 First Spark application using SBT/Eclipse
- 1.6 Spark Web UI
- 1.7 Spark in the Hadoop ecosystem
- 2.1 The importance of Scala
- 2.2 The concept of REPL (Read Evaluate Print Loop)
- 2.3 Deep dive into Scala pattern matching
- 2.4 Type inference, higher-order functions, currying, traits, application space and Scala for data analysis
- 3.1 Learning about the Scala Interpreter
- 3.2 Static object timer in Scala and testing string equality in Scala
- 3.3 Implicit classes in Scala
- 3.4 The concept of currying in Scala
- 3.5 Various classes in Scala
- 4.1 Learning about the Classes concept
- 4.2 Understanding constructor overloading
- 4.3 Various abstract classes
- 4.4 The hierarchy types in Scala
- 4.5 The concept of object equality
- 4.6 The val and var keywords in Scala
- 5.1 Understanding sealed traits and the wildcard, constructor, tuple, variable, and constant patterns
- 6.1 Understanding traits in Scala
- 6.2 The advantages of traits
- 6.3 Linearization of traits
- 6.4 The Java equivalent
- 6.5 Avoiding boilerplate code
- 7.1 Implementation of traits in Scala and Java
- 7.2 Extending multiple traits
- 8.1 Introduction to Scala collections
- 8.2 Classification of collections
- 8.3 The difference between iterator and iterable in Scala
- 8.4 Example of list sequence in Scala
- 9.1 The two types of collections in Scala
- 9.2 Mutable and immutable collections
- 9.3 Understanding lists and arrays in Scala
- 9.4 The list buffer and array buffer
- 9.6 Queue in Scala
- 9.7 Double-ended queues (Deque), stacks, sets, maps, and tuples in Scala
- 10.1 Introduction to Scala packages and imports
- 10.2 The selective imports
- 10.3 The Scala test classes
- 10.4 Introduction to JUnit test class
- 10.5 JUnit interface via JUnit 3 suite for Scala test
- 10.6 Packaging of Scala applications in the directory structure
- 10.7 Examples of Spark Split and Spark Scala
- 11.1 Introduction to Spark
- 11.2 Spark overcomes the drawbacks of working on MapReduce
- 11.3 Understanding in-memory MapReduce
- 11.4 Interactive operations on MapReduce
- 11.5 Spark stack, fine- vs. coarse-grained updates, Spark on Hadoop YARN, HDFS revision, and YARN revision
- 11.6 The overview of Spark and how it is better than Hadoop
- 11.7 Deploying Spark without Hadoop
- 11.8 Spark history server and Cloudera distribution
- 12.1 Spark installation guide
- 12.2 Spark configuration
- 12.3 Memory management
- 12.4 Executor memory vs. driver memory
- 12.5 Working with Spark Shell
- 12.6 The concept of resilient distributed datasets (RDD)
- 12.7 Learning to do functional programming in Spark
- 12.8 The architecture of Spark
- 13.1 Spark RDD
- 13.2 Creating RDDs
- 13.3 RDD partitioning
- 13.4 Operations and transformation in RDD
- 13.5 Deep dive into Spark RDDs
- 13.6 The RDD general operations
- 13.7 Read-only partitioned collection of records
- 13.8 Using the concept of RDD for faster and efficient data processing
- 13.9 RDD actions such as collect, count, collectAsMap, and saveAsTextFile, and pair RDD functions
- 14.1 Understanding the concept of key-value pair in RDDs
- 14.2 Learning how Spark makes MapReduce operations faster
- 14.3 Various operations of RDD
- 14.4 MapReduce interactive operations
- 14.5 Fine and coarse-grained update
- 14.6 Spark stack
- 15.1 Comparing the Spark applications with Spark Shell
- 15.2 Creating a Spark application using Scala or Java
- 15.3 Deploying a Spark application
- 15.4 Scala built application
- 15.5 Creating mutable lists, sets and set operations, lists, and tuples, and concatenating lists
- 15.6 Creating an application using SBT
- 15.7 Deploying an application using Maven
- 15.8 The web user interface of Spark application
- 15.9 A real-world example of Spark
- 15.10 Configuring Spark
- 16.1 Learning about Spark parallel processing
- 16.2 Deploying on a cluster
- 16.3 Introduction to Spark partitions
- 16.4 File-based partitioning of RDDs
- 16.5 Understanding HDFS and data locality
- 16.6 Mastering the technique of parallel operations
- 16.7 Comparing repartition and coalesce
- 16.8 RDD actions
- 17.1 The execution flow in Spark
- 17.2 Understanding the RDD persistence overview
- 17.3 Spark execution flow and Spark terminology
- 17.4 Distributed shared memory vs. RDD
- 17.5 RDD limitations
- 17.6 Spark shell arguments
- 17.7 Distributed persistence
- 17.8 RDD lineage
- 17.9 Key-value pair operations available through implicit conversions, such as countByKey, reduceByKey, sortByKey, and aggregateByKey
- 18.1 Introduction to Machine Learning
- 18.2 Types of Machine Learning
- 18.3 Introduction to MLlib
- 18.4 Various ML algorithms supported by MLlib
- 18.5 Linear regression, logistic regression, decision tree, random forest, and K-means clustering techniques
Hands-on Exercise:
- 1. Building a Recommendation Engine
- 19.1 Why Kafka and what is Kafka?
- 19.2 Kafka architecture
- 19.3 Kafka workflow
- 19.4 Configuring Kafka cluster
- 19.5 Operations
- 19.6 Kafka monitoring tools
- 19.7 Integrating Apache Flume and Apache Kafka
Hands-on Exercise:
- Configuring Single Node Single Broker Cluster
- Configuring Single Node Multi Broker Cluster
- Producing and consuming messages
- Integrating Apache Flume and Apache Kafka
- 20.1 Introduction to Spark Streaming
- 20.2 Features of Spark Streaming
- 20.3 Spark Streaming workflow
- 20.4 Initializing StreamingContext, discretized Streams (DStreams), input DStreams and Receivers
- 20.5 Transformations on DStreams, output operations on DStreams, windowed operators and why they are useful
- 20.6 Important windowed operators and stateful operators
Hands-on Exercise:
- Twitter Sentiment analysis
- Streaming using Netcat server
- Kafka–Spark streaming
- Spark–Flume streaming
- 21.1 Introduction to shared variables in Spark, such as broadcast variables
- 21.2 Learning about accumulators
- 21.3 The common performance issues
- 21.4 Troubleshooting the performance problems
- 22.1 Learning about Spark SQL
- 22.2 The context of SQL in Spark for providing structured data processing
- 22.3 JSON support in Spark SQL
- 22.4 Working with XML data
- 22.5 Parquet files
- 22.6 Creating Hive context
- 22.7 Writing data frame to Hive
- 22.8 Reading data over JDBC
- 22.9 Understanding the data frames in Spark
- 22.10 Creating Data Frames
- 22.11 Inferring the schema manually
- 22.12 Working with CSV files
- 22.13 Reading JDBC tables
- 22.14 Data frame to JDBC
- 22.15 User-defined functions in Spark SQL
- 22.16 Shared variables and accumulators
- 22.17 Learning to query and transform data in data frames
- 22.18 How data frames provide the benefits of both Spark RDDs and Spark SQL
- 22.19 Deploying Hive on Spark as the execution engine
- 23.1 Learning about the scheduling and partitioning in Spark
- 23.2 Hash partition
- 23.3 Range partition
- 23.4 Scheduling within and around applications
- 23.5 Static partitioning, dynamic sharing, and fair scheduling
- 23.6 mapPartitionsWithIndex, zip, and groupByKey
- 23.7 Spark master high availability, standby masters with ZooKeeper, single-node recovery with the local file system, and higher-order functions
Benefits of Online Training
- 100% Satisfaction Ratio
- Dedicated Help In Global Examination
- Updated Syllabus & On-Demand Doubt Session
- Special Group & Corporate Discounts
FAQs
ZebLearn offers lifetime access to videos and course materials, 24/7 support, and upgrades of the course material to the latest version at no extra fee. For Hadoop and Spark training, you get lifetime use of the ZebLearn Proprietary Virtual Machine and free cloud access for 6 months for performing training exercises. Hence, it is a one-time investment.
You would be glad to know that you can contact ZebLearn support even after the completion of the training. We also do not put a limit on the number of tickets you can raise for query resolution and doubt clearance.
You will work on highly exciting projects in the domains of high technology, ecommerce, marketing, sales, networking, banking, insurance, etc. After completing the projects successfully, your skills will be equal to 6 months of rigorous industry experience.
Job Opportunities
Apache Spark is a data processing framework that can quickly perform processing tasks on very large data sets, and can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools.
Apache Spark is a powerful tool on its own and is in high demand in the job market. Combined with other Big Data tools, it makes for a strong portfolio. The Big Data market is booming, and many individuals are taking advantage of it.