During this interactive training on Zoom, you will learn about the core components of the Big Data ecosystem, such as Hadoop, Spark, Pig, Hive and Sqoop.
You will also gain hands-on experience with the main pillars of that ecosystem: parallel processing frameworks such as MapReduce and Spark, distributed storage such as HDFS, and cluster administration with Ambari.
By the end of the training, you will have an in-depth understanding of, and hands-on experience with, Big Data distributions such as Cloudera and Hortonworks.
27th Feb 2021
7 Weeks (Sat & Sun)
11:00 AM – 3:30 PM
Urdu / Hindi
Meet the trainer of this course, a Big Data expert!
What is Big Data?
The Big Data Era
Big Data – Data Sources
4 V’s of Big Data
Conventional Data Warehouse Architecture
Modern Data Warehouse Architecture
What is Data Discovery?
Distributed Computing & its Advantage
Big Data Processing Frameworks (Hadoop, Apache Spark, NoSQL Databases)
What is Hadoop & its History?
Introduction to the Apache Hadoop Stack (HDFS, MapReduce, Flume, Sqoop, ZooKeeper, Oozie, HBase,
Introduction to Big data distributions (On-prem and cloud)
Components of a Hadoop Cluster (Master Node, DataNode, NameNode, JobTracker, TaskTracker)
Sandbox (virtual machine) Installation
Introduction to Hadoop Distributed File System (HDFS)
How HDFS Works
HDFS Block Size & Replication Factor
HDFS Read & Write pipeline
Sandbox tour – Understanding Ambari
Sandbox Configuration & Overview
HDFS Data Ingestion (Lab)
Parallel Processing Basics
What is MapReduce
How MapReduce works
Introduction to Apache Hive
Hive Alignment with SQL
Hive Query Process
Hive Data Loading
Hive Managed Tables
Hive External Tables
Hive Table Location
Hive Bucketing & Partitioning
Apache Hive (Lab)
Hive Views & Hive use for XML
Hive Supported File Formats
Hive Data Model
Block Compression and Storage Formats in Hive
Built-In and External SerDes in Hive (Lab)
Hive complex data types (Array, Map, Struct)
Loading complex data in Hive (Lab)
Hive vs. Impala
Hadoop 1.0 vs. Hadoop 2.0
Introduction to YARN Architecture
YARN Resource Manager
YARN Node Manager
YARN Application Manager
YARN Performance Gauging
YARN Performance Measuring
YARN System Health
Resource Allocation in YARN
Containers Concept in Hadoop
YARN Queue Management and Container allocation (Lab)
Handling jobs in YARN Resource Manager UI
Project 01: Building a Sentiment Analysis Application to find the sentiment of tweets
Introduction to Apache Tez
Tez vs MapReduce
Introduction to Apache Pig
Pig vs. Hive
Grunt Shell & Pig Scripting (Lab)
Loading Data in Pig
Debugging Using Pig
Pig Execution Modes
Pig Execution Mechanism
Pig integration with Hive – HCatalog
Introduction to Apache Sqoop
Sqoop Execution Modes
Migrating data with Sqoop (Lab)
Introduction to Streaming Analytics
Bounded data vs. Unbounded data
Spark as a stream processing engine
Streaming Analytics in Spark (Lab)
What are Messaging (Pub/Sub) systems
Introduction to Apache Kafka
Kafka – Core capabilities and Use cases
Topic, Partitions and Offsets
Kafka Producers and Consumers
Kafka as a messaging system (Lab)
Introduction to Data Flow
Apache NiFi as a Data Flow tool
Installing NiFi as a service (Lab)
Flow files, Processors and Connectors
Understanding the NiFi UI and Creating data flows (Lab)
Project 02: Building a Real-Time data pipeline with NiFi, Kafka and Spark
Components of a Big data platform
Big Data Architectures
Lambda and Kappa Architecture
Building batch mode and real time big data pipelines – case studies (Lab)
Realm of NoSQL databases
NoSQL database types
SQL vs. NoSQL
MongoDB as a NoSQL database
Up and running with MongoDB (Lab)
The following is the price for this extensive training on Big Data
Recent graduates and third-year or final-year students from computer science disciplines.
Professionals from the computer science domain who want to move into Big Data Analytics.
Executives who want to build foundational knowledge of the impact of the Big Data ecosystem on organizational growth.
Duration: 7 weeks
Class Days: Saturdays & Sundays
Timings: 11:00 AM – 03:30 PM
Because our instructors are industry experts, they prepare students for the practical world and recommend outstanding students to industry for relevant positions.
Because our courses are led by industry experts, the content is designed to be more than 70-75% hands-on, backed by supporting theory.
Don’t worry! We have got you covered. Recorded lectures will be shared with you after each session, in case you want to revise your concepts or you miss a lecture due to a personal or professional commitment.
For the Big Data Professional course, you need at minimum a 4th-generation Core i3 PC with 12 GB of RAM, and ideally a 5th-generation Core i7 with 16 GB of RAM.
Yes, you will be awarded a course completion certificate by Dice Analytics. We also hold an annual convocation to recognize and appreciate our students.
Fill out the form to register for the course.