Big Data Certified Training - Dice Analytics

Big Data Certified Training

Learn Big Data from Industry Professionals

About Big Data Training

Getting intellectuals ready to become Big Data Experts!


Over the seven weeks of the course, you will learn about the core components of the Big Data ecosystem, such as Hadoop, Spark, Pig, Hive, and Sqoop.


Week by week, participants will gain hands-on experience with the different pillars of the Big Data ecosystem, starting with parallel processing frameworks such as MapReduce and Spark, distributed storage with HDFS, and cluster administration with Ambari.


By the end of the course, candidates will have an in-depth understanding of, and hands-on experience with, Big Data distributions such as Cloudera and Hortonworks.




31st October 2020


7 weeks (Saturdays)


10:00AM to 06:00PM


Management House, 95 J/1, Johar Town, Lahore


Meet the trainer of this course, who brings extensive experience in the Big Data domain!


Mr. Moeed Tariq

Big Data Analyst | Data Engineer | Trainer

Mr. Moeed Tariq has 6+ years of diverse experience in Telecom Business Support Systems, Business Intelligence, and Commercial B2B departments. His expertise includes Big Data, Business Intelligence, Data Analytics, Data Modelling and Visualization, Hadoop, Apache Spark, Apache Kafka, HiveQL, IBM Cognos, NoSQL, MongoDB, and RDBMS.

Course Outline

Week 1

  • What is Big Data?
  • The Big Data Era
  • Big Data – Data Sources
  • 4 V’s of Big Data
  • Conventional Data Warehouse Architecture
  • Modern Data Warehouse Architecture
  • What is Data Discovery?
  • Distributed Computing & its Advantage
  • Big Data Processing Frameworks (Hadoop, Apache Spark, NoSQL Databases)
  • What is Hadoop & its History?
  • Introduction to Apache Hadoop Stack (HDFS, MapReduce, Flume, Sqoop, ZooKeeper, Oozie, HBase, Hive, Pig)
  • Introduction to Big Data distributions (On-prem and cloud)
  • Components of Hadoop Cluster (Master Node, DataNode, NameNode, JobTracker, TaskTracker)
  • Sandbox (virtual machine) Installation
  • Introduction to Hadoop Distributed File System (HDFS)
  • How HDFS Works
  • HDFS Block Size & Replication Factor
  • HDFS Read & Write pipeline
  • Sandbox tour – Understanding Ambari

Week 2

  • Sandbox Configuration & Overview
  • HDFS Commands
  • HDFS Data Ingestion (Lab)
  • Parallel Processing Basics
  • What is MapReduce
  • How MapReduce works
  • Introduction to Apache Hive
  • Hive Alignment with SQL
  • Hive Query Process
  • Hive Data Loading
  • Hive Managed Tables
  • Hive External Tables
  • Hive Table Location
  • Hive Bucketing & Partitioning
  • Apache Hive (Lab)
  • Hive Views
  • Hive use for XML
  • Hive Supported File Formats
  • Hive Data Model
  • Block Compression and Storage Formats in Hive

Week 3

  • Built-In and External SerDes in Hive (Lab)
  • Hive complex data types (Array, Map, Struct)
  • Loading complex data in Hive (Lab)
  • Hive vs. Impala
  • Impala Architecture
  • Hadoop 1.0 vs. Hadoop 2.0
  • Introduction to YARN Architecture
  • YARN Resource Manager
  • YARN Node Manager
  • YARN Application Manager
  • YARN Schedulers
  • YARN Performance Gauging
  • YARN Performance Measuring
  • YARN System Health
  • Resource Allocation in YARN
  • Containers Concept in Hadoop
  • YARN Queue Management and Container allocation (Lab)
  • Handling jobs in YARN Resource Manager UI
  • Project 01: Building a Sentiment Analysis Application to find the sentiment of tweets

Week 4

  • Introduction to Apache Tez
  • Tez vs MapReduce
  • Tez DAGs
  • Introduction to Apache Pig
  • Pig vs. Hive
  • Pig Architecture
  • Pig Latin
  • Grunt Shell & Pig Scripting (Lab)
  • Pig Commands
  • Loading Data in Pig
  • Pig Filter
  • Pig Joins
  • Debugging Using Pig
  • Pig Execution Modes
  • Pig Execution Mechanism
  • Pig integration with Hive – HCatalog
  • Introduction to Apache Sqoop
  • Sqoop Architecture
  • Sqoop Execution Modes
  • Migrating data with Sqoop (Lab)

Week 5

  • Introduction to Apache Spark
  • Spark vs. MapReduce
  • Spark Architecture
  • Spark Driver
  • Spark Context
  • Spark Executors
  • Spark Core Abstraction – RDDs, DataFrames, Datasets
  • Transformations vs. Actions
  • Spark Transformations (Map, FlatMap, Filter, Distinct)
  • Spark Actions (Collect, First, Take, Count, Reduce, SaveAsTextFile)
  • Lazy Execution
  • SparkContext, HiveContext, SQLContext
  • Scala vs. PySpark
  • Spark as an in-memory processing engine (Lab)
  • Troubleshooting Jobs in Spark UI

Week 6

  • Introduction to Streaming Analytics
  • Bounded data vs. Unbounded data
  • Spark as a stream processing engine
  • Spark Streaming
  • Structured Streaming
  • Streaming Analytics in Spark (Lab)
  • What are Messaging (Pub/Sub) systems
  • Introduction to Apache Kafka
  • Kafka – Core capabilities and Use cases
  • Topic, Partitions and Offsets
  • Kafka Brokers
  • Kafka Producers and Consumers
  • Kafka as a messaging system (Lab)
  • Introduction to Data Flow
  • Apache NiFi as a Data Flow tool
  • Installing NiFi as a service (Lab)
  • FlowFiles, Processors and Connectors
  • NiFi Templates
  • Understanding NiFi UI and Creating data flows (Lab)

Week 7

  • Project 02: Building a Real-Time data pipeline with NiFi, Kafka and Spark
  • Components of a Big Data platform
  • Big Data Architectures
  • Lambda and Kappa Architecture
  • Building batch mode and real time big data pipelines – case studies (Lab)
  • Realm of NoSQL databases
  • NoSQL database types
  • SQL vs. NoSQL
  • MongoDB as a NoSQL database
  • Up and running with MongoDB
  • Next Steps


The prices for this training on Big Data are as follows:

Individual: Rs 30,000 per person
Group of two (7% discount): Rs 27,900 per person
Group of three (10% discount): Rs 27,000 per person
Group of four (15% discount): Rs 25,500 per person
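
The per-person prices listed above follow from applying each group discount to the Rs 30,000 individual price. As a minimal sketch of that arithmetic (the function names and the group-total helper are illustrative assumptions, not part of the official pricing), the figures can be checked as follows:

```python
# Base price and group discounts as listed in the pricing section.
BASE_PRICE_RS = 30_000
GROUP_DISCOUNT_PERCENT = {1: 0, 2: 7, 3: 10, 4: 15}


def price_per_person(group_size: int) -> int:
    """Return the discounted per-person price in rupees for a listed group size."""
    discount = GROUP_DISCOUNT_PERCENT[group_size]
    # Integer arithmetic avoids floating-point rounding on currency amounts.
    return BASE_PRICE_RS * (100 - discount) // 100


def group_total(group_size: int) -> int:
    """Hypothetical helper: the combined amount for the whole group."""
    return price_per_person(group_size) * group_size


if __name__ == "__main__":
    for size in sorted(GROUP_DISCOUNT_PERCENT):
        print(f"Group of {size}: Rs {price_per_person(size):,} per person, "
              f"Rs {group_total(size):,} in total")
```

Only the group sizes listed on this page are covered; any other size raises a KeyError, since no discount is stated for it.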



Frequently Asked Questions

Who should attend the course?

  • Recent graduates and third-year and final-year students from computer science disciplines.
  • Professionals from the computer science domain who want to move into Big Data Analytics.
  • Executives who want a foundational understanding of how the Big Data ecosystem affects organizational growth.

Who are the Instructors?

The trainer for this course is Mr. Moeed Tariq, whose profile appears above.

What is the timing of the course?

Duration: 7 weeks

Class Days: Saturdays only

Timings: 10:00 AM to 06:00 PM

Can I get a job after this course?

Our instructors are industry experts, so they prepare students for real-world practice and recommend outstanding students for relevant positions in industry.

How much hands-on will be performed in this course?

Because our courses are led by industry experts, the content is designed to be roughly 70-75% hands-on, with supporting theory.

What are the PC requirements?

For the Big Data Professional course, you need at minimum a 4th-generation Core i3 PC with 12 GB RAM; ideally, a 5th-generation Core i7 with 16 GB RAM.

Will I get a certificate after this course?

Yes, you will be awarded a course completion certificate by Dice Analytics. We also hold an annual convocation to recognize and celebrate our students.

Reserve your Seat

You can reserve your seat by filling in the form below!
