Big Data Analytics - Dice Analytics

Big Data Analytics

An Interactive Live Training

Getting intellectuals ready to become Big Data Experts!

An Industry-Expert Led Live Training

During this interactive training on Zoom, you will learn about the core components of the Big Data ecosystem, such as Hadoop, Spark, Pig, Hive & Sqoop.


You will also gain hands-on experience with the different pillars of the Big Data ecosystem: parallel processing frameworks like MapReduce & Spark, distributed storage such as HDFS, and Big Data administration with Ambari.


By the end of the training, you will have an in-depth understanding of, and hands-on experience with, Big Data solutions like Cloudera & Hortonworks.




18 March

Duration & Timings

8 Weeks (Sat & Sun)

10:00 AM – 3:00 PM


Urdu / Hindi

Seats Capacity

Limited seats!


Meet the trainer of this course.


Mr. Moeed Tariq

Manager Data Analytics | Cloud DWH Engineer | Big Data Trainer

Mr. Moeed Tariq has 9+ years of diversified experience in telecom, IT consultancy and video-on-demand (VOD) streaming companies in Pakistan and the MENA region. His expertise includes Big Data, Cloud Computing (AWS, Azure), Data Analytics, Data Modelling, Hadoop, Apache Spark, Apache Kafka, HiveQL and NoSQL. He has also acquired multiple certifications from Microsoft, AWS and Databricks. He is currently working with a Gulf-based OTT streaming company as Manager Data Analytics.



Course Outline



Week 1

  • What is Big Data?
  • The Big Data Era
  • Big Data – Data Sources
  • 4 V’s of Big Data
  • Conventional Data Warehouse Architecture
  • Modern Data Warehouse Architecture
  • What is Data Discovery?
  • Distributed Computing & its Advantage
  • Big Data Processing Frameworks (Hadoop, Apache Spark, NoSQL Databases)
  • What is Hadoop & its History?
  • Introduction to Apache Hadoop Stack (HDFS, MapReduce, Flume, Sqoop, ZooKeeper, Oozie, HBase, Hive, Pig)
  • Introduction to Big data distributions (On-prem and cloud)
  • Components of a Hadoop Cluster (Master Node, DataNode, NameNode, JobTracker, TaskTracker)
  • Sandbox (virtual machine) Installation
  • Introduction to Hadoop Distributed File System (HDFS)
  • How HDFS Works
  • HDFS Block Size & Replication Factor
  • HDFS Read & Write pipeline
  • Sandbox tour – Understanding Ambari
  • Dockerized Solution Installation

Week 2

  • Sandbox Configuration & Overview
  • HDFS Commands
  • HDFS Data Ingestion (Lab)
  • Parallel Processing Basics
  • What is MapReduce
  • How MapReduce works
  • Introduction to Apache Hive
  • Hive Alignment with SQL
  • Hive Query Process
  • Hive Data Loading
  • Hive Managed Tables
  • Hive External Tables
  • Hive Table Location
  • Hive Bucketing & Partitioning
  • Apache Hive (Lab)
  • Hive Views & Hive use for XML
  • Hive Supported File Formats
  • Hive Data Model
  • Block Compression and Storage Formats in Hive

Week 3

  • Built-In and External SerDes in Hive (Lab)
  • Hive complex data types (Array, Map, Struct)
  • Loading complex data in Hive (Lab)
  • Hive vs. Impala
  • Impala Architecture
  • Hadoop 1.0 vs. Hadoop 2.0
  • Introduction to YARN Architecture
  • YARN Resource Manager
  • YARN Node Manager
  • YARN Application Manager
  • YARN Schedulers
  • YARN Performance Gauging
  • YARN Performance Measuring
  • YARN System Health
  • Resource Allocation in YARN
  • Containers Concept in Hadoop
  • YARN Queue Management and Container allocation (Lab)
  • Handling jobs in YARN Resource Manager UI
  • Data Ingestion with Kafka-Confluent

Week 4

  • Project 01: Building a Sentiment Analysis Application to find the sentiment of tweets
  • Introduction to Apache Tez
  • Tez vs MapReduce
  • Tez DAGs
  • Introduction to Apache Pig
  • Pig vs. Hive
  • Pig Architecture
  • Pig Latin
  • Grunt Shell & Pig Scripting (Lab)
  • Pig Commands
  • Loading Data in Pig
  • Pig Filter
  • Pig Joins
  • Debugging Using Pig
  • Pig Execution Modes
  • Pig Execution Mechanism
  • Pig integration with Hive – HCatalog

Week 5

  • Introduction to Apache Sqoop
  • Sqoop Architecture
  • Sqoop Execution Modes
  • Migrating data with Sqoop (Lab)
  • Introduction to Data Flow
  • Apache NiFi as a Data Flow tool
  • Installing NiFi as a service (Lab)
  • Flow files, Processors and Connectors
  • NiFi Templates
  • Understanding NiFi UI and Creating data flows (Lab)
  • Cloudera Intro (HUE, Impala & Cloudera Manager) & YARN

Week 6

  • Introduction to Apache Spark
  • Spark vs. MapReduce
  • Spark Architecture
  • Spark Driver
  • Spark Context
  • Spark Executors
  • Spark Core Abstraction – RDDs, DataFrames, Datasets
  • Transformations vs. Actions
  • Spark Transformations (Map, FlatMap, Filter, Distinct)
  • Spark Actions (Collect, First, Take, Count, Reduce, SaveAsTextFile)
  • Lazy Execution
  • SparkContext, HiveContext, SQLContext
  • Scala vs. PySpark
  • Spark as an In-memory processing engine (Lab)
  • Troubleshooting Jobs in Spark UI

Week 7

  • Introduction to Streaming Analytics
  • Bounded data vs. Unbounded data
  • Spark as a stream processing engine
  • Spark Streaming
  • Structured Streaming
  • Streaming Analytics in Spark (Lab)
  • What are Messaging (Pub/Sub) systems
  • Introduction to Apache Kafka
  • Kafka – Core capabilities and Use cases
  • Topic, Partitions and Offsets
  • Kafka Brokers
  • Kafka Producers and Consumers
  • Kafka as a messaging system (Lab)
  • Intro to Databricks (Spark over cloud)
  • Databricks Delta Lake Implementation/Medallion Architecture

Week 8

  • Components of a Big data platform
  • Big Data Architectures
  • Lambda and Kappa Architecture
  • Building batch mode and real time big data pipelines – case studies (Lab)
  • Realm of NoSQL databases
  • NoSQL databases types
  • SQL vs. NoSQL
  • MongoDB as a NoSQL database
  • Up and running with MongoDB (Lab)
  • Next Steps
  • Databricks Spark Structured Streaming Implementation
  • Intro to NoSQL, ELK & Cassandra


Pricing Details


Online banking details will be shared by our representatives after you reserve your seat.

  • Individual Price
    • PKR 30,000 Per Person
    • Total charges for complete training
  • Group of Two
  • Group of Three

Reserve your Seat

You can reserve your seat by filling out the form below.





    Frequently Asked Questions

    Who should attend the course?

    Graduate or Master's students with an IT, CS or SE background who want to start their career in the Big Data Analytics domain

    People who are working in the Big Data Analytics domain and want to advance their career

    Executives who want to build a Big Data Analytics department in their start-ups/organizations

    What is the timing of the course?

    Duration: 8 weeks (Sat-Sun)
    Timings: 10AM – 3PM

    Who are the Instructors?

    How much hands-on will be performed in this course?

    Since our courses are led by industry experts, the content is designed so that more than 70-75% of it is hands-on, along with supporting theory.

    What are the PC requirements?

    For the Big Data Analytics Professional course, you need a PC with a minimum of 12GB SSD and 16GB RAM.

    Can I rejoin the training/workshop?

    Yes, you can rejoin the training within a year of your registration. Please note the following conditions if you are rejoining.
    1) Only 5 seats are reserved for rejoiners in each iteration.
    2) These seats are allocated on a first-come, first-served basis.
    3) If you have not paid your complete fee, you may not be able to rejoin, and your registration will be canceled.

    What if I miss any of the lectures?

    Don’t worry! We have got you covered. Recorded lectures will be shared with you after each session, in case you want to revise your concepts or you miss a lecture due to a personal or professional commitment.

    How will this training ensure hands-on practice?

    To run the practicals included in the Big Data Training, you will set up the tools on your own machine. An installation manual will be provided to help you install and set up the required environment.

    What sort of projects will be part of this Live Training?

    This certification training course includes multiple real-time, industry-based projects, which will hone your skills to current industry standards and prepare you for future career needs.

    Will I get a certificate after this course?

    Yes, you will be awarded a course completion certificate by Dice Analytics. We also hold an annual convocation to appreciate and recognize our students.

    Can I get a job after this course?

    Since our instructors are industry experts, they train students for the practical world and also recommend outstanding students to industry for relevant positions.