Big Data Hadoop Training in Gurgaon, Delhi - Gyansetu

The digital universe is expected to reach 44 trillion gigabytes by 2020, and we are churning out roughly 3 quintillion bytes of data every day. Our certified course will make you an expert in Big Data Hadoop technology, with a strong command of HDFS, YARN, MapReduce, Pig, Hive, HBase, Oozie, Flume and Sqoop, using real-time use cases from the retail, social media, aviation, tourism and finance domains, along with an overview of AWS Cloud, Docker and Kubernetes for deploying Big Data applications.

Instructor Led Training  |  Free Course Repeat  |  Placement Assistance  |  Job Focused Projects  |  Interview Preparation Sessions

Curriculum

    This module will help you understand how to configure Hadoop Cluster on AWS Cloud:

    1. Introduction to Amazon Elastic MapReduce
    2. AWS EMR Cluster
    3. AWS EC2 Instance: Multi Node Cluster Configuration
    4. AWS EMR Architecture
    5. Web Interfaces on Amazon EMR
    6. Amazon S3
    7. Executing MapReduce Job on EC2 & EMR
    8. Apache Spark on AWS, EC2 & EMR
    9. Submitting Spark Job on AWS
    10. Hive on EMR
    11. Available Storage types: S3, RDS & DynamoDB
    12. Apache Pig on AWS EMR
    13. Processing NY Taxi Data using SPARK on Amazon EMR

    This module will help you understand Big Data:

    1. Common Hadoop ecosystem components
    2. Hadoop Architecture
    3. HDFS Architecture
    4. Anatomy of File Write and Read
    5. How MapReduce Framework works
    6. Hadoop high level Architecture
    7. MR2 Architecture
    8. Hadoop YARN
    9. Hadoop 2.x core components
    10. Hadoop Distributions
    11. Hadoop Cluster Formation

    This module will help you to understand Hadoop & HDFS Cluster Architecture:

    1. Configuration files in Hadoop Cluster (FSimage & editlog file)
    2. Setting up of Single & Multi node Hadoop Cluster
    3. HDFS File permissions
    4. HDFS Installation & Shell Commands
    5. Daemons of HDFS
      1. Node Manager
      2. Resource Manager
      3. NameNode
      4. DataNode
      5. Secondary NameNode
    6. YARN Daemons
    7. HDFS Read & Write Commands
    8. NameNode & DataNode Architecture
    9. HDFS Operations
    10. Hadoop MapReduce Job
    11. Executing MapReduce Job

    This module will help you to understand Hadoop MapReduce framework:

    1. How MapReduce works on HDFS data sets
    2. MapReduce Algorithm
    3. MapReduce Hadoop Implementation
    4. Hadoop 2.x MapReduce Architecture
    5. MapReduce Components
    6. YARN Workflow
    7. MapReduce Combiners
    8. MapReduce Partitioners
    9. MapReduce Hadoop Administration
    10. MapReduce APIs
    11. Input Split & String Tokenizer in MapReduce
    12. MapReduce Use Cases on Data sets
    1. Job Submission & Monitoring
    2. Counters
    3. Distributed Cache
    4. Map & Reduce Join
    5. Data Compressors
    6. Job Configuration
    7. Record Reader
    1. Hive
    2. Sqoop (Data Ingestion tool)
    3. Map Reduce
    4. Pig
    1. Hive Installation
    2. Hive Data types
    3. Hive Architecture & Components
    4. Hive Meta Store
    5. Hive Tables(Managed Tables and External Tables)
    6. Hive Partitioning & Bucketing
    7. Hive Joins & Sub Query
    8. Running Hive Scripts
    9. Hive Indexing & View
    10. Hive Queries (HQL); Order By, Group By, Distribute By, Cluster By, Examples
    11. Hive Functions: Built-in & UDF (User Defined Functions)
    12. Hive ETL: Loading JSON, XML, Text Data Examples
    13. Hive Querying Data
    14. Hive Tables (Managed & External Tables)
    15. Hive Use Cases
    16. Hive Optimization Techniques
      1. Partitioning (Static & Dynamic Partition) & Bucketing
      2. Hive Joins: Map + Bucket Map + SMB (Sort Merge Bucket) + Skew
      3. Hive File Formats (ORC + SequenceFile + Text + Avro + Parquet)
      4. CBO (Cost-Based Optimizer)
      5. Vectorization
      6. Indexing (Compact + Bitmap)
      7. Integration with Tez & Spark
    17. Hive SerDe (Custom + Built-in)
    18. Hive Integration with NoSQL (HBase + MongoDB + Cassandra)
    19. Thrift API (Thrift Server)
    20. Hive LATERAL VIEW
    21. Incremental Updates & Import in Hive
    22. Hive Functions:
      1. LATERAL VIEW EXPLODE
      2. LATERAL VIEW JSON_TUPLE and others
    23. Hive SCD Strategies: Type 1, Type 2 & Type 3
    24. UDF, UDTF & UDAF
    25. Hive Multiple Delimiters
    26. XML & JSON Data Loading in Hive
    27. Aggregation & Windowing Functions in Hive
    28. Hive Connect with Tableau
    1. Sqoop Installation
    2. Loading Data from RDBMS using Sqoop
    3. Fundamentals & Architecture of Apache Sqoop
    4. Sqoop Tools
      1. Sqoop Import & Import-All-Table
      2. Sqoop Job
      3. Sqoop Codegen
      4. Sqoop Incremental Import & Incremental Export
      5. Sqoop  Merge
      6. Sqoop : Hive Import
      7. Sqoop Metastore
      8. Sqoop Export
    5. Import Data from MySQL to Hive using Sqoop
    6. Sqoop: Hive Import
    7. Sqoop Metastore
    8. Sqoop Use Cases
    9. Sqoop- HCatalog Integration
    10. Sqoop Script
    11. Sqoop Connectors
    12. Batch Processing in Sqoop
    13. SQOOP Incremental Import
    14. Boundary Queries in Sqoop
    15. Controlling Parallelism in Sqoop
    16. Import Join Tables from SQL databases to Warehouse using Sqoop
    17. Sqoop Hive/HBase/HDFS integration
    1. Pig Architecture
    2. Pig Installation
    3. Pig Grunt shell
    4. Pig Running Modes
    5. Pig Latin Basics
    6. Pig LOAD & STORE Operators
    7. Diagnostic Operators
      1. DESCRIBE Operator
      2. EXPLAIN Operator
      3. ILLUSTRATE Operator
      4. DUMP Operator
    8. Grouping & Joining
      1. GROUP Operator
      2. COGROUP Operator
      3. JOIN Operator
      4. CROSS Operator
    9. Combining & Splitting
      1. UNION Operator
      2. SPLIT Operator
    10. Filtering
      1. FILTER Operator
      2. DISTINCT Operator
      3. FOREACH Operator
    11. Sorting
      1. ORDER BY Operator
      2. LIMIT Operator
    12. Built-in Functions
      1. EVAL Functions
      2. LOAD & STORE Functions
      3. Bag & Tuple Functions
      4. String Functions
      5. Date-Time Functions
      6. MATH Functions
    13. Pig UDFs (User Defined Functions)
    14. Pig Scripts in Local Mode
    15. Pig Scripts in MapReduce Mode
    16. Analysing XML Data using Pig
    17. Pig Use Cases (Data Analysis on Social Media sites, Banking, Stock Market & Others)
    18. Analysing JSON data using Pig
    19. Testing Pig Scripts
    1. Flume Introduction
    2. Flume Architecture
    3. Flume Data Flow
    4. Flume Configuration
    5. Flume Agent Component Types
    6. Flume Setup
    7. Flume Interceptors
    8. Multiplexing (Fan-Out), Fan-In-Flow
    9. Flume Channel Selectors
    10. Flume Sink Processors
    11. Fetching of Streaming Data using Flume (Social Media Sites: YouTube, LinkedIn, Twitter)
    12. Flume + Kafka Integration
    13. Flume Use Cases
    1. Kafka Fundamentals
    2. Kafka Cluster Architecture
    3. Kafka Workflow
    4. Kafka Producer, Consumer Architecture
    5. Kafka as PUB/SUB model
    6. Kafka Terminologies / Core APIs:

    1. Producer / Publishers
    2. Consumer / Subscribers
    3. Input Offsets
    4. Topic
    5. Topic Log
    6. Replication
    7. Retention
    8. Consumer Groups
    9. Leader
    10. Follower
    11. Mirror Maker
    12. Broker
    13. Topic Partition
    14. Kafka Retention Policy
    1. Kafka Confluent Hub
    2. Kafka Confluent Cloud
    3. KStream APIs
    4. Difference between Apache Kafka and Confluent Kafka
    5. KSQL (SQL Engine for Kafka)
    6. Developing Real-time Applications using KStream APIs
    7. Kafka Connectors
    8. Kafka REST Proxy
    9. Kafka Offsets
    1. Oozie Introduction
    2. Oozie Workflow Specification
    3. Oozie Coordinator Functional Specification
    4. Oozie HCatalog Integration
    5. Oozie Bundle Jobs
    6. Oozie CLI Extensions
    7. Automate MapReduce, Pig, Hive, Sqoop Jobs using Oozie
    8. Packaging & Deploying an Oozie Workflow Application
    1. Apache Airflow Installation
    2. Work Flow Design using Airflow
    3. Airflow DAG
    4. Module Import in Airflow
    5. Airflow Applications
    6. Docker Airflow
    7. Airflow Pipelines
    8. Airflow KUBERNETES Integration
    9. Automating Batch & Real Time Jobs using Airflow
    10. Data Profiling using Airflow
    11. Airflow Integration:
      1. AWS EMR
      2. AWS S3
      3. AWS Redshift
      4. AWS DynamoDB
      5. AWS Lambda
      6. AWS Kinesis
    12. Scheduling of PySpark Jobs using Airflow
    13. Airflow Orchestration
    14. Airflow Schedulers & Triggers
    15. Gantt Chart in Apache Airflow
    16. Executors in Apache Airflow
    17. Airflow Metrics
    1. HBase Architecture, Data Flow & Use Cases
    2. Apache HBase Configuration
    3. HBase Shell & general commands
    4. HBase Schema Design
    5. HBase Data Model
    6. HBase Region & Master Server
    7. HBase & MapReduce
    8. Bulk Loading in HBase
    9. Create, Insert, Read Tables in HBase
    10. HBase Admin APIs
    11. HBase Security
    12. HBase vs Hive
    13. Backup & Restore in HBase
    14. Apache HBase External APIs (REST, Thrift, Scala)
    15. HBase & SPARK
    16. Apache HBase Coprocessors
    17. HBase Case Studies
    18. HBase Troubleshooting
    • Cassandra Installation
    • Cassandra Architecture Layers & Related Components
    • Cassandra Configuration
    • Operating Cassandra
    • Cassandra Tools
      • Cqlsh
      • Nodetool
      • SSTables
      • Cassandra Stress
    • Partitioners in Cassandra
    • Bloom Filters
    • Tuning Cassandra Performance
    • Read/Write in Cassandra
    • Cassandra Queries (CQL)
    • Cassandra Compaction Strategies

    1. Spark RDDs Actions & Transformations.
    2. Spark SQL: Connecting to various relational sources and converting their data into DataFrames.
    3. Spark Streaming
    4. Understanding the role of RDDs
    5. Spark Core concepts: Creating RDDs: Parallel RDDs, MappedRDD, HadoopRDD, JdbcRDD.
    6. Spark Architecture & Components.


      • AWS Lambda:
        • AWS Lambda Introduction
        • Creating Data Pipelines using AWS Lambda & Kinesis
        • AWS Lambda Functions
        • AWS Lambda Deployment


      • AWS GLUE :
        • GLUE Context
        • AWS Data Catalog
        • AWS Athena
        • AWS QuickSight


      • AWS Kinesis
      • AWS S3
      • AWS Redshift
      • AWS EMR & EC2
      • AWS ECR & AWS Kubernetes
    1. How to manage & Monitor Apache Spark on Kubernetes
    2. Spark Submit Vs Kubernetes Operator
    3. How Spark Submit works with Kubernetes
    4. How Kubernetes Operator for Spark Works.
    5. Setting up of Hadoop Cluster on Docker
    6. Deploying MR, Sqoop & Hive Jobs inside a Dockerized Hadoop environment.

    1) Docker Installation

    2) Docker Hub

    3) Docker Images

    4) Docker Containers & Shells

    5) Working with Docker Containers

    6) Docker Architecture

    7) Docker Push & Pull containers

    8) Docker Container & Hosts

    9) Docker Configuration

    10) Docker Files (DockerFile)

    11) Docker Building Files

    12) Docker Public Repositories

    13) Docker Private Registries

    14) Building WebServer using DockerFile

    15) Docker Commands

    16) Docker Container Linking

    17) Docker Storage

    18) Docker Networking

    19) Docker Cloud

    20) Docker Logging

    21) Docker Compose

    22) Docker Continuous Integration

    23) Docker Kubernetes Integration

    24) Docker Working of Kubernetes

    25) Docker on AWS

    1) Overview

    2) Learn Kubernetes Basics

    3) Kubernetes Installation

    4) Kubernetes Architecture

    5) Kubernetes Master Server Components

    a) etcd

    b) kube-apiserver

    c) kube-controller-manager

    d) kube-scheduler

    e) cloud-controller-manager


    6) Kubernetes Node Server Components

    a) A container runtime

    b) kubelet

    c) kube-proxy


    7) Kubernetes Objects & Workloads

    a) Kubernetes Pods

    b) Kubernetes Replication Controller & Replica Sets


    8) Kubernetes Images

    9) Kubernetes Labels & Selectors

    10) Kubernetes Namespace

    11) Kubernetes Service

    12) Kubernetes Deployments

    a) Stateful Sets

    b) Daemon Sets

    c) Jobs & Cron Jobs


    13) Other Kubernetes Components:

    a) Services

    b) Volume & Persistent Volumes

    c) Labels, Selectors & Annotations

    d) Kubernetes Secrets

    e) Kubernetes Network Policy


    14) Kubernetes API

    15) Kubernetes Kubectl

    16) Kubernetes Kubectl Commands

    17) Kubernetes Creating an App

    18) Kubernetes App Deployment

    19) Kubernetes Autoscaling

    20) Kubernetes Dashboard Setup

    21) Kubernetes Monitoring

    22) Federation using kubefed

Course Description

    Apache Hadoop is a framework that uses a network of many computers to store and process very large amounts of data. Data in Hadoop is stored in a distributed file system, known as HDFS, which provides very high aggregate bandwidth across the cluster.

    Gyansetu's certified course on Big Data Hadoop starts from the basics and moves gradually towards advanced topics, so that you eventually gain a working command of Big Data analytics. We understand Big Data can be a daunting course, and hence we at Gyansetu have divided it into an easily understandable format that covers all possible aspects of Big Data.

    Gyansetu Big Data Training in Gurgaon will help you to understand core principles in Big Data Analytics and to gain core expertise in analysis of large datasets from various sources:

    1. Concepts of MapReduce framework & HDFS filesystem
    2. Setting up of Single & Multi-Node Hadoop cluster
    3. Understanding HDFS architecture
    4. Writing MapReduce programs & logic
    5. Learn Data Loading using Sqoop from structured sources
    6. Understanding Flume Configuration used for data loading
    7. Data Analytics using Pig
    8. Understanding hive for data analytics
    9. Scheduling MapReduce, Pig, Hive, Sqoop Jobs using Oozie
    10. Understanding Kafka messaging system
    11. MapReduce and HBase Integration
    12. Spark Introduction
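
    To make the MapReduce concept from item 1 more concrete, here is a minimal single-process sketch in Python of the map, shuffle and reduce phases for a word count. It only imitates the programming model in memory and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input; on a real cluster these lines would come from files in HDFS.
    sample = ["big data hadoop", "hadoop stores big data in HDFS"]
    print(word_count(sample))
```

    On a real cluster the mapper and reducer run in parallel on different nodes and the shuffle moves data over the network; the sketch keeps everything in one process so the data flow is easy to follow.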

    Our Big Data experts have found that learning Hadoop alone is not enough for candidates to clear the interview process. Interviewers now expect more from candidates, including proficiency in advanced areas such as:

    • Expertise in PySpark/ Scala-Spark
    • Real Time Data Storage inside Data Lake (Real Time ETL)
    • Big Data Related Services on AWS Cloud
    • Deploying BIG Data application in Production environment using Docker & Kubernetes
    • Experience in Real Time Big Data Projects


    All the advanced level topics will be covered at Gyansetu in a classroom/online Instructor led mode with recordings.

    Knowledge of Java and SQL is a good starting point for Hadoop Training in Gurgaon. However, Gyansetu offers a complimentary instructor-led course on Java & SQL before you start the Hadoop course.

    Gyansetu provides a complimentary placement service to all students. The Gyansetu placement team consistently works on industry collaborations and associations, which help our students find their dream job right after completing the training.

    • Our placement team will add Big Data skills & projects in your CV and update your profile on Job search engines like Naukri, Indeed, Monster, etc. This will increase your profile visibility in top recruiter search and ultimately increase interview calls by 5x.
    • Our faculty offers extended support to students by clearing doubts faced during the interview and preparing them for the upcoming interviews.
    • Gyansetu’s Students are currently working in Companies like Sapient, Capgemini, TCS, Sopra, HCL, Birlasoft, Wipro, Accenture, Zomato, Ola Cabs, Oyo Rooms, etc.



    • Gyansetu trainers are well known in the industry, highly qualified and currently working in top MNCs.
    • We provide interaction with faculty before the course starts.
    • Our experts help students learn the technology from the basics. Even if you are not strong in basic programming skills, don’t worry! We will help you.
    • Faculties will help you in preparing project reports & presentations.
    • Students will be provided Mentoring sessions by Experts.

Certification

Big Data Hadoop Certification Program

Structure your learning and get a certificate to prove it.

Projects

    Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Amazon AWS, Elastic Search, Zookeeper

    Tools & Techniques used: PySpark MLlib, Spark Streaming, Python (Jupyter Notebook, Anaconda), machine learning packages (NumPy, Pandas, Matplotlib, Seaborn, scikit-learn), Random Forest and Gradient Boosting, confusion matrix, Tableau

    Description: Build a predictive model that predicts fraudulent transactions on PLCC & DC cards on a daily basis. This involves data extraction, then data cleaning, followed by data pre-processing.

    • Pre-processing includes standard scaling (normalizing the data), followed by cross-validation techniques to check the compatibility of the data.
    • In data modeling, we use decision-tree-based Random Forest and Gradient Boosting models, with hyperparameter tuning techniques to tune the model.
    • In the end, we evaluate the model using a confusion matrix, reaching an accuracy of 98%, and the trained model shows all fraudulent transactions on PLCC & DC cards on a Tableau dashboard.
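
    The scaling and evaluation steps in this project can be illustrated with a small pure-Python sketch of standard scaling and a binary confusion matrix. The function names (`standard_scale`, `confusion_matrix`, `accuracy`) and the sample numbers are made up for illustration; the project itself is described as using PySpark MLlib and scikit-learn, whose APIs are not shown here.

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Rescale values to zero mean and unit variance (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def confusion_matrix(actual, predicted):
    """Count TP, FP, TN and FN for binary labels where 1 = fraud and 0 = genuine."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(cm):
    """Share of all predictions that were correct."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical transaction amounts and labels, not data from the project.
    print(standard_scale([2, 4, 4, 4, 5, 5, 7, 9]))
    cm = confusion_matrix([1, 0, 1, 0, 0], [1, 0, 0, 0, 1])
    print(cm, accuracy(cm))
```

    For fraud data, where genuine transactions usually far outnumber fraudulent ones, it is worth reading the false-negative and false-positive counts alongside accuracy, since a high accuracy alone can hide missed fraud cases.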

    Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive

    Tools & Techniques used :  Hadoop+HBase+Spark+Flink+Beam+ML stack, Docker & KUBERNETES, Kafka, MongoDB, AVRO, Parquet

    Description: The aim is to create a Batch/Streaming/ML/WebApp stack where you can test your jobs locally or submit them to the YARN resource manager. We use Docker to build the environment and Docker Compose to provision it with the required components (the next step is to use Kubernetes). Along with the infrastructure, we check that it works with 4 projects that simply verify everything is working as expected. The boilerplate is based on a sample flight-search web application.

FAQs

    We have seen that getting a relevant interview call is not a big challenge in your case. Our placement team consistently works on industry collaborations and associations, which help our students find their dream job right after completing the training. We help you prepare your CV by adding relevant projects and skills once 80% of the course is completed. Our placement team will update your profile on job portals, which increases relevant interview calls by 5x.

    Interview selection depends on your knowledge and learning. As per the past trend, the first 5 interviews are a learning experience in:

    • What type of technical questions are asked in interviews?
    • What are their expectations?
    • How should you prepare?


    Our faculty team will constantly support you during interviews. Usually, students get a job after appearing in 6-7 interviews.

    We have seen that getting a technical interview call is a challenge at times. Most of the time you receive calls for sales, backend or BPO jobs. No worries! Our placement team will prepare your CV in such a way that you will get a good number of technical interview calls. We will provide interview preparation sessions and make you job ready. Our placement team consistently works on industry collaborations and associations, which help our students find their dream job right after completing the training. Our placement team will update your profile on job portals, which increases relevant interview calls by 3x.

    Interview selection depends on your knowledge and learning. As per the past trend, the first 8 interviews are a learning experience in:

    • What type of technical questions are asked in interviews?
    • What are their expectations?
    • How should you prepare?



    Our faculty team will constantly support you during interviews. Usually, students get a job after appearing in 9-10 interviews.

    We have seen that getting a technical interview call is hardly possible. Gyansetu provides internship opportunities to non-working students so that they have some industry exposure before they appear in interviews. Internship experience adds a lot of value to your CV, and our placement team will prepare your CV in such a way that you will get a good number of interview calls. We will provide interview preparation sessions and make you job ready. Our placement team consistently works on industry collaborations and associations, which help our students find their dream job right after completing the training, and we will update your profile on job portals, which increases relevant interview calls by 3x.

    Interview selection depends on your knowledge and learning. As per the past trend, the first 8 interviews are a learning experience in:

    • What type of technical questions are asked in interviews?
    • What are their expectations?
    • How should you prepare?


    Our faculty team will constantly support you during interviews. Usually, students get a job after appearing in 9-10 interviews.

    Yes, a one-to-one faculty discussion and demo session will be provided before admission. We understand the importance of trust between you and the trainer. We will be happy if you clear all your queries before you start classes with us.

    We understand the importance of every session. Session recordings will be shared with you, and in case of any query, faculty will give you extra time to answer your questions.

    Yes, we understand that self-learning is crucial, and for that reason we provide students with PPTs, PDFs, class recordings, lab sessions, etc., so that a student can get a good handle on these topics.

    We provide an option to retake the course within 3 months of completing it, so that you get more time to learn the concepts and do your best in your interviews.

    We believe that having fewer students is the best way to pay attention to each student individually, and for that reason our batch size varies between 5 and 10 people.

    Yes, we have batches available on weekends. We understand many students are in jobs and it's difficult to take time for training on weekdays. Batch timings need to be checked with our counsellors.

    Yes, we have batches available on weekdays, but in limited time slots. Since most of our trainers are working professionals, the batches are available either in the morning or in the evening. You need to contact our counsellors to know more about this.

    Total duration of the Hadoop course is 80 hours (40 Hours of live instructor led training and 40 hours of self paced learning).

    You don’t need to pay anyone for software installation, our faculties will provide you all the required software and will assist you in the complete installation process.

    Our faculties will help you in resolving your queries during and after the course.
