Big Data Hadoop Training in Gurgaon, Delhi: Course Details
Gyansetu's Big Data Training in Gurgaon will help you understand the core principles of Big Data Analytics and gain expertise in analyzing large datasets from various sources:
1. Concepts of the MapReduce framework & the HDFS filesystem.
2. Setting up single & multi-node Hadoop clusters.
3. Understanding HDFS architecture.
4. Writing MapReduce programs & logic.
5. Data loading from structured sources using Sqoop.
6. Understanding Flume configuration used for data loading.
7. Data analytics using Pig.
8. Understanding Hive for data analytics.
9. Scheduling MapReduce, Pig, Hive & Sqoop jobs using Oozie.
10. Understanding the Kafka messaging system.
11. MapReduce and HBase integration.
12. Understanding the Spark ecosystem.
13. RDD actions & transformations.
14. Understanding Spark architecture.
15. Spark SQL & Streaming modules in the Spark ecosystem.
16. Live projects on Big Data Analytics.
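The MapReduce programming model covered above can be sketched in plain Python as two functions, a map phase emitting key-value pairs and a reduce phase aggregating them. This is a hedged illustration of the word-count pattern, not actual Hadoop API code:

```python
from collections import defaultdict

# Map phase: emit a (word, 1) pair for every word in every input line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle + reduce phase: group the pairs by key and sum the counts.
def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["Big Data Hadoop", "Hadoop MapReduce", "Big Data"]
result = reduce_phase(map_phase(lines))
# result["hadoop"] == 2, result["big"] == 2, result["mapreduce"] == 1
```

On a real cluster, Hadoop runs the map and reduce functions in parallel across HDFS blocks and handles the shuffle between them.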
Who should go for the Hadoop course?
The Big Data market is growing rapidly, data volumes are increasing every day, and IT will need expert Big Data professionals in the coming years. The course will be helpful for people working in IT as:
1. Testing professionals
2. Senior IT Professionals
3. BI/ETL/DW professionals
4. Developers and Architects
5. Mainframe professionals
Pre-requisites for the Big Data Hadoop Training Course?
There are no pre-requisites. Knowledge of Java, Python & SQL is beneficial but not mandatory. Gyansetu provides a crash course covering the pre-requisites needed to start the Big Data training.
After completing the course, you will be able to analyze large datasets and will work on a live project using Pig, HBase, Hive & MapReduce to perform the analysis.
We will work on case studies related to domains like Finance, Media, Stocks & more.
Problem Statement: Fetch structured & unstructured datasets from various sources such as social media sites, web servers, and structured sources like MySQL, Oracle & others; dump them into HDFS; then analyze those datasets using Pig, HQL queries & MapReduce to gain proficiency in the Hadoop stack & its ecosystem tools.
Data Analysis Steps:
1. Dump the XML & JSON datasets into HDFS.
2. Convert the semi-structured formats (JSON & XML) into a structured format using Pig, Hive & MapReduce.
3. Push the dataset into the Pig & Hive environments for further analysis.
4. Write Hive queries and push the output into a relational database (RDBMS) using Sqoop.
5. Render the results as box plots, bar graphs & others using R & Python integration with Hadoop.
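Step 2, converting semi-structured JSON into a tabular form, can be sketched in plain Python. On the cluster this would be a Pig, Hive, or MapReduce job, and the field names below are illustrative assumptions, not the actual project schema:

```python
import csv
import io
import json

# Semi-structured JSON records (field names are hypothetical).
raw = '[{"user": "a1", "city": "Delhi", "amount": 120.5},' \
      ' {"user": "b2", "city": "Gurgaon", "amount": 80.0}]'

records = json.loads(raw)
fields = ["user", "city", "amount"]

# Flatten each JSON object into one CSV row with a fixed column order,
# giving a structured file that Pig or Hive can load directly.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerows(records)
structured = buf.getvalue()
```

The same idea scales out on Hadoop: each mapper parses a slice of the JSON input and emits delimited rows into HDFS.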
Project #2: Analyze Stock Market Data
Data: The dataset contains stock information such as daily quotes, the stock's highest price, and the stock's opening price on the New York Stock Exchange.
Problem Statement: Calculate the covariance of stock data while solving the storage & processing problems that come with a huge volume of data.
a) Positive covariance: if two investment instruments or stocks tend to be up or down during the same time periods, they have positive covariance.
b) Negative covariance: if returns move inversely, i.e. one investment tends to be up while the other is down, they have negative covariance.
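The covariance computation itself is simple; the project's challenge is running it at scale. A minimal plain-Python sketch over two small, hypothetical return series (on the cluster this would run as a MapReduce or Spark job over HDFS data):

```python
# Sample covariance of two equal-length return series:
# cov(x, y) = sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)
def covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical daily returns for two stocks.
stock_a = [0.01, -0.02, 0.03, 0.00]
stock_b = [0.02, -0.01, 0.04, 0.01]
cov = covariance(stock_a, stock_b)
# cov > 0 here: the two stocks tend to move together (positive covariance).
```

A negative result would indicate the inverse relationship described in (b).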
Problem Statement: What was the busiest dispatch base by trips on a particular day across the entire month?
1. Top 20 destinations tourists frequently travel to: based on the given data, find the most popular destinations, ranked by the number of trips booked for each destination.
2. Top 20 high air-revenue destinations, i.e. the 20 cities that generate the highest airline revenue, so that discount offers can be given to attract more bookings for these destinations.
3. Top 20 locations from which most trips start, based on booked trip count.
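These "top N by trip count" queries all reduce to a group-and-count. A plain-Python sketch with hypothetical destination data (in the project this would be a Pig GROUP/COUNT or a Hive GROUP BY ... ORDER BY ... LIMIT N):

```python
from collections import Counter

# One hypothetical record per booked trip; only the destination matters here.
trips = ["Goa", "Jaipur", "Goa", "Manali", "Goa", "Jaipur"]

# Count trips per destination and take the N most frequent.
top = Counter(trips).most_common(2)
# top == [("Goa", 3), ("Jaipur", 2)]
```

Swapping the grouped column (destination, city, start location) yields each of the three rankings above.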
Problem Statement: Analyze flight data to find:
1. Delayed flights.
2. Flights with zero stops.
3. Active airlines across all countries.
4. Source & destination details of flights.
5. Reasons why flights get delayed.
6. Times in different formats.
7. Diverted routes & others.
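Queries 1 and 2 above are row filters. A minimal sketch over hypothetical flight records; the field names are assumptions for illustration, not the actual dataset schema (on the cluster these filters would be Hive WHERE clauses or Pig FILTER statements):

```python
# Hypothetical flight records parsed from the HDFS dataset.
flights = [
    {"flight": "AI101", "stops": 0, "delay_min": 35},
    {"flight": "6E202", "stops": 1, "delay_min": 0},
    {"flight": "SG303", "stops": 0, "delay_min": 0},
]

# Query 1: flights with any departure delay.
delayed = [f["flight"] for f in flights if f["delay_min"] > 0]

# Query 2: non-stop flights.
zero_stop = [f["flight"] for f in flights if f["stops"] == 0]
# delayed == ["AI101"]; zero_stop == ["AI101", "SG303"]
```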
Problem Statement: Analyze movie ratings by different users to:
1. Get the user who has rated the most movies.
2. Get the user who has rated the fewest movies.
3. Get the total number of movies rated by users belonging to a specific occupation.
4. Get the number of underage users.
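Queries 1 and 2 above come down to counting ratings per user and taking the maximum and minimum. A plain-Python sketch with hypothetical (user_id, movie_id) rating events:

```python
from collections import Counter

# One hypothetical (user_id, movie_id) pair per rating event.
ratings = [(1, "m1"), (1, "m2"), (2, "m1"), (1, "m3"), (3, "m2"), (3, "m4")]

# Count how many movies each user has rated.
per_user = Counter(user for user, _ in ratings)

most_active = max(per_user, key=per_user.get)   # user 1, with 3 ratings
least_active = min(per_user, key=per_user.get)  # user 2, with 1 rating
```

Query 3 is the same count restricted to users with a given occupation, which requires joining against the user-profile table first.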
Data: Dataset columns: VideoId, Uploader, interval between the establishment of YouTube and the date the video was uploaded, Category, Length, Rating, Number of comments.
Problem Statement: Identify the top 5 categories in which the most videos are uploaded, the top 10 rated videos, and the top 10 most viewed videos.
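The category ranking is a group-and-count, while the top-rated list is a sort on the rating column. A plain-Python sketch over hypothetical parsed rows (on the cluster, a Hive GROUP BY for the first and an ORDER BY ... LIMIT for the second):

```python
from collections import Counter

# Hypothetical parsed rows: (video_id, category, rating).
videos = [
    ("v1", "Music", 4.8),
    ("v2", "Sports", 4.1),
    ("v3", "Music", 3.9),
    ("v4", "Comedy", 4.9),
]

# Top categories by number of uploaded videos.
top_categories = Counter(cat for _, cat, _ in videos).most_common(2)

# Top videos by rating.
top_rated = sorted(videos, key=lambda v: v[2], reverse=True)[:2]
# top_categories[0] == ("Music", 2); top_rated[0] is ("v4", "Comedy", 4.9)
```

The most-viewed ranking follows the same sorting pattern on a views column.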