Hadoop

Duration: 60 hrs
Certification: Yes

Description

We are authorised certification partners for Apache Hadoop, Hortonworks, Spark and Scala. Training is delivered by Cloudera- and Hortonworks-certified working professionals with hands-on, real-time project experience.

Money-back guarantee with 15% interest if you are not satisfied with the quality of the training.

Big Data/Hadoop Training:

Pre-requisites: Knowledge of Core Java/Oracle; Basics of Unix

  1. Introduction to Big Data & Hadoop
    • Importance of Data & Data Analysis
    • What is Big Data?
    • Big Data & its hype
    • Big Data Users & Scenarios
    • Structured vs Unstructured Data
    • Challenges of Big Data
    • How to overcome the challenges?
    • Divide & Conquer philosophy
    • Overview of Hadoop
  1. Hadoop and its file system – HDFS
    • History of Hadoop
    • Hadoop Ecosystem
    • Hadoop Animal Planet
    • What is Hadoop?
    • Key Distinctions of Hadoop
    • Hadoop Components
    • HDFS
    • MapReduce
    • Why Distributed File System?
    • The Design of HDFS
    • Hadoop Distributed File System
    • What is an HDFS block?
    • Why is an HDFS block so large?
    • NameNode
    • DataNode
    • Secondary NameNode
    • A file in HDFS
    • Hadoop Components/Architecture
    • NameNode, JobTracker, DataNode, TaskTracker & Secondary NameNode
    • Understanding Storage components (NameNode, DataNode & Secondary NameNode)
    • Understanding Processing components (JobTracker & TaskTracker)
    • How the Secondary NameNode's checkpoints help recover from a failure of the primary NameNode
    • Anatomy of a File Read
    • Anatomy of a File Write
  1. Understanding Hadoop Cluster
    • Walkthrough of CDH VM setup
    • Hadoop Cluster modes
    • Standalone Mode
    • Pseudo-Distributed Mode
    • Distributed Mode
    • Hadoop Configuration files
    • core-site.xml
    • mapred-site.xml
    • hdfs-site.xml
    • yarn-site.xml
    • Understanding Cluster configuration
  1. MapReduce
    • Meet MapReduce
    • WordCount algorithm – Traditional approach
    • Traditional approach on a Distributed system & its drawbacks
    • MapReduce approach
    • Input & Output Forms of a MR program
    • Hadoop Data types
    • Map, Shuffle & Sort, Reduce Phases
    • Workflow & Transformation of Data
    • Word Count Code walkthrough
    • Input Split & HDFS Block
    • Relation between Split & Block
    • MR Flow with Single Reduce Task
    • MR flow with multiple Reducers
    • Data locality Optimization
    • Speculative Execution
    • Combiner
    • Partitioner
  1. Advanced MapReduce
    • Counters
    • InputFormat & its hierarchy
    • OutputFormat & its hierarchy
    • Using Compression techniques
    • Side Data Distribution – Distributed Cache
    • Joins
    • Map side join using Distributed Cache
    • Reduce side Join
    • Secondary Sorting
    • MRUnit – A Unit testing framework
  1. Pig
    • What is Pig?
    • Why Pig?
    • Pig vs. SQL
    • Execution Types or Modes
    • Running Pig
    • Pig Data types
    • Pig Latin relational Operators
    • Multi Query execution
    • Pig Latin Diagnostic Operators
    • Pig Latin Macro & UDF statements
    • Pig Latin Commands
    • Pig Latin Expressions
    • Schemas
    • Pig Functions
    • Pig Latin File Loaders
    • Pig UDF & executing a Pig UDF
    • Pig Use cases
  1. Hive
    • Introduction to Hive
    • Pig vs. Hive
    • Hive Limitations & Possibilities
    • Hive Architecture
    • Metastore
    • Hive Data Organization
    • HiveQL
    • SQL vs. HiveQL
    • Hive Data types
    • Data Storage
    • Managed & External Tables
    • Partitions & Buckets
    • Static Partitioning & Dynamic Partitioning
    • Storage Formats
    • File Formats – Sequence File & RC File
    • Using Compression in Hive
    • Built-in SerDes
    • Importing Data (Using Load Data & Insert Into)
    • Alter & Drop Commands
    • Data Querying
    • Using MR Scripts
    • Hive Joins
    • Sub Queries
    • Views
  1. HBase
    • Introduction to NoSQL & HBase
    • HBase vs. RDBMS
    • HBase Use cases
    • Row & Column oriented storage
    • Characteristics of a huge DB
    • What is HBase?
    • HBase Data-Model
    • HBase logical model & physical storage
    • HBase architecture
    • HBase in operation (put, get, scan & delete)
    • Loading Data into HBase
    • HBase shell commands
    • HBase operations through Java
    • HBase operations through MR
  1. ZooKeeper & Oozie
    • Introduction to ZooKeeper
    • Distributed Coordination
    • ZooKeeper Data Model
    • ZooKeeper Service
  1. Sqoop
    • Introduction to Sqoop
    • Sqoop design
    • Sqoop basic Commands
    • Sqoop Table Import flow of execution
    • Sqoop Import Commands – to HDFS, Hive & HBase tables
    • Sqoop Incremental Import
    • Incremental Append
    • Incremental Last Modified
    • Sqoop export flow of execution
    • Sqoop Export Command
  1. Flume
    • Flume Architecture
    • Flume Components
    • Streaming live Twitter data with Flume
  1. Hadoop 2.0 & YARN
    • Hadoop 1 Limitations
    • HDFS Federation
    • NameNode High Availability
    • Introduction to YARN
    • YARN Applications
    • YARN Architecture
    • Anatomy of a YARN application
  1. MongoDB
  1. Spark Overview
    • What is Spark?
    • Why Spark?
    • Spark & Big Data
    • Spark Components
    • Resilient Distributed Datasets (RDDs)
    • Data Operations on RDD
    • Spark Libraries

Java (15 hrs) – to the extent required for MapReduce (complimentary)

Highlights of the Course:

  • Teaching is oriented towards:
    • Practical, hands-on work
    • A clear understanding of the basics
    • Likely interview questions, discussed alongside each topic
  • Exclusive access to a variety of the latest interview questions and answers
  • Work on real-time projects (in tools such as Pig, Hive, MapReduce & HBase)
  • Certification guidance & Material
  • Hand-outs will be given which would serve as a knowledge-check
  • Assistance in Resume preparation
  • Interview guidance
  • Corporate-level training
  • Finally, this training gives you everything you need to secure your desired job and keep progressing in it!