BIG DATA

Description

Hands-on, real-time exposure to Hadoop development

In-depth understanding of Big Data, Hadoop and the entire Hadoop Ecosystem

Detailed Course Materials
Free Core Java and UNIX Fundamentals
Interview Oriented Discussions
Get Ready for Hadoop & Spark Developer (CCA175) Certification Exam

UNIX/LINUX Basic Commands
Basic UNIX Shell Scripting

Basic Java Programming – Core JAVA OOP Concepts

Introduction to Big Data and Hadoop

Working With HDFS

Hadoop Map Reduce Concepts & Features
Developing Map Reduce Applications

Hadoop Eco System Components:

HIVE
PIG
HBASE
FLUME
SQOOP
Zookeeper

Detailed SPARK with SCALA Programming
Detailed Kafka Streaming

Overview of MongoDB

Overview of Spark with Python Programming (Time Permitting)

Real-Time Tools such as PuTTY, WinSCP, Eclipse, Hue and Cloudera Manager

Prerequisites:

Basic SQL Knowledge
Computer with Minimum 4GB RAM (8GB RAM Preferred)
Basic UNIX & Java Programming knowledge is an added advantage

Detailed Course Structure:

Introduction to Big Data & Hadoop

The Big Data Problem
What is Big Data?
Challenges in processing Big Data
What is Hadoop?
Why Hadoop?
History of Hadoop
Hadoop Components Overview
- HDFS
- Map Reduce
Hadoop Eco System Introduction
NoSQL Database Introduction

Understanding Hadoop Architecture

Hadoop 2.x Architecture
Introduction to YARN
Hadoop Daemons
YARN Architecture
- Resource Manager
- Application Master
- Node Manager

Introduction to HDFS (Hadoop Distributed File System)

Rack Awareness
HDFS Daemons
Writing Files to HDFS
- Blocks & Splits
- Input Splits
- Data Replication
Reading Files from HDFS
Introduction to HDFS Configuration Files

Working with HDFS

HDFS Commands
Accessing HDFS
- CLI Approach
- JAVA Approach [Introducing HDFS JAVA API]

Introduction to Map Reduce Paradigm

What is Map Reduce?
Detailed Map Reduce Flow
- Introduction to Key/Value Approach
- Detailed Mapper Functionality
- Detailed Reducer Functionality
- Details of Partitioner
- Shuffle & Sort Process
Understanding Map Reduce Flow with Word Count Example
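
The flow listed above (key/value pairs, mapper, shuffle & sort, reducer) can be sketched in a few lines of plain Python. This is only an illustration of the idea on a single machine, not Hadoop code: the function names `mapper`, `shuffle_and_sort`, `reducer` and `word_count` are hypothetical, and partitioning is left out because a single reducer is assumed.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle & sort: group all values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle & sort, and reduce over an iterable of text lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    sample = ["big data big ideas", "data moves fast"]
    for word, count in word_count(sample):
        print(word, count)
```

In a real Hadoop job the three phases run on different nodes and the framework performs the shuffle; here they are ordinary function calls so the data flow is easy to follow.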

Map Reduce Programming

Introduction to Map Reduce API [New Map Reduce API]
Map Reduce Data Types
File Formats
Input Formats – Input Splits & Records, text input, binary input
Output Formats – Text Output, Binary Output
Configuring Development Environment – Eclipse
Developing a Map Reduce Application using Default Functionality
- Identity Mapper
- Identity Reducer
- ToolRunner API Introduction
Developing Word Count Application
- Writing Mapper, Reducer & Driver Code
- Building Application
- Deploying Application
Running the Map Reduce Application
- Local Mode of Execution
- Cluster Mode of Execution
Monitoring Map Reduce Application
Map Reduce Combiner
Map Reduce Counters
Map Reduce Partitioner
File Merge Utility

Programming with HIVE

Introduction to HIVE
Hive Architecture
Types of Metastore
Introduction to Hive Configuration Files
Hive Data Types
- Simple Data Types
- Collection Data Types

Types of Hive Tables
- Managed Table
- External Table
Hive Query Language (HQL or HIVE QL)
- Creating Databases
- Creating Tables
- Loading Data into Tables
- Joins in Hive
- Group By and Distinct Operations
- Partitioning
  - Static Partitioning
  - Dynamic Partitioning
- Bucketing
- Lateral View & Explode [Introduction to Hive UDFs → UDF, UDAF & UDTF]
- XML Processing in HIVE
- JSON processing in HIVE
- URL Processing in HIVE
Hive File Formats [Introduction to Hive SERDE]
- Parquet
- ORC
- AVRO
Storage Formats
Introduction to HIVE Query Optimizations
Developing Hive UDFs in JAVA
Hive Views

Programming with PIG

Introduction to PIG
PIG Architecture
Introduction to PIG Configuration Files
PIG vs. HIVE vs. Map Reduce
Introduction to Data Flow Language
Pig Data Types
Pig Programming Modes
Pig Access Modes
Detailed PIG Latin Programming
PIG UDFs & UDF Development in JAVA
Hive – PIG Integration → Introduction to HCATALOG
Introduction to PIG Optimization

NoSQL & HBASE

Introduction to NoSQL Databases
Types of NoSQL Databases
Introduction to HBASE
HBASE Architecture
HBASE Shell Interface
- Creating Databases and Tables
- Inserting Data into Tables
- Accessing Data from Tables
- HBase Filters
Hive & HBASE Integration
PIG & HBASE Integration
Document Store – MongoDB Overview

Introduction to Streaming & FLUME

Introduction to Streaming
Introduction to FLUME
FLUME Architecture
Flume Agent Setup
Types of Sources, Channels & Sinks
Developing Sample Flume Applications

SQOOP

Introduction to SQOOP
Connecting to RDBMS Using SQOOP
SQOOP Import
- Import to HDFS
- Import to HIVE
- Import to HBASE
- Bulk Import
  - Full Table
  - Subset of a Table
  - All tables in DB
- Incremental Import
  - Incremental Append
  - Incremental Last Modified
SQOOP Export
- Export from HDFS
- Export from Hive

Zookeeper

Introduction to Zookeeper
Distributed Coordination
Zookeeper Data Model
Zookeeper Service
Zookeeper Commands

Apache Kafka

Introduction to Kafka
Kafka Internals
Kafka Cluster Architecture
Kafka Producer
Kafka Consumer
Kafka Broker
Introduction to Kafka API
Kafka Stream Processing
Integrating Kafka with various Hadoop Systems

Introduction to Scala Programming

Introduction to Functional Programming & Scala
Comparing Java and Scala
Setting Up Scala in UNIX
Setting Up SBT
Introduction to Scala REPL
Setting up Scala on Eclipse (Scala IDE)

Scala Programming Fundamentals

Scala Data Types
Variable Declarations
Variable Type Inference
Operators
Scala Control Structures
Scala Looping Structures
Scala Functions
Scala Collections
- Array
- List
- Map
- Tuples
- Set

Functional Programming in Scala

Introduction to Functional Programming
Difference between OOP & Functional Programming
Higher Order Functions
Anonymous Functions
Closures and Currying
Functional Programming on Collections
- Iteration, Mapping, Filtering and Reduce
- Maps, Sets, Group By, Flatten and Flat Map
File Access and File Processing
Scala Pattern Matching

Object Oriented Programming in Scala

Concept of Classes in Scala
Implementing Getters and Setters
Concept of Objects in Scala
Singleton Objects
Companion Objects
Case Classes
Primary Constructor
Auxiliary Constructor
Overriding Methods
Apply Method
Traits and Abstract Classes
Exception Handling in Scala

Introduction to Spark

What is Apache Spark?
Spark Unified Stack
- Spark Core
- Spark SQL
- Spark Streaming
- MLlib
- GraphX
- Cluster Managers
Users of Spark
Spark vs. Map Reduce
Introduction to Spark Shell
Introduction to Spark Core API for Spark Application Development

Programming With Spark RDDs

Introduction to RDDs
Creating RDDs
RDD Operations
- Transformations
- Actions
- Lazy Evaluation
Passing Functions to Spark
Common Transformations and Actions on RDDs
Concept of Pair RDDs
Transformations and Actions on Pair RDDs
Data Partitioning in RDDs
Concept of Persistence/Caching in RDDs
Accumulators and Broadcast Variables
Loading and Saving Data Using RDDs

File Formats:

Text Files
CSV and Tab Separated Files
JSON
Sequence Files
Parquet Files
Compression Techniques – Snappy, Gzip

Programming with Spark Data Frames & Spark SQL

Introduction to Spark Data Frames
Data Frames vs. RDDs
Introduction to Spark SQL
Understanding HiveContext
Operations on Data Frames
Schema RDDs and Converting Schema RDDs to DataFrames (Custom Case Classes)
Temp Tables vs. Persistent Tables
Loading and Saving Data in DFs
- Apache Hive
- JSON
- Parquet
- ORC Files
User Defined Functions (UDFs)
- Spark SQL UDF
- Hive UDF

Spark Streaming

Introduction to Spark Streaming Architecture
Introduction to Discrete Streams (DStreams)
Streaming Operations
Integrate Spark Streaming with Kafka

PySpark Overview (Time Permitting)

Introduction to PySpark & PySpark Shell
Using Python to develop Spark Applications
Running PySpark Application
