  • Phone: +91 8688 800 900
  • Email: info@gvipl.in
  • Address: Plot no. 4, Nagarjuna Ikon, 4th Floor Croma Stores, Kondapur Junction, Hyderabad, Telangana
  • Open Hours: 9 AM to 6 PM

BIG DATA

Description

  • Practical, real-world understanding of Hadoop development
  • In-depth understanding of the entire Big Data Hadoop stack and the Hadoop ecosystem
  • Detailed Course Materials
  • Free Core Java and UNIX Fundamentals
  • Interview Oriented Discussions
  • Get Ready for Hadoop & Spark Developer (CCA175) Certification Exam

UNIX/LINUX Basic Commands

Basic UNIX Shell Scripting

Basic Java Programming – Core Java OOP Concepts

Introduction to Big Data and Hadoop

Working With HDFS

Hadoop Map Reduce Concepts & Features

Developing Map Reduce Applications

Hadoop Ecosystem Components:

  • HIVE
  • PIG
  • HBASE
  • FLUME
  • SQOOP
  • Zookeeper

Detailed SPARK with SCALA Programming

Detailed Kafka Streaming

Overview of MongoDB

Overview of Spark with Python Programming (Time Permitting)

Real-Time Tools such as PuTTY, WinSCP, Eclipse, Hue and Cloudera Manager

Prerequisites:
  • Basic SQL Knowledge
  • Computer with Minimum 4GB RAM (8GB RAM Preferred)
  • Basic UNIX & Java Programming knowledge is an added advantage
Detailed Course Structure:
 Introduction to Big Data & Hadoop
  • The Big Data Problem
  • What is Big Data?
  • Challenges in processing Big Data
  • What is Hadoop?
  • Why Hadoop?
  • History of Hadoop
  • Hadoop Components Overview
    • HDFS
    • Map Reduce
  • Hadoop Ecosystem Introduction
  • NoSQL Database Introduction
 Understanding Hadoop Architecture
  • Hadoop 2.x Architecture
  • Introduction to YARN
  • Hadoop Daemons
  • YARN Architecture
    • Resource Manager
    • Application Master
    • Node Manager
 Introduction to HDFS (Hadoop Distributed File System)
  • Rack Awareness
  • HDFS Daemons
  • Writing Files to HDFS
    • Blocks & Splits
    • Input Splits
    • Data Replication
  • Reading Files from HDFS
  • Introduction to HDFS Configuration Files
 Working with HDFS
  • HDFS Commands
  • Accessing HDFS
    • CLI Approach
    • JAVA Approach [Introducing HDFS JAVA API]
 Introduction to Map Reduce Paradigm
  • What is Map Reduce?
  • Detailed Map Reduce Flow
    • Introduction to Key/Value Approach
    • Detailed Mapper Functionality
    • Detailed Reducer Functionality
    • Details of Partitioner
    • Shuffle & Sort Process
  • Understanding Map Reduce Flow with Word Count Example
 Map Reduce Programming
  • Introduction to Map Reduce API [New Map Reduce API]
  • Map Reduce Data Types
  • File Formats
  • Input Formats – Input Splits & Records, text input, binary input
  • Output Formats – Text Output, Binary Output
  • Configuring Development Environment – Eclipse
  • Developing a Map Reduce Application using Default Functionality
    • Identity Mapper
    • Identity Reducer
    • ToolRunner API Introduction
  • Developing Word Count Application
    • Writing Mapper, Reducer & Driver Code
    • Building Application
    • Deploying Application
  • Running the Map Reduce Application
    • Local Mode of Execution
    • Cluster Mode of Execution
  • Monitoring Map Reduce Application
  • Map Reduce Combiner
  • Map Reduce Counters
  • Map Reduce Partitioner
  • File Merge Utility
 Programming with HIVE
  • Introduction to HIVE
  • Hive Architecture
  • Types of Metastore
  • Introduction to Hive Configuration Files
  • Hive Data Types
    • Simple Data Types
    • Collection Data Types
  • Types of Hive Tables
    • Managed Table
    • External Table
  • Hive Query Language (HQL or HIVE QL)
    • Creating Databases
    • Creating Tables
    • Loading Data into Tables
    • Joins in Hive
    • Group BY and Distinct operations
    • Partitioning
      • Static Partitioning
      • Dynamic Partitioning
    • Bucketing
    • Lateral View & Explode [Introduction to Hive UDFs → UDF, UDAF & UDTF]
    • XML Processing in HIVE
    • JSON processing in HIVE
    • URL Processing in HIVE
  • Hive File Formats [Introduction to Hive SERDE]
    • Parquet
    • ORC
    • AVRO
  • Storage Formats
  • Introduction to HIVE Query Optimizations
  • Developing Hive UDFs in JAVA
  • Hive Views
 Programming with PIG
  • Introduction to PIG
  • PIG Architecture
  • Introduction to PIG Configuration Files
  • PIG vs. HIVE vs. Map Reduce
  • Introduction to Data Flow Language
  • Pig Data Types
  • Pig Programming Modes
  • Pig Access Modes
  • Detailed PIG Latin Programming
  • PIG UDFs & UDF Development in JAVA
  • Hive – PIG Integration → Introduction to HCATALOG
  • Introduction to PIG Optimization
 NoSQL & HBASE
  • Introduction to NoSQL Databases
  • Types of NoSQL Databases
  • Introduction To HBASE
  • HBASE Architecture
  • HBASE Shell Interface
    • Creating Databases and Tables
    • Inserting Data into Tables
    • Accessing Data from Tables
    • HBase Filters
  • Hive & HBASE Integration
  • PIG & HBASE Integration
  • Document Store – MongoDB Overview
 Introduction to Streaming & FLUME
  • Introduction to Streaming
  • Introduction to FLUME
  • FLUME Architecture
  • Flume Agent Setup
  • Types of Source, Channel & Sinks
  • Developing Sample Flume Applications
 SQOOP
  • Introduction to SQOOP
  • Connecting to RDBMS Using SQOOP
  • SQOOP Import
    • Import to HDFS
    • Import to HIVE
    • Import to HBASE
    • Bulk Import
      • Full Table
      • Subset of a Table
      • All tables in DB
    • Incremental Import
      • Incremental Append
      • Incremental Last Modified
  • SQOOP Export
    • Export from HDFS
    • Export from Hive
 Zookeeper
  • Introduction to Zookeeper
  • Distributed Coordination
  • Zookeeper Data Model
  • Zookeeper Service
  • Zookeeper Commands
 Apache Kafka
  • Introduction to Kafka
  • Kafka Internals
  • Kafka Cluster Architecture
  • Kafka Producer
  • Kafka Consumer
  • Kafka Broker
  • Introduction to Kafka API
  • Kafka Stream Processing
  • Integrating Kafka with various Hadoop Systems
 Introduction to Scala Programming
  • Introduction to Functional Programming & Scala
  • Comparing Java and Scala
  • Setting Up Scala in UNIX
  • Setting Up SBT
  • Introduction to Scala REPL
  • Setting up Scala on Eclipse (Scala IDE)
 Scala Programming Fundamentals
  • Scala Data Types
  • Variable Declarations
  • Variable Type Inference
  • Operators
  • Scala Control Structures
  • Scala Looping Structures
  • Scala Functions
  • Scala Collections
    • Array
    • List
    • Map
    • Tuples
    • Set
 Functional Programming in Scala
  • Introduction to Functional Programming
  • Difference between OOPs & Functional Programming
  • Higher Order Functions
  • Anonymous Functions
  • Closures and Currying
  • Functional Programming on Collections
    • Iteration, Mapping, Filtering and Reducing
    • Maps, Sets, Group By, Flatten and Flat Map
  • File Access and File Processing
  • Scala Pattern Matching
 Object Oriented Programming in Scala
  • Concept of Classes in Scala
  • Implementing Getters and Setters
  • Concept of Objects in Scala
  • Singleton Objects
  • Companion Objects
  • Case Classes
  • Primary Constructor
  • Auxiliary Constructor
  • Overriding Methods
  • Apply Method
  • Traits and Abstract Classes
  • Exception Handling in Scala
 Introduction to Spark
  • What is Apache Spark
  • Spark Unified Stack
    • Spark Core
    • Spark SQL
    • Spark Streaming
    • MLlib
    • GraphX
    • Cluster Managers
  • Users of Spark
  • Spark vs. Map Reduce
  • Introduction to Spark Shell
  • Introduction to Spark Core API for Spark Application Development
 Programming With Spark RDDs
  • Introduction to RDDs
  • Creating RDDs
  • RDD Operations
    • Transformations
    • Actions
    • Lazy Evaluation
  • Passing Functions to Spark
  • Common Transformations and Actions on RDDs
  • Concept of Pair RDDs
  • Transformations and Actions on Pair RDDs
  • Data Partitioning in RDDs
  • Concept of Persistence/Caching in RDDs
  • Accumulators and Broadcast Variables
  • Loading and Saving Data Using RDDs
    • File Formats:
      • Text Files
      • CSV and Tab Separated Files
      • JSON
      • Sequence Files
      • Parquet Files
      • Compression Techniques – Snappy, Gzip
 Programming with Spark Data Frames & Spark SQL
  • Introduction to Spark Data Frames
  • DataFrames vs. RDDs
  • Introduction to Spark SQL
  • Understanding HiveContext
  • Operations on Data Frames
  • Schema RDDs and Converting Schema RDDs to DataFrames (Custom Case Classes)
  • Temp Tables vs. Persistent Tables
  • Loading and Saving Data in DFs
    • Apache Hive
    • JSON
    • Parquet
    • ORC Files
  • User Defined Functions (UDFs)
    • Spark SQL UDF
    • Hive UDF
 Spark Streaming
  • Introduction to Spark Streaming Architecture
  • Introduction to Discrete Streams (DStreams)
  • Streaming Operations
  • Integrate Spark Streaming with Kafka
 PySpark Overview (Time Permitting)
  • Introduction to PySpark & PySpark Shell
  • Using Python to develop Spark Applications
  • Running PySpark Application