Hadoop Developer Self Learning Outline


Learning Hadoop is not hard, but it requires patience.
"I want to learn Hadoop, but where should I start?"
If you are looking for an answer to that question, here is a draft outline for learning Hadoop.


Hadoop Quiz presents a learning approach for beginners.
Prepare according to the outline below and nothing will stop you from becoming a HADOOPER.



Understanding Big Data

  • The 3 Vs (Volume, Variety, Velocity)
  • Structured and unstructured data
  • Applications and use cases of Big Data
  • Limitations of traditional large-scale systems


Hadoop Introduction
  • Hadoop history and concepts
  • Ecosystem
  • Distributions
  • High level architecture
These topics are covered in two parts; kindly refer to the sections below.

Hadoop Introduction | Hadoop Developer Self Learning
HDFS

  • Concepts (distributed storage, horizontal scaling, replication, rack awareness)
  • Architecture
  • NameNode (function, storage, file system metadata, and block reports)
  • Secondary NameNode
  • DataNode
  • Configuration files
  • Single-node and multi-node installation
  • Communications / heartbeats
  • Block manager / balancer
  • Health check / safe mode
  • Read / write path
  • Navigating the HDFS UI
  • Command-line interaction with HDFS
  • File system abstractions
  • Reading / writing files using the Java API
  • Latest in HDFS
  • NameNode HA and Federation
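
To get a feel for the block and replication concepts listed above before installing anything, here is a minimal Python sketch of the arithmetic. It assumes a 128 MB block size and a replication factor of 3, which are the usual defaults in Hadoop 2 and later (check dfs.blocksize and dfs.replication on your own cluster). The function name hdfs_footprint is made up for this illustration and is not part of any Hadoop API.

```python
MB = 1024 * 1024

def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Estimate (number of blocks, raw bytes stored across the cluster) for one file.

    Assumed defaults: 128 MB blocks, 3 replicas. The last block of a file
    only occupies as much space as the data it holds, so the raw footprint
    is simply file size times the replication factor.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # Ceiling division without floating point: an empty file has zero blocks.
    num_blocks = -(-file_size_bytes // block_size_bytes)
    raw_bytes = file_size_bytes * replication
    return num_blocks, raw_bytes

if __name__ == "__main__":
    blocks, raw = hdfs_footprint(1024 * MB)  # a 1 GB file
    print(blocks, raw // MB)                 # blocks, and raw storage in MB
```

With these assumed defaults, a 1 GB file is split into 8 blocks and takes about 3 GB of raw disk across the DataNodes.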


MapReduce

  • MapReduce concepts
  • Daemons: JobTracker / TaskTracker
  • Phases: driver, mapper, shuffle/sort, and reducer
  • First MapReduce job
  • MapReduce programs (Word Count, Word Co-occurrence, Average Word Length, Inverted Index)
  • MapReduce UI walkthrough
  • Counters
  • Distributed cache
  • Combiners
  • Partitioners
  • MapReduce configuration
  • Job config
  • MR types and formats
  • Sorting
  • Optimizing MapReduce
  • YARN Introduction
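
Before writing a first real job, it can help to see the mapper, shuffle/sort, and reducer phases from the list above in miniature. The following is a small pure-Python simulation of Word Count. It is only a sketch of the data flow, not the Hadoop Java API: the names mapper, shuffle, reducer, and run_job are chosen for this illustration, and everything runs in one process instead of across a cluster.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle/sort phase: group values by key and sort the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)

def run_job(lines):
    """Driver: wire the phases together, as a job driver would."""
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(word, counts) for word, counts in shuffle(mapped)]

if __name__ == "__main__":
    print(run_job(["big data big cluster", "data node"]))
    # prints [('big', 2), ('cluster', 1), ('data', 2), ('node', 1)]
```

On a real cluster the map and reduce calls run as separate tasks on different nodes, and the framework performs the shuffle over the network; the grouping-by-key idea stays the same.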


Hive

  • Hive introduction
  • Environment and configuration
  • Hive tables and metadata
  • HiveQL (DDL and DML operations)
  • External vs. managed tables
  • Partitions and buckets
  • User-defined functions
  • JSON and Regex SerDe


Pig

  • Pig Basics, Loading data files
  • Pig versus MapReduce
  • Data Types
  • Pig Latin language constructs (LOAD, STORE, DUMP, SPLIT, etc.)
  • User-defined functions


Sqoop

  • Sqoop Basics
  • Importing data from and exporting data to an RDBMS
  • Hands-on exercises – import and export


Flume

  • Introduction to Flume
  • Flume sources, channels, sinks, and agents
  • Flume examples
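
The source, channel, and sink roles listed above are easier to remember once you have seen events move through them. Below is a toy Python model of one agent: a source pushes events into a bounded channel, and a sink drains the channel. This is an illustration of the idea only. The Channel class and the run_agent and drain functions are invented for this sketch, and real Flume agents are set up through a properties file and use transactions, which are not modelled here.

```python
from collections import deque

class Channel:
    """A bounded in-memory buffer sitting between a source and a sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        """Accept an event if there is room; return False when the channel is full."""
        if len(self._events) >= self.capacity:
            return False
        self._events.append(event)
        return True

    def take(self):
        """Hand the oldest event to the sink, or None when the channel is empty."""
        return self._events.popleft() if self._events else None

def drain(channel, delivered):
    """Sink side: move every buffered event to the destination list."""
    event = channel.take()
    while event is not None:
        delivered.append(event)
        event = channel.take()

def run_agent(source_events, channel, delivered):
    """Agent: feed source events into the channel and let the sink drain it."""
    for event in source_events:
        if not channel.put(event):
            # Channel is full: let the sink catch up, then retry the event.
            drain(channel, delivered)
            channel.put(event)
    drain(channel, delivered)
    return delivered

if __name__ == "__main__":
    print(run_agent(["log line 1", "log line 2", "log line 3"], Channel(2), []))
```

The channel is what lets the source and the sink run at different speeds, which is why channel capacity is one of the first things to tune in a real agent.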


Oozie

  • Introduction to Oozie
  • Oozie workflows
  • Deploying and running a sample Oozie workflow


NoSQL

  • Introduction to NoSQL
  • Different types of NoSQL databases (key-value, columnar, document, graph)
  • Introduction to MongoDB and Neo4j


HBase

  • Introduction to HBase
  • Architecture
  • Configuration
  • HBase versus RDBMS
  • HBase shell
  • HBase Java API
  • Splits and compaction
  • Read path / write path
  • Schema design


Hadoop PoC

  • Web log analysis (small PoC)
  • Twitter analysis (small PoC)
  • Hadoop use cases
Get your hands dirty with the Hadoop framework.
A Hadoop blog: https://hadoopquiz.blogspot.in/
FB page: https://www.facebook.com/hadoopquiz