MapReduce Interview Questions Part 5

Q41 What is the configuration of a typical slave node on a Hadoop cluster? How many JVMs run on a slave node?
Answer: A typical slave node runs several separate JVM processes:
  • A single TaskTracker instance runs on each slave node, as its own JVM process.
  • A single DataNode daemon runs on each slave node, as its own JVM process.
  • One or more task instances run on each slave node, each as its own JVM process. The number of task instances is controlled by configuration; a high-end machine is typically configured to run more task instances.
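The per-node task-instance limits are set in each slave's mapred-site.xml. A minimal sketch, assuming a classic Hadoop 1.x (JobTracker/TaskTracker) cluster; the slot counts 4 and 2 are illustrative:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>   <!-- map task instances (JVMs) per TaskTracker -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>   <!-- reduce task instances (JVMs) per TaskTracker -->
  </property>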
Q42 What do you know about NLineInputFormat?
Answer: NLineInputFormat splits N lines of input into one split, so each mapper receives a fixed number of lines rather than a fixed number of bytes.
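A minimal driver sketch, assuming the new org.apache.hadoop.mapreduce API; the class name, job name, and the value 100 are illustrative:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

  public class NLineDriver {
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "nline-example");
      // Each input split -- and therefore each map task -- receives 100 lines.
      job.setInputFormatClass(NLineInputFormat.class);
      NLineInputFormat.setNumLinesPerSplit(job, 100);
      // ... set mapper, reducer, and input/output paths, then submit ...
    }
  }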
Q43 True or false: Each reducer must generate the same number of key/value pairs as it received as input.
Answer: False. A reducer may generate any number of key/value pairs, including zero.
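As an illustration, this hypothetical reducer emits at most one pair per key, and nothing at all for keys below a cutoff, showing that output size is independent of input size:

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  public class ThresholdReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private static final int THRESHOLD = 10; // illustrative cutoff

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      if (sum >= THRESHOLD) {   // below the cutoff, zero pairs are emitted
        context.write(key, new IntWritable(sum));
      }
    }
  }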
Q44 When are the reducers started in a MapReduce job?
Answer: In a MapReduce job, reducers do not start executing the reduce method until all the map tasks have completed. Reducers start copying intermediate key/value pairs from the mappers as soon as they are available, but the programmer-defined reduce method is called only after all the mappers have finished.
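How early reducers are launched to begin copying is tunable. A minimal sketch, assuming the Hadoop 2 property name (default 0.05):

  import org.apache.hadoop.conf.Configuration;

  Configuration conf = new Configuration();
  // Launch reducers (shuffle/copy phase) only after 80% of map tasks finish;
  // the reduce() method itself still waits for all maps to complete.
  conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);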
Q45 Name the job control options specified by MapReduce.
Answer: Since the framework supports chained operations, wherein the output of one job serves as the input for another, job controls are needed to govern these complex operations. The two job control options, both methods of the Job class, are (see the sketch after this list):
  • submit(): submits the job to the cluster and returns immediately
  • waitForCompletion(boolean): submits the job to the cluster and waits for its completion
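A minimal driver sketch of the two options; the job name and setup are illustrative, and the two calls are alternatives, not used together:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  Job job = Job.getInstance(new Configuration(), "job-control-example");
  // ... set mapper, reducer, and input/output paths ...

  // Option 1: submit the job and return immediately; the caller polls status.
  job.submit();

  // Option 2 (instead of submit()): block until the job finishes;
  // passing 'true' also prints progress to the console.
  boolean success = job.waitForCompletion(true);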
Q46 Decide if the statement is true or false: Each combiner runs exactly once.
Answer: False. The framework decides whether the combiner runs zero, one, or multiple times.
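A combiner is enabled per job. A minimal sketch, assuming a configured Job instance and the stock IntSumReducer shipped with Hadoop:

  import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

  // Because the framework may apply the combiner zero, one, or several times,
  // its logic must tolerate repetition. A summing reducer is safe here,
  // since addition is associative and commutative.
  job.setCombinerClass(IntSumReducer.class);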
Q47 Define a straggler.
Answer: A straggler is a map or reduce task that takes an unusually long time to complete.
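Hadoop's standard mitigation for stragglers is speculative execution, which launches a backup copy of a slow task on another node; whichever attempt finishes first wins and the other is killed. A minimal sketch, assuming the Hadoop 2 property names:

  import org.apache.hadoop.conf.Configuration;

  Configuration conf = new Configuration();
  conf.setBoolean("mapreduce.map.speculative", true);    // back up slow map tasks
  conf.setBoolean("mapreduce.reduce.speculative", true); // back up slow reduce tasks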
Q48 Explain what the Distributed Cache is in the MapReduce framework.
Answer: The Distributed Cache is an important feature provided by the MapReduce framework. When you want to share files across all nodes in a Hadoop cluster, the DistributedCache is used. The files can be executable JAR files or simple properties files.
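A minimal sketch of both sides of the cache, assuming the new-API Job class; the HDFS path and the "stopwords" alias are hypothetical:

  import java.net.URI;

  // Driver side: ship a shared file to every node; the '#stopwords' fragment
  // makes it appear under that local name in each task's working directory.
  job.addCacheFile(new URI("/apps/shared/stopwords.txt#stopwords"));

  // Mapper side, inside setup(): open the cached copy like any local file, e.g.
  // BufferedReader reader = new BufferedReader(new FileReader("stopwords"));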
Q49 How does the JobTracker schedule a task?
Answer: The TaskTrackers send heartbeat messages to the JobTracker, usually every few seconds, to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data and, failing that, for an empty slot on a machine in the same rack.
Q50 What is ChainMapper?
Answer: The ChainMapper class is a special implementation of Mapper through which a set of mapper classes can be run in a chained fashion within a single map task. In this chained execution pattern, the first mapper's output becomes the input of the second mapper, the second mapper's output the input of the third, and so on until the last mapper.
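A minimal driver sketch, assuming the new-API ChainMapper and two hypothetical mapper classes, UpperCaseMapper and TokenizerMapper:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;

  Job job = Job.getInstance(new Configuration(), "chain-example");
  // First mapper in the chain: consumes the raw input records.
  ChainMapper.addMapper(job, UpperCaseMapper.class,
      LongWritable.class, Text.class, Text.class, Text.class,
      new Configuration(false));
  // Second mapper: consumes the first mapper's Text/Text output.
  ChainMapper.addMapper(job, TokenizerMapper.class,
      Text.class, Text.class, Text.class, Text.class,
      new Configuration(false));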