MapReduce Interview Questions, Part 4

Q31 Can we rename the output file?
Answer: Yes. By default, each reducer writes its output to a file named part-r-NNNNN (part-NNNNN in the old API). We can write output under a different name by using the MultipleOutputs class (or, in the old mapred API, a MultipleOutputFormat subclass).
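For illustration, here is a minimal sketch of a reducer that writes its output under the base name "counts" (so files come out as counts-r-00000 instead of part-r-00000) using the new-API MultipleOutputs class; the class name RenamingReducer and the word-count logic are assumptions made for the example:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Illustrative reducer: writes results under the base name "counts"
// instead of the default "part".
public class RenamingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // Writes to counts-r-<partition> rather than part-r-<partition>.
        mos.write(key, new IntWritable(sum), "counts");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();  // flush the extra output streams
    }
}
```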
Q32 What is Streaming?
Answer: Streaming is a feature of the Hadoop framework that allows us to write MapReduce programs in any language that can read from standard input and write to standard output (Perl, Python, Ruby, and so on), not necessarily Java. However, deeper customization of the MapReduce framework itself can only be done in Java.
Q33 What is Speculative Execution?
Answer: During speculative execution, Hadoop launches a certain number of duplicate tasks: if a particular map or reduce task is running unusually slowly (for example, because of a failing disk on its node), Hadoop schedules a redundant copy of that task on a different slave node. Whichever task attempt finishes first is retained, and the slower duplicates are killed.
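As a small illustration, speculative execution can be toggled per job through configuration properties. The property names below are the Hadoop 2.x+ ones (Hadoop 1.x used mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution); the job name is arbitrary:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculativeConfigDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.map.speculative", true);     // allow duplicate map attempts
        conf.setBoolean("mapreduce.reduce.speculative", false); // no duplicate reduce attempts
        Job job = Job.getInstance(conf, "speculative-demo");
        // ... set mapper, reducer, and paths as usual, then job.waitForCompletion(true)
    }
}
```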
Q34 Is it possible to start reducers while some mappers still run? Why?
Answer: No. A reducer's input is grouped by key, so the reduce() method cannot start until every mapper has finished: the last running mapper could still produce a key that a running reducer had already consumed. Reduce tasks can, however, be launched early so that their shuffle/copy phase overlaps with the map phase, as the sketch below shows.
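A minimal sketch of that overlap: the slow-start threshold below is a real Hadoop 2.x+ property, while the 0.80 value is just an example. It controls what fraction of maps must complete before reduce tasks are launched to begin copying map output:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowStartDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Launch reducers (for the shuffle/copy phase only) once 80% of the
        // maps have finished; reduce() itself still waits for all maps.
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);
        Job job = Job.getInstance(conf, "slow-start-demo");
        // ... remaining job setup
    }
}
```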
Q35 Describe a reduce-side join between tables with a one-to-one relationship.
Answer: Each mapper produces key/value pairs with the join id as the key and the row contents as the value. Corresponding rows from both tables are grouped together by the framework during the shuffle and sort phase. The reduce method then receives a join id and two values, each representing a row from one table, and joins the data.
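A minimal sketch of such a reducer, assuming the mappers tagged each emitted row with an "A" or "B" prefix to mark its source table (the tagging scheme and the class name are illustrative, not part of Hadoop):

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Joins two tables with a one-to-one relationship. Mappers are assumed to
// emit (joinId, "A\t<row>") for table A and (joinId, "B\t<row>") for table B.
public class OneToOneJoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text joinId, Iterable<Text> rows, Context context)
            throws IOException, InterruptedException {
        String rowA = null;
        String rowB = null;
        for (Text tagged : rows) {
            String value = tagged.toString();
            if (value.startsWith("A\t")) {
                rowA = value.substring(2);  // strip the "A" tag
            } else if (value.startsWith("B\t")) {
                rowB = value.substring(2);  // strip the "B" tag
            }
        }
        // One-to-one relationship: emit only when both sides are present.
        if (rowA != null && rowB != null) {
            context.write(joinId, new Text(rowA + "\t" + rowB));
        }
    }
}
```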
Q36 Can you run MapReduce jobs directly on Avro data?
Answer: Yes, Avro was specifically designed for data processing via MapReduce.
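For example, with the avro-mapred module a job can read Avro files through AvroKeyInputFormat; the mapper then receives AvroKey<GenericRecord> keys and NullWritable values. The User schema below is purely illustrative:

```java
import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AvroInputDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "avro-input-demo");
        // Hypothetical record schema, for illustration only.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}");
        job.setInputFormatClass(AvroKeyInputFormat.class);
        AvroJob.setInputKeySchema(job, schema);
        // ... set mapper/reducer and paths as usual
    }
}
```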
Q37 Can reducers communicate with each other?
Answer: Reducers run in isolation; the Hadoop MapReduce programming paradigm provides no mechanism for reducers to communicate with each other.
Q38 How can you set an arbitrary number of Reducers to be created for a job in Hadoop?
Answer: You can do it programmatically by calling the setNumReduceTasks method on the Job object (JobConf in the old API), or set it as a configuration property (mapreduce.job.reduces in Hadoop 2.x+, mapred.reduce.tasks in older releases), as sketched below.
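A minimal sketch of both routes (the value 10 and the job name are arbitrary):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Configuration-property route (Hadoop 2.x+ name):
        // conf.setInt("mapreduce.job.reduces", 10);
        Job job = Job.getInstance(conf, "reducer-count-demo");
        job.setNumReduceTasks(10); // programmatic route: exactly 10 reduce tasks
    }
}
```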
Q39 What is TaskTracker?
Answer: A TaskTracker is a node in the cluster that accepts tasks (Map, Reduce, and Shuffle operations) from a JobTracker. Each TaskTracker is responsible for executing and managing the individual tasks assigned to it by the JobTracker; it also handles the data motion between the map and reduce phases. One prime responsibility of a TaskTracker is to constantly report the status of its tasks to the JobTracker through heartbeats. If the JobTracker fails to receive a heartbeat from a TaskTracker within a specified amount of time, it assumes the TaskTracker has crashed and resubmits the corresponding tasks to other nodes in the cluster.
Q40 How do you set the number of mappers and reducers for Hadoop jobs?
Answer: Users can configure the JobConf variable to set the number of mappers and reducers via conf.setNumMapTasks() and conf.setNumReduceTasks(). Note that the map-task count is only a hint (the actual number is driven by the number of input splits), whereas the reduce-task count is honored exactly; see the sketch after this answer.
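A short sketch using the old mapred API's JobConf (the class name and task counts are arbitrary); note the asymmetry between the two setters:

```java
import org.apache.hadoop.mapred.JobConf;

public class TaskCountDemo {
    public static void main(String[] args) {
        JobConf conf = new JobConf(TaskCountDemo.class);
        conf.setNumMapTasks(20);   // a hint only: actual count follows the input splits
        conf.setNumReduceTasks(5); // honored exactly
    }
}
```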