Monday, September 12, 2016

Difference between the Hadoop Old API and New API

Below are the differences between the old Hadoop MapReduce API (the one in use up to release 0.20) and the new API (introduced in 0.20 and carried into 1.x and 2.x).




| Difference | New API | Old API |
| --- | --- | --- |
| Mapper and Reducer | Mapper and Reducer are abstract classes, so a method with a default implementation can be added to them without breaking existing implementations. | Mapper and Reducer are interfaces (the old interfaces still ship alongside the new API). |
| Package | The new API is in the org.apache.hadoop.mapreduce package. | The old API can still be found in org.apache.hadoop.mapred. |
| How user code communicates with the MapReduce system | User code uses a Context object to communicate with the MapReduce system. | User code uses the JobConf, OutputCollector, and Reporter objects to communicate with the MapReduce system. |
| Controlling mapper and reducer execution | Both mappers and reducers can control the execution flow by overriding the run() method. | Mappers can be controlled by writing a MapRunnable, but no equivalent exists for reducers. |
| Job control | Job control is done through the Job class. | Job control is done through JobClient (which does not exist in the new API). |
| Job configuration | Job configuration is done through the Configuration class, via some of the helper methods on Job. | Job configuration is done through a JobConf object, which extends Configuration (java.lang.Object > org.apache.hadoop.conf.Configuration > org.apache.hadoop.mapred.JobConf). |
| Output file names | Map outputs are named part-m-nnnnn and reduce outputs are named part-r-nnnnn (where nnnnn is an integer designating the part number, starting from zero). | Both map and reduce outputs are named part-nnnnn. |
| Values passed to reduce() | The reduce() method receives its values as a java.lang.Iterable. | The reduce() method receives its values as a java.util.Iterator. |
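The Iterable versus Iterator difference in reduce() is easiest to see in plain Java. The sketch below uses no Hadoop classes at all; `ReduceValuesSketch` and its two methods are hypothetical stand-ins for the loop you would write inside an old-style and a new-style reduce(), assuming the values are simple integer counts.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java sketch, not Hadoop code: the two methods are hypothetical
// stand-ins for the body of an old-API and a new-API reduce().
final class ReduceValuesSketch {

    // Old API style: values arrive as a java.util.Iterator,
    // so the loop has to call hasNext()/next() explicitly.
    static int sumWithIterator(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // New API style: values arrive as a java.lang.Iterable,
    // so the for-each construct can be used directly.
    static int sumWithIterable(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(1, 2, 3);
        System.out.println(sumWithIterator(counts.iterator()));
        System.out.println(sumWithIterable(counts));
    }
}
```

Both methods compute the same sum; the only change is how the values are walked.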

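Two of the points above (Mapper and Reducer being classes, and run() being overridable) can be illustrated together. What follows is only a rough sketch in plain Java: `SketchMapper`, `UpperCaseMapper`, and `FirstTwoMapper` are made-up names, and the base class is loosely modelled on the idea of a setup/map/cleanup loop rather than copied from Hadoop's own Mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch with hypothetical types; no Hadoop classes are used.
final class RunOverrideSketch {

    // Stand-in for a new-API style base class.
    abstract static class SketchMapper {
        protected final List<String> output = new ArrayList<>();

        // Hooks with empty default bodies. Another hook like these could be
        // added later without breaking subclasses that already exist.
        protected void setup() { }
        protected void cleanup() { }

        protected abstract void map(String record);

        // Default driver loop; a subclass may override it to change the flow.
        public List<String> run(List<String> records) {
            setup();
            for (String record : records) {
                map(record);
            }
            cleanup();
            return output;
        }
    }

    // Implements only map() and inherits the default run().
    static class UpperCaseMapper extends SketchMapper {
        @Override
        protected void map(String record) {
            output.add(record.toUpperCase());
        }
    }

    // Overrides run() so that only the first two records are processed.
    static class FirstTwoMapper extends UpperCaseMapper {
        @Override
        public List<String> run(List<String> records) {
            setup();
            int limit = Math.min(2, records.size());
            for (String record : records.subList(0, limit)) {
                map(record);
            }
            cleanup();
            return output;
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(new UpperCaseMapper().run(records));
        System.out.println(new FirstTwoMapper().run(records));
    }
}
```

With an interface-based design, as in the old API, there is no inherited run() to override, which is why a separate MapRunnable was needed for mappers.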

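The output file naming patterns can also be written down as a small helper. This is just an illustration of the patterns described above, assuming a five-digit, zero-padded part number; `PartFileNames` and its methods are hypothetical and are not part of Hadoop.

```java
import java.util.Locale;

// Hypothetical helper that only formats names following the patterns
// part-nnnnn (old API) and part-m-nnnnn / part-r-nnnnn (new API).
final class PartFileNames {

    // Old API: map and reduce outputs share the same pattern.
    static String oldApiName(int partNumber) {
        return String.format(Locale.ROOT, "part-%05d", partNumber);
    }

    // New API: "m" marks a map output, "r" marks a reduce output.
    static String newApiName(boolean mapOutput, int partNumber) {
        String kind = mapOutput ? "m" : "r";
        return String.format(Locale.ROOT, "part-%s-%05d", kind, partNumber);
    }

    public static void main(String[] args) {
        System.out.println(oldApiName(0));
        System.out.println(newApiName(true, 0));
        System.out.println(newApiName(false, 12));
    }
}
```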