Thursday, September 15, 2016

Counters in Hadoop MapReduce

In this post I would like to explain the meaning of the Hadoop counters (the ones you can usually see after a job completes). I have been analyzing the starvation of long-running jobs in our relatively small cluster, and the Hadoop counters were extremely important in this investigation. Unfortunately, I could not find any resource that explains their meaning in detail. In the tables below I try to describe clearly what each counter means in the Hadoop 2.6 release.
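When a job finishes, the job client prints these counters as a group name on its own line, followed by indented "Display name=value" lines. To compare counters across jobs it helps to load that dump into a data structure. Below is a minimal Python sketch that assumes this tab-indented layout. The function name and the sample values are hypothetical.

```python
from typing import Dict, Optional


def parse_counters(text: str) -> Dict[str, Dict[str, int]]:
    """Parse a job-completion counter dump into {group name: {display name: value}}.

    Assumes the layout printed by the job client: a group name on its own
    line, followed by indented "Display name=value" lines.
    """
    groups: Dict[str, Dict[str, int]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "Counters:" in line:
            continue  # skip blank lines and the "Counters: N" summary line
        name, sep, value = line.rpartition("=")
        if sep and value.strip().isdigit():
            if current is None:
                raise ValueError(f"counter line before any group: {line!r}")
            groups[current][name.strip()] = int(value)
        else:
            current = line  # a line without "=value" starts a new group
            groups.setdefault(current, {})
    return groups


# Hypothetical excerpt of job output; a real dump contains many more counters.
SAMPLE = "\n".join([
    "INFO mapreduce.Job: Counters: 3",
    "\tFile System Counters",
    "\t\tFILE: Number of bytes read=120",
    "\t\tHDFS: Number of bytes read=1574",
    "\tMap-Reduce Framework",
    "\t\tMap input records=10",
])

if __name__ == "__main__":
    print(parse_counters(SAMPLE))
```

The result is keyed by display name because the display names are what the job output shows. Mapping them back to the internal counter names would require a lookup table such as the ones below.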

File System Counters

| Counter Name | Counter Display Name | Detailed explanation |
|---|---|---|
| FILE_BYTES_READ | FILE: Number of bytes read | Amount of data read from the local file system. |
| FILE_BYTES_WRITTEN | FILE: Number of bytes written | Amount of data written to the local file system. |
| FILE_READ_OPS | FILE: Number of read operations | Number of read operations on the local file system. |
| FILE_LARGE_READ_OPS | FILE: Number of large read operations | Number of large read operations on the local file system, such as listing the files of a large directory. |
| FILE_WRITE_OPS | FILE: Number of write operations | Number of write operations on the local file system. |
| HDFS_BYTES_READ | HDFS: Number of bytes read | Amount of data read from HDFS. |
| HDFS_BYTES_WRITTEN | HDFS: Number of bytes written | Amount of data written to HDFS. |
| HDFS_READ_OPS | HDFS: Number of read operations | Number of read operations on HDFS. |
| HDFS_LARGE_READ_OPS | HDFS: Number of large read operations | Number of large read operations on HDFS, such as listing the files of a directory too large to be returned in a single call. |
| HDFS_WRITE_OPS | HDFS: Number of write operations | Number of write operations on HDFS. |
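Each file system counter name is a scheme prefix (FILE, HDFS) followed by one of five metrics. That makes it easy to regroup the counters per file system. Below is a minimal Python sketch. It assumes that other schemes follow the same SCHEME_METRIC naming as the FILE_ and HDFS_ counters. The function name and the sample values are hypothetical.

```python
from typing import Dict

# Metric suffixes of file system counters, longest first so that
# "LARGE_READ_OPS" is matched before "READ_OPS".
METRICS = ("LARGE_READ_OPS", "BYTES_WRITTEN", "BYTES_READ", "WRITE_OPS", "READ_OPS")


def by_scheme(counters: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Group counters such as HDFS_BYTES_READ into {"HDFS": {"BYTES_READ": ...}}."""
    result: Dict[str, Dict[str, int]] = {}
    for name, value in counters.items():
        for metric in METRICS:
            if name.endswith("_" + metric):
                scheme = name[: -len(metric) - 1]  # strip "_" + metric
                result.setdefault(scheme, {})[metric] = value
                break
        else:
            raise ValueError(f"not a file system counter: {name}")
    return result


# Hypothetical counter values for one job.
job = {
    "FILE_BYTES_READ": 100,
    "FILE_BYTES_WRITTEN": 300,
    "HDFS_BYTES_READ": 1574,
    "HDFS_READ_OPS": 6,
    "HDFS_LARGE_READ_OPS": 0,
}
print(by_scheme(job))
```

Grouping this way puts local-disk I/O next to HDFS I/O. In MapReduce, local-disk I/O typically comes from intermediate data such as map spills and shuffle merges, not from the job's input or output.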

Job Counters

| Counter Name | Counter Display Name | Detailed explanation |
|---|---|---|
| TOTAL_LAUNCHED_MAPS | Launched map tasks | Total number of launched map task attempts, including failed and speculative attempts. |
| TOTAL_LAUNCHED_REDUCES | Launched reduce tasks | Total number of launched reduce task attempts, including failed and speculative attempts. |
| DATA_LOCAL_MAPS | Data-local map tasks | Number of map tasks launched on a node that holds their input data. |
| SLOTS_MILLIS_MAPS | Total time spent by all maps in occupied slots (ms) | Total map task time expressed in "slots": each attempt's duration multiplied by the number of minimum-size containers (yarn.scheduler.minimum-allocation-mb) that its memory request corresponds to, rounded up. |
| SLOTS_MILLIS_REDUCES | Total time spent by all reduces in occupied slots (ms) | The same as SLOTS_MILLIS_MAPS, for reduce task attempts. |
| MILLIS_MAPS | Total time spent by all map tasks (ms) | Sum of the wall-clock durations (launch to finish) of all map task attempts. |
| MILLIS_REDUCES | Total time spent by all reduce tasks (ms) | Sum of the wall-clock durations (launch to finish) of all reduce task attempts. |
| VCORES_MILLIS_MAPS | Total vcore-seconds taken by all map tasks | Sum over map task attempts of the allocated vCores multiplied by the running time. Despite the display name, the value is in vcore-milliseconds. |
| VCORES_MILLIS_REDUCES | Total vcore-seconds taken by all reduce tasks | Sum over reduce task attempts of the allocated vCores multiplied by the running time, in vcore-milliseconds. |
| MB_MILLIS_MAPS | Total megabyte-seconds taken by all map tasks | Sum over map task attempts of the allocated memory (in megabytes) multiplied by the running time, in megabyte-milliseconds. |
| MB_MILLIS_REDUCES | Total megabyte-seconds taken by all reduce tasks | Sum over reduce task attempts of the allocated memory (in megabytes) multiplied by the running time, in megabyte-milliseconds. |
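All four time-based job counters are derived from the same per-attempt duration (finish time minus launch time). Below is a sketch of that arithmetic. It assumes the Hadoop 2.x MapReduce ApplicationMaster accounting:

- Slot time scales the duration by the container memory divided by yarn.scheduler.minimum-allocation-mb (default 1024 MB), rounded up.
- The vCore and megabyte counters multiply the duration by the allocated vCores and megabytes.

The Attempt class, the function name and the sample numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Attempt:
    """Hypothetical record of one finished task attempt."""
    launch_ms: int
    finish_ms: int
    container_mb: int
    vcores: int


def millis_counters(attempts: Iterable[Attempt], min_allocation_mb: int = 1024) -> Dict[str, int]:
    """Aggregate attempts the way the *_MILLIS_* job counters are assumed to be computed."""
    totals = {"MILLIS": 0, "SLOTS_MILLIS": 0, "VCORES_MILLIS": 0, "MB_MILLIS": 0}
    for a in attempts:
        duration = a.finish_ms - a.launch_ms
        # Number of minimum-size containers the memory request corresponds to, rounded up.
        slots = math.ceil(a.container_mb / min_allocation_mb) if min_allocation_mb else 0
        totals["MILLIS"] += duration
        totals["SLOTS_MILLIS"] += slots * duration
        totals["VCORES_MILLIS"] += a.vcores * duration    # vcore-milliseconds, not seconds
        totals["MB_MILLIS"] += a.container_mb * duration  # megabyte-milliseconds
    return totals


# Two hypothetical map attempts: 10 s in a 2048 MB container, 5 s in a 1536 MB container.
maps = [Attempt(0, 10_000, 2048, 1), Attempt(2_000, 7_000, 1536, 1)]
print(millis_counters(maps))
```

Here the 1536 MB container counts as two 1024 MB slots. As a result, SLOTS_MILLIS (30000) is twice MILLIS (15000). On clusters that use large containers, the slot-based counters can therefore be much larger than the real running time.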
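
Map-Reduce Framework

| Counter Name | Counter Display Name | Detailed explanation |
|---|---|---|
| MAP_INPUT_RECORDS | Map input records | Total number of records processed by mappers. |
| MAP_OUTPUT_RECORDS | Map output records | Total number of records produced by mappers. |
| MAP_OUTPUT_BYTES | Map output bytes | Total amount of data produced by mappers, before any compression. |
| MAP_OUTPUT_MATERIALIZED_BYTES | Map output materialized bytes | Amount of map output actually written to disk, i.e. after compression if map output compression is enabled. |
| SPLIT_RAW_BYTES | Input split bytes | Amount of data taken up by the serialized input split metadata (not the input data itself) read by the map tasks. |
| COMBINE_INPUT_RECORDS | Combine input records | Total number of records processed by combiners (if a combiner is set). |
| COMBINE_OUTPUT_RECORDS | Combine output records | Total number of records produced by combiners (if a combiner is set). |
| REDUCE_INPUT_GROUPS | Reduce input groups | Total number of distinct key groups processed by reducers, i.e. the number of reduce() calls. |
| REDUCE_SHUFFLE_BYTES | Reduce shuffle bytes | Amount of map output data copied to the reducers during the shuffle. |
| REDUCE_INPUT_RECORDS | Reduce input records | Total number of records processed by reducers. |
| REDUCE_OUTPUT_RECORDS | Reduce output records | Total number of records produced by reducers. |
| SPILLED_RECORDS | Spilled Records | Total number of records written to local disk by map and reduce tasks during spills and merges. When a job has reducers, map output is spilled at least once, so this counter is normally non-zero; values well above the number of map output records suggest repeated spilling and merging. |
| SHUFFLED_MAPS | Shuffled Maps | Total number of map outputs successfully copied to reducers during the shuffle. Each reducer fetches one output from every map, so this is normally the number of maps multiplied by the number of reducers. |
| FAILED_SHUFFLE | Failed Shuffles | Total number of map output copies that failed during the shuffle. |
| MERGED_MAP_OUTPUTS | Merged Map outputs | Total number of map outputs merged by the reducers after the shuffle. |
| GC_TIME_MILLIS | GC time elapsed (ms) | Total wall-clock time the task JVMs spent in garbage collection. |
| CPU_MILLISECONDS | CPU time spent (ms) | Cumulative CPU time used by all tasks. |
| PHYSICAL_MEMORY_BYTES | Physical memory (bytes) snapshot | Sum over all tasks of the most recent snapshot of physical memory used by each task's processes. |
| VIRTUAL_MEMORY_BYTES | Virtual memory (bytes) snapshot | Sum over all tasks of the most recent snapshot of virtual memory used by each task's processes. |
| COMMITTED_HEAP_BYTES | Total committed heap usage (bytes) | Sum over all tasks of the heap memory committed (actually reserved) by each task JVM, which can be less than the configured maximum heap. |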
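Several of these counters mean more as ratios than as absolute values. Below is a minimal Python sketch that computes a few of them from a dictionary of counter values. The ratios are rough heuristics rather than official Hadoop metrics. The function name, the output keys and the sample values are hypothetical.

```python
import math
from typing import Dict, Optional


def framework_ratios(c: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Derive a few rough ratios from Map-Reduce Framework counters.

    A ratio is None when its denominator is missing or zero
    (e.g. COMBINE_* counters for a job without a combiner).
    """
    def ratio(num: str, den: str) -> Optional[float]:
        d = c.get(den, 0)
        return c.get(num, 0) / d if d else None

    return {
        # Well below 1.0 usually means map output compression is enabled and effective.
        "materialized_vs_output_bytes": ratio("MAP_OUTPUT_MATERIALIZED_BYTES", "MAP_OUTPUT_BYTES"),
        # Fraction of records that survived the combiner.
        "combiner_output_vs_input": ratio("COMBINE_OUTPUT_RECORDS", "COMBINE_INPUT_RECORDS"),
        # Well above 1.0 hints that records were spilled and merged more than once.
        "spilled_vs_map_output_records": ratio("SPILLED_RECORDS", "MAP_OUTPUT_RECORDS"),
        # Rough share of CPU time lost to GC (GC time is wall time, so only approximate).
        "gc_vs_cpu_time": ratio("GC_TIME_MILLIS", "CPU_MILLISECONDS"),
    }


# Hypothetical counter values for one job with a combiner.
counters = {
    "MAP_OUTPUT_BYTES": 1000, "MAP_OUTPUT_MATERIALIZED_BYTES": 250,
    "MAP_OUTPUT_RECORDS": 200, "COMBINE_INPUT_RECORDS": 200,
    "COMBINE_OUTPUT_RECORDS": 50, "SPILLED_RECORDS": 50,
    "GC_TIME_MILLIS": 30, "CPU_MILLISECONDS": 1200,
}
print(framework_ratios(counters))
```

With a combiner, map-side spills contain the combined records. The spill ratio can therefore be below 1.0 even when nothing is spilled twice.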

Shuffle Errors

| Counter Name | Counter Display Name | Detailed explanation |
|---|---|---|
| BAD_ID | BAD_ID | Total number of errors related to interpreting IDs from shuffle headers (for example, the mapper ID). |
| CONNECTION | CONNECTION | I could not find any place in the source code where this counter is incremented. |
| IO_ERROR | IO_ERROR | Total number of errors related to reading and writing intermediate data. |
| WRONG_LENGTH | WRONG_LENGTH | Total number of errors caused by invalid compressed or decompressed lengths reported for intermediate data. |
| WRONG_MAP | WRONG_MAP | Total number of errors caused by receiving map output that the reducer did not expect (for example, when the framework tries to process mapper output that has already been processed). |
| WRONG_REDUCE | WRONG_REDUCE | Total number of errors caused by receiving map output intended for a different reducer (when the shuffle for one reducer receives data belonging to another reducer). |
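The three WRONG_* counters correspond to sanity checks on the header that accompanies each fetched map output. Below is a simplified Python model of those checks. It assumes the checks run in the order lengths, target reducer, expected map, which is how the Hadoop 2.x shuffle fetcher appears to order them. The ShuffleHeader class, the classify_header function and the IDs are hypothetical and are not Hadoop's API.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ShuffleHeader:
    """Hypothetical view of the header sent with one map output."""
    map_id: str
    compressed_length: int
    uncompressed_length: int
    for_reduce: int


def classify_header(header: ShuffleHeader, expected_maps: Set[str], reduce_id: int) -> Optional[str]:
    """Return the Shuffle Errors counter a bad header would increment, or None if it passes."""
    if header.compressed_length < 0 or header.uncompressed_length < 0:
        return "WRONG_LENGTH"  # lengths in the header are invalid
    if header.for_reduce != reduce_id:
        return "WRONG_REDUCE"  # data is meant for another reducer
    if header.map_id not in expected_maps:
        return "WRONG_MAP"  # not an output this reducer still expects from this host
    return None


expected = {"m_1", "m_2"}
print(classify_header(ShuffleHeader("m_1", 120, 400, 0), expected, 0))
print(classify_header(ShuffleHeader("m_3", 120, 400, 0), expected, 0))
```

BAD_ID and IO_ERROR are not modelled here. They relate to parsing the header and to the I/O of the copy itself, not to the values the header contains.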
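
File Input Format Counters

| Counter Name | Counter Display Name | Detailed explanation |
|---|---|---|
| BYTES_READ | Bytes Read | Amount of data read by the map tasks from their input files through the job's FileInputFormat, summed over all tasks. |

File Output Format Counters

| Counter Name | Counter Display Name | Detailed explanation |
|---|---|---|
| BYTES_WRITTEN | Bytes Written | Amount of data written by the tasks to their output files through the job's FileOutputFormat, summed over all tasks. |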
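Combined with the time counters from the Job Counters group, these two counters give a rough per-task throughput:

- BYTES_READ divided by MILLIS_MAPS for the input side.
- BYTES_WRITTEN divided by MILLIS_REDUCES for the output side of a job with reducers.

Below is a minimal Python sketch. The function name and the numbers are hypothetical. MILLIS_* covers whole attempt durations, not just the time spent reading or writing, so the result is only an average.

```python
def mib_per_second(num_bytes: int, task_millis: int) -> float:
    """Average MiB processed per second of task time."""
    if task_millis <= 0:
        raise ValueError("task_millis must be positive")
    return (num_bytes / 2**20) / (task_millis / 1000)


# Hypothetical job: 1 GiB of input (BYTES_READ) read by maps that ran
# for 64 s in total (MILLIS_MAPS).
print(mib_per_second(2**30, 64_000))
```

A throughput that drops sharply between otherwise similar runs can point at contention, for example many jobs competing for the same disks.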
