Hadoop MapReduce: How to ensure multiple tasks are executed in parallel among all nodes
I have a task list file in HDFS containing a list of CPU-bound tasks to be executed on a small 5-node Hadoop MapReduce cluster (map only). For instance, the task list file contains 10 lines, each of which corresponds to a task command. Since the execution of each task takes a long time, it would be much more efficient to execute the 10 listed tasks in parallel across the 5 nodes.
However, the task list file is pretty small, so its data block is located on a single node, and that node would execute all 10 tasks based on the data locality principle. Is there a solution to ensure the 10 tasks are executed in parallel on the 5 nodes?
By default, MapReduce runs one mapper per input split. A split corresponds to a block, so if you have a large file, you get one mapper per block (default block size 128 MB), and each 128 MB chunk is processed in parallel with the other chunks.
In your case, you have a series of lines in a small file: that is a single split, which is therefore processed by a single mapper.
However, instead of having 1 file of 10 lines, you can create 10 files of 1 line each. You would then have 10 splits, and MapReduce would run 10 mappers across the cluster in parallel (depending on available resources) to process the tasks.
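One way to produce those per-task files is to split the list with the standard `split` utility before uploading it to HDFS. A minimal sketch, assuming a local `tasks.txt` and illustrative task commands (the file name, command names, and output prefix are all placeholders):

```shell
# Create a sample 10-line task list (one command per line; contents illustrative)
printf 'run_task %d\n' $(seq 1 10) > tasks.txt

# Split into one file per line: task_aa, task_ab, ... (10 files in total)
split -l 1 tasks.txt task_

# Count the pieces to confirm there is one file per task
ls task_* | wc -l
```

Each piece can then be uploaded to the job's input directory (for example with `hdfs dfs -put task_* /input/tasks/`, where the directory is your choice), giving the job 10 splits and hence up to 10 mappers, scheduled across the cluster as resources allow.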