configuration - Hadoop conf to determine number of map tasks


I have a Hadoop job that seems to have a total of 2 map tasks when running, as I can see in the Hadoop interface. However, that means each task loads so much data that I get a Java heap space error.

I've tried setting many different conf properties in the Hadoop cluster to make the job split into more tasks, but nothing seems to have any effect.

I have tried setting mapreduce.input.fileinputformat.split.maxsize, mapred.max.split.size, and dfs.block.size, but none seem to have any effect.

I'm using 0.20.2-cdh3u6 and trying to run a job using cascading.jdbc - the job is failing while reading data from the database. I think the issue can be resolved by increasing the number of splits, but I can't work out how to do that!

Please help! I'm going crazy!

2013-07-23 09:12:15,747 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.Buffer.&lt;init&gt;(Buffer.java:59)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1477)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2936)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:477)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2631)
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1800)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2221)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2568)
    at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1557)
    at cascading.jdbc.db.DBInputFormat$DBRecordReader.&lt;init&gt;(DBInputFormat.java:97)
    at cascading.jdbc.db.DBInputFormat.getRecordReader(DBInputFormat.java:376)
    at cascading.tap.hadoop.MultiInputFormat$1.operate(MultiInputFormat.java:282)
    at cascading.tap.hadoop.MultiInputFormat$1.operate(MultiInputFormat.java:277)
    at cascading.util.Util.retry(Util.java:624)
    at cascading.tap.hadoop.MultiInputFormat.getRecordReader(MultiInputFormat.java:276)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:370)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)

You should look at the memory management settings, such as io.sort.mb or mapred.cluster.map.memory.mb, because heap space errors are generally due to an allocation problem, not to the map number.
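As a rough sketch, such settings would go in mapred-site.xml (or the job conf). The values below are purely illustrative, not recommendations, and mapred.child.java.opts is an additional assumed property that controls the child JVM heap on 0.20.x clusters:

```xml
<!-- mapred-site.xml: illustrative values only, tune for your cluster -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value> <!-- heap for each map/reduce child JVM (assumed property) -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>256</value> <!-- map-side sort buffer; must fit inside the child heap -->
</property>
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>2048</value> <!-- memory per map slot as seen by the scheduler -->
</property>
```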

If you want to force the map number, you have to consider that some values take priority over others. For instance, a small mapreduce.input.fileinputformat.split.maxsize will generate a huge number of tasks even if you set mapred.tasktracker.map.tasks.maximum to a small value.

The dfs.block.size has an impact on the generated map number only if it is greater than mapreduce.input.fileinputformat.split.maxsize.
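To illustrate the interplay described above, a small max split size could be set like this (the value shown is an arbitrary example):

```xml
<!-- job conf or mapred-site.xml: cap each input split at 64 MB,
     which forces more map tasks for file-based inputs -->
<property>
  <name>mapreduce.input.fileinputformat.split.maxsize</name>
  <value>67108864</value> <!-- 64 MB; dfs.block.size only matters when it exceeds this cap -->
</property>
```

Note that these split settings apply to file-based input formats; whether they affect a JDBC source such as cascading.jdbc's DBInputFormat depends on how that format computes its splits.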

