configuration - Hadoop conf to determine number of map tasks
I have a Hadoop job that seems to have a total of 2 map tasks when running, as I can see in the Hadoop interface. However, that means each task loads so much data that I get a Java heap space error.
I've tried setting many different configuration properties in the Hadoop cluster to make the job split into more tasks, but nothing seems to have any effect.
I have tried setting mapreduce.input.fileinputformat.split.maxsize, mapred.max.split.size, and dfs.block.size, but none seem to have any effect.
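For reference, I set those properties along these lines in the job configuration files (a sketch; the 16 MB / 64 MB values here are illustrative, not necessarily what I used):

```xml
<!-- mapred-site.xml style snippet; values illustrative -->
<property>
  <name>mapreduce.input.fileinputformat.split.maxsize</name>
  <value>16777216</value> <!-- 16 MB -->
</property>
<property>
  <name>mapred.max.split.size</name>
  <value>16777216</value> <!-- 16 MB, old-API name -->
</property>
<!-- hdfs-site.xml -->
<property>
  <name>dfs.block.size</name>
  <value>67108864</value> <!-- 64 MB -->
</property>
```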
I'm using 0.20.2-cdh3u6 and trying to run a job using cascading.jdbc; the job is failing while reading data from the database. I think the issue can be resolved by increasing the number of splits, but I can't work out how to do that!
Please help! Going crazy!
2013-07-23 09:12:15,747 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.Buffer.<init>(Buffer.java:59)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1477)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2936)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:477)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2631)
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1800)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2221)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2568)
    at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1557)
    at cascading.jdbc.db.DBInputFormat$DBRecordReader.<init>(DBInputFormat.java:97)
    at cascading.jdbc.db.DBInputFormat.getRecordReader(DBInputFormat.java:376)
    at cascading.tap.hadoop.MultiInputFormat$1.operate(MultiInputFormat.java:282)
    at cascading.tap.hadoop.MultiInputFormat$1.operate(MultiInputFormat.java:277)
    at cascading.util.Util.retry(Util.java:624)
    at cascading.tap.hadoop.MultiInputFormat.getRecordReader(MultiInputFormat.java:276)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:370)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)
You should look at the memory-management settings, such as io.sort.mb or mapred.cluster.map.memory.mb, because heap space errors are usually due to an allocation problem, not to the number of maps.
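A sketch of that kind of tuning in mapred-site.xml; mapred.child.java.opts (which sets the heap of each map/reduce child JVM) is my addition here, and all values are illustrative:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value> <!-- heap for each map/reduce child JVM -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>100</value> <!-- buffer used when sorting map output, in MB -->
</property>
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>1536</value> <!-- memory per map slot on the cluster -->
</property>
```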
If you want to force the map number, you have to consider which values take priority over the others. For instance, if mapreduce.input.fileinputformat.split.maxsize is small, it will generate a huge number of tasks even if you set mapred.tasktracker.map.tasks.maximum to a small value.
dfs.block.size has an impact on the generated map number if it is greater than mapreduce.input.fileinputformat.split.maxsize.
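To see how those properties interact for file-based inputs, stock FileInputFormat derives the split size roughly as max(minSize, min(maxSize, blockSize)) and then cuts the input into pieces of that size. A minimal sketch of that arithmetic (values illustrative; note that a JDBC input like yours does not go through this path, which may be why these settings had no effect):

```java
// Mirrors the split-size arithmetic used by Hadoop's FileInputFormat.
// Property values below are illustrative, not recommendations.
public class SplitSizeDemo {
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // dfs.block.size: 64 MB
        long minSize   = 1L;                  // mapred.min.split.size default
        long maxSize   = 16L * 1024 * 1024;   // split.maxsize: 16 MB

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long fileSize  = 1024L * 1024 * 1024; // a 1 GB input file
        long numSplits = (fileSize + splitSize - 1) / splitSize;

        System.out.println("split size = " + splitSize + " bytes");
        System.out.println("num splits = " + numSplits);
    }
}
```

With maxsize at 16 MB the 1 GB file yields 64 splits; leave maxsize unset (effectively unbounded) and the 64 MB block size takes over, yielding 16.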