python - PySpark reading Avro file with a map complexe type. Exception while getting task result: java.io.InvalidClassException -


i share problem when use code example avro_inputformat.py

schema = open('test_schema_without_map.avsc').read() conf = {"avro.schema.input.key": reduce(lambda x, y: x + y, schema)} avro_image_rdd = sc.newapihadoopfile(     input_file,     "org.apache.avro.mapreduce.avrokeyinputformat",     "org.apache.avro.mapred.avrokey",     "org.apache.hadoop.io.nullwritable",  keyconverter="org.apache.spark.examples.pythonconverters.avrowrappertojavaconverter",     conf=conf )  output = avro_image_rdd.map(lambda x: x[0]).collect() k in output:     print "image filename : %s" % k 

and when running

spark-submit --driver-class-path /opt/spark-1.6.1/lib/spark-examples-1.6.1-hadoop2.6.0.jar read_test_avro_file_with_map.py  

i following error

job aborted due stage failure: exception while getting task result: java.io.invalidclassexception: scala.collection.convert.wrappers$mutablemapwrapper; no valid constructor 

when reading avro file following schema:

{ "namespace": "test.avro", "type": "record", "name": "testimage", "fields": [     {"name": "filename", "type": "string"},     {"name": "data", "type": "bytes"},     {"name": "metadata", "type":         {             "type": "map", "values": "string"         }     }    ], } 

however, same code works without problem when schema not contain 'map' avro complex type :

{ "namespace": "test.avro", "type": "record", "name": "testimage", "fields": [     {"name": "filename", "type": "string"},     {"name": "data", "type": "bytes"},    ], } 

if knows problem, please share experience...


versions:

  • spark 1.6.1
  • avro 1.8.0

the content of avro file :

records = [ {     "filename": "input_filename_1",     "metadata": {"a": "1", "b": "23"},     "data": "1,2,3,4,5,6,7,8,9,0" }, {     "filename": "input_filename_2",     "metadata": {"c": "11", "d": "213"},     "data": "10,11,12,13,14,15" } ] 


Comments

Popular posts from this blog

javascript - Laravel datatable invalid JSON response -

java - Exception in thread "main" org.springframework.context.ApplicationContextException: Unable to start embedded container; -

sql server 2008 - My Sql Code Get An Error Of Msg 245, Level 16, State 1, Line 1 Conversion failed when converting the varchar value '8:45 AM' to data type int -