python - PySpark reading Avro file with a map complex type: Exception while getting task result: java.io.InvalidClassException
I want to share a problem I ran into when using the avro_inputformat.py example code:
schema = open('test_schema_without_map.avsc').read()
conf = {"avro.schema.input.key": reduce(lambda x, y: x + y, schema)}
avro_image_rdd = sc.newAPIHadoopFile(
    input_file,
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey",
    "org.apache.hadoop.io.NullWritable",
    keyConverter="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter",
    conf=conf)
output = avro_image_rdd.map(lambda x: x[0]).collect()
for k in output:
    print "image filename : %s" % k
When I run it with
spark-submit --driver-class-path /opt/spark-1.6.1/lib/spark-examples-1.6.1-hadoop2.6.0.jar read_test_avro_file_with_map.py
I get the following error:
Job aborted due to stage failure: Exception while getting task result: java.io.InvalidClassException: scala.collection.convert.Wrappers$MutableMapWrapper; no valid constructor
This happens when reading an Avro file with the following schema:
{
    "namespace": "test.avro",
    "type": "record",
    "name": "testimage",
    "fields": [
        {"name": "filename", "type": "string"},
        {"name": "data", "type": "bytes"},
        {"name": "metadata", "type": {"type": "map", "values": "string"}}
    ]
}
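To double-check the schema itself, here is a quick stdlib sketch (mine, not part of the original question) that parses it and confirms "metadata" is declared as an Avro map of strings, which is the field that triggers the exception:

```python
import json

# The record schema from the question (as valid JSON).
schema_str = """
{
    "namespace": "test.avro",
    "type": "record",
    "name": "testimage",
    "fields": [
        {"name": "filename", "type": "string"},
        {"name": "data", "type": "bytes"},
        {"name": "metadata", "type": {"type": "map", "values": "string"}}
    ]
}
"""

schema = json.loads(schema_str)
fields = {f["name"]: f["type"] for f in schema["fields"]}

# "metadata" is the Avro map complex type; the other two fields are primitives.
print(fields["metadata"])
```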
However, the same code works without problem when the schema does not contain the 'map' Avro complex type:
{
    "namespace": "test.avro",
    "type": "record",
    "name": "testimage",
    "fields": [
        {"name": "filename", "type": "string"},
        {"name": "data", "type": "bytes"}
    ]
}
If anyone knows this problem, please share your experience.
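One possible workaround (an untested sketch of my own, not something from the original question) is to bypass the example converters entirely and read the file through the spark-avro package, which maps Avro records, including map fields, to DataFrame rows. For Spark 1.6 the matching package version would be com.databricks:spark-avro_2.10:2.0.1:

```python
# Sketch only: assumes spark-avro is on the classpath, e.g.
#   spark-submit --packages com.databricks:spark-avro_2.10:2.0.1 read_avro.py
# and that sqlContext is the SQLContext created for the application.
df = sqlContext.read.format("com.databricks.spark.avro").load(input_file)

# The Avro map field should arrive as a MapType column, so no custom
# keyConverter is involved.
for row in df.select("filename", "metadata").collect():
    print "image filename : %s, metadata : %s" % (row.filename, row.metadata)
```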
Versions:
- Spark 1.6.1
- Avro 1.8.0
The content of the Avro file:
records = [
    {
        "filename": "input_filename_1",
        "metadata": {"a": "1", "b": "23"},
        "data": "1,2,3,4,5,6,7,8,9,0"
    },
    {
        "filename": "input_filename_2",
        "metadata": {"c": "11", "d": "213"},
        "data": "10,11,12,13,14,15"
    }
]
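For completeness, these records can be checked against the schema's field shapes with plain Python (a minimal sketch using only the stdlib, not the avro package): filename and data are strings, and metadata is a map with string keys and string values.

```python
# The two records from the question.
records = [
    {
        "filename": "input_filename_1",
        "metadata": {"a": "1", "b": "23"},
        "data": "1,2,3,4,5,6,7,8,9,0"
    },
    {
        "filename": "input_filename_2",
        "metadata": {"c": "11", "d": "213"},
        "data": "10,11,12,13,14,15"
    }
]

# Check each record against the shape the schema declares.
for r in records:
    assert isinstance(r["filename"], str)
    assert isinstance(r["data"], str)
    assert isinstance(r["metadata"], dict)
    assert all(isinstance(k, str) and isinstance(v, str)
               for k, v in r["metadata"].items())

print("all %d records match the schema shape" % len(records))
```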