I am reading JSON with Spark; there is nothing special, just:
spark.read.option('compression', 'gzip').option('dropFieldIfAllNull', True).json(source_final)
But it fails with: Found duplicate column(s) in the data schema. There are no joins; just 2 JSON lines in a 500 MB file have duplicated fields, and the entire Spark job fails. Are there any workarounds?
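To illustrate, here is a minimal sketch of the kind of line that seems to trigger this on my setup; the field names and values are hypothetical stand-ins for my real data:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# hypothetical JSON line with the same key appearing twice
bad_line = '{"id": 1, "value": "x", "value": "y"}'

# schema inference keeps both occurrences of "value", so the read fails with
# AnalysisException: Found duplicate column(s) in the data schema: `value`
df = spark.read.json(spark.sparkContext.parallelize([bad_line]))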