I noticed the same issue. I believe there is already a ticket to address it, but here is what AWS support suggests in the meantime.
If you use the referenced files path variable in a Python shell job, the referenced file is placed in /tmp, which the Python shell job cannot access by default. The same operation works in a Spark job because the file lands in the default file directory.
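For context, a referenced file is attached to a job through the --extra-files special parameter (the "Referenced files path" field in the console). Below is a minimal sketch of starting a run with it; the job name is a placeholder, and the bucket/key match the sample used later in this answer.

import boto3

glue = boto3.client('glue')

# Attach the config file as a referenced file via the --extra-files
# special parameter (equivalent to "Referenced files path" in the console).
# The job name and S3 path are placeholders for illustration.
glue.start_job_run(
    JobName='sample_python_shell_job',
    Arguments={
        '--extra-files': 's3://sample_bucket/sample_config.json',
    },
)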
The code below finds the absolute path of sample_config.json, which was referenced in the Glue job configuration, by searching the directories on sys.path, and then prints its contents.
import json
import os
import sys

def get_referenced_filepath(file_name, matchFunc=os.path.isfile):
    # Look for the referenced file in every directory on sys.path
    for dir_name in sys.path:
        candidate = os.path.join(dir_name, file_name)
        if matchFunc(candidate):
            return candidate
    raise Exception("Can't find file: {}".format(file_name))

with open(get_referenced_filepath('sample_config.json'), "r") as f:
    data = json.load(f)

print(data)
The Boto3 API can also be used to read the referenced file directly from S3:
import boto3

s3 = boto3.resource('s3')
obj = s3.Object('sample_bucket', 'sample_config.json')

# Stream the object line by line instead of reaching into the private _raw_stream attribute
for line in obj.get()['Body'].iter_lines():
    print(line)
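If the goal is simply to load the configuration, the body can also be read in one call and parsed as JSON (a small variation on the snippet above, using the same sample bucket and key):

import json

import boto3

s3 = boto3.resource('s3')
obj = s3.Object('sample_bucket', 'sample_config.json')

# Read the whole object, decode it, and parse the JSON payload
data = json.loads(obj.get()['Body'].read().decode('utf-8'))
print(data)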