OK, then there's something pretty simple you can do with a small shell script (see below). The idea is not to edit your file manually, but to let Python do it and create another file whose format complies with what the _bulk endpoint expects. The script does the following:
- First, we declare a little Python script that reads your JSON file and creates a new one in the file format required by the _bulk endpoint (see the example just after this list)
- Then, we run that Python script and store the resulting bulk file
- Finally, we send the file created in step 2 to the _bulk endpoint using a simple curl command
- There you go, you now have a new ES index containing your documents
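To make the expected bulk format concrete, here is a small, purely illustrative example (the document fields are invented, not taken from your data). If file.json contains a plain JSON array like this:
[
  {"name": "doc1"},
  {"name": "doc2"}
]
then the generated bulk.json will contain one action line followed by one source line per document:
{"index": {}}
{"name": "doc1"}
{"index": {}}
{"name": "doc2"}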
bulk.sh:
#!/bin/sh
# 0. Some constants to re-define to match your environment
ES_HOST=localhost:9200
JSON_FILE_IN=/path/to/your/file.json
JSON_FILE_OUT=/path/to/your/bulk.json
# 1. Python code to transform your JSON file
PYTHON="import json,sys;
out = open('$JSON_FILE_OUT', 'w');
with open('$JSON_FILE_IN') as json_in:
docs = json.loads(json_in.read());
for doc in docs:
out.write('%s
' % json.dumps({'index': {}}));
out.write('%s
' % json.dumps(doc, indent=0).replace('
', ''));
"
# 2. run the Python script from step 1
python -c "$PYTHON"
# 3. use the output file from step 2 in the curl command
curl -s -XPOST $ES_HOST/index/type/_bulk --data-binary @$JSON_FILE_OUT
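Note that index and type in the URL above are placeholders for your actual index and mapping type names. Also, depending on your Elasticsearch version (6.x and later enforce this), the bulk request may be rejected without an explicit Content-Type header, in which case the curl command would look like:
curl -s -XPOST $ES_HOST/index/type/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @$JSON_FILE_OUT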
You need to:
- save the above script in the bulk.sh file and make it executable (i.e. chmod u+x bulk.sh)
- modify the three variables at the top (step 0) in order to match your environment
- run it using ./bulk.sh
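Once the script has run, you can quickly verify that the documents made it in by asking Elasticsearch for a count (again assuming the placeholder index name used in the curl command above):
curl -s $ES_HOST/index/_count
The returned count should match the number of documents in your original JSON array; if it doesn't, look at the response body of the _bulk call itself, which reports an error for every item that failed.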