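The data posted isn't valid JSON (at minimum it is missing a set of outer curly braces), and it was encoded incorrectly: each UTF-8 byte was written out as a separate Unicode code point. For example, the UTF-8 bytes of "我" (E6 88 91) became the three code points U+00E6 U+0088 U+0091 instead of the single code point U+6211. Ideally, fix the code that produced the file. In the meantime, the following repairs the data you have now, assuming "input.json" is the original data with the outer curly braces added: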
```python
import json

# Read the raw bytes of the data file
with open('input.json', 'rb') as f:
    raw = f.read()

# The \n escapes inside JSON strings must stay escaped (a raw newline is invalid
# inside a JSON string), so double-escape them; unicode_escape turns \\n back into \n.
raw = raw.replace(rb'\n', rb'\\n')

# Convert all the escape codes to Unicode characters
raw = raw.decode('unicode_escape')

# The characters are really UTF-8 byte values.
# The "latin1" codec translates Unicode code points 1:1 to byte values,
# resulting in a byte string again.
raw = raw.encode('latin1')

# Decode correctly as UTF-8
raw = raw.decode('utf8')

# Now that the JSON is fixed, load it into a Python object
data = json.loads(raw)

# Rewrite the JSON correctly
with open('output.json', 'w', encoding='utf8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
```
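To see what each step does without touching any files, the same chain can be run on a small in-memory sample. This is a sketch: the helper name `repair` and the sample bytes are hypothetical, built to mimic the broken format by writing the UTF-8 bytes of "我" as `\u00XX` escapes next to an ordinary JSON `\n` escape. The last part shows why the newline double-escaping step is there.

```python
import json

def repair(raw: bytes):
    """Same steps as the script above, applied to in-memory bytes."""
    # Keep JSON's \n escapes as escapes: \n -> \\n, which unicode_escape turns back into \n
    raw = raw.replace(rb'\n', rb'\\n')
    # Each \u00XX escape becomes one character per original UTF-8 byte
    text = raw.decode('unicode_escape')
    # latin1 maps U+0000..U+00FF back to bytes 0x00..0xFF; then decode those bytes as UTF-8
    text = text.encode('latin1').decode('utf8')
    return json.loads(text)

# Hypothetical sample: the UTF-8 bytes of "我" (E6 88 91) written as \u00XX escapes,
# plus a JSON newline escape inside a string
sample = rb'{"content": "\u00e6\u0088\u0091", "share_text": "//\n#191214"}'

data = repair(sample)
print(data['content'])           # 我
print(repr(data['share_text']))  # '//\n#191214'

# Skipping the first step: unicode_escape produces a raw newline inside a JSON string,
# which json.loads rejects as an invalid control character
try:
    json.loads(sample.decode('unicode_escape'))
    parses_without_step = True
except json.JSONDecodeError:
    parses_without_step = False
print(parses_without_step)       # False
```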
Result:
```json
{
  "messages": [
    {
      "sender_name": "#20KAREL’s ????",
      "timestamp_ms": 1610288228221,
      "content": "我隔離",
      "type": "Generic",
      "is_unsent": false
    },
    {
      "sender_name": "#20KAREL’s ????",
      "timestamp_ms": 1610288227699,
      "share": {
        "link": "https://www.instagram.com/p/B6UlYZvA4Pd/",
        "share_text": "//\nMemorabilia???????????????????????\n??????????????\n#191214\n#191221",
        "original_content_owner": "_ki.zeng"
      },
      "type": "Share",
      "is_unsent": false
    },
    {
      "sender_name": "#20KAREL’s ????",
      "timestamp_ms": 1607742844729,
      "content": "扮瞓就好",
      "type": "Generic",
      "is_unsent": false
    }
  ]
}
```
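A different approach, not the one used in the script above: once the outer braces are added the file is valid JSON, so `json.loads` can parse it first and the latin1/UTF-8 round trip can then be applied to each string individually. This avoids the newline double-escaping entirely, and it also leaves `\"` and `\\` escapes intact, which `unicode_escape` would otherwise turn into bare `"` and `\` characters and break the JSON. The function name `fix_mojibake` and the sample are illustrative. The sketch assumes every affected string consists only of code points up to U+00FF; strings that fail the round trip are returned unchanged.

```python
import json

def fix_mojibake(obj):
    """Recursively re-decode every string in a parsed JSON object.

    Assumes each string holds UTF-8 bytes stored one per code point
    (all code points <= U+00FF); strings that don't fit are left unchanged.
    """
    if isinstance(obj, str):
        try:
            return obj.encode('latin1').decode('utf8')
        except (UnicodeEncodeError, UnicodeDecodeError):
            return obj  # not mojibake of this kind; keep as-is
    if isinstance(obj, list):
        return [fix_mojibake(item) for item in obj]
    if isinstance(obj, dict):
        return {fix_mojibake(key): fix_mojibake(value) for key, value in obj.items()}
    return obj  # numbers, booleans, None

# json.loads handles \n, \" and \\ itself, so no pre-escaping is needed
raw = '{"content": "\\u00e6\\u0088\\u0091", "share_text": "//\\n#191214", "is_unsent": false}'
data = fix_mojibake(json.loads(raw))
print(data['content'])  # 我
```

For a real file, read it with `json.load` and write the result with the same `json.dump(data, f, ensure_ascii=False, indent=2)` call as in the script. One caveat of this approach: a string of genuine Latin-1 text that happens to also be valid UTF-8 would be re-decoded, though that is rare in practice.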