You'll have to manually replace each non-BMP code point with its UTF-16 surrogate pair. You could do this with a regular expression:
    import re

    # Matches any single code point outside the Basic Multilingual Plane.
    _nonbmp = re.compile(r'[\U00010000-\U0010FFFF]')

    def _surrogatepair(match):
        char = match.group()
        assert ord(char) > 0xffff
        # UTF-16 encodes a non-BMP code point as two 16-bit code units.
        encoded = char.encode('utf-16-le')
        return (
            chr(int.from_bytes(encoded[:2], 'little')) +
            chr(int.from_bytes(encoded[2:], 'little')))

    def with_surrogates(text):
        return _nonbmp.sub(_surrogatepair, text)
Demo:
    >>> with_surrogates('\U0001f64f')
    '\ud83d\ude4f'
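For comparison, the same pair can be computed arithmetically instead of by round-tripping through the UTF-16 codec. This is only a sketch: `surrogate_pair` and `with_surrogates_arithmetic` are hypothetical names chosen for illustration, not part of any library. It follows the standard UTF-16 rule: subtract 0x10000, then put the top 10 bits on 0xD800 and the bottom 10 bits on 0xDC00.

```python
def surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a non-BMP code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def with_surrogates_arithmetic(text):
    """Replace each non-BMP character in text with its surrogate pair."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            out.extend(chr(s) for s in surrogate_pair(cp))
        else:
            out.append(ch)
    return ''.join(out)
```

With this sketch, `with_surrogates_arithmetic('\U0001f64f')` should give the same `'\ud83d\ude4f'` as the regex version. To get the original character back, encoding the result with `'utf-16-le'` and the `'surrogatepass'` error handler and then decoding as `'utf-16-le'` should recombine the pair.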