According to this document, these are the Unicode ranges of Chinese characters:
Table 12-2. Blocks Containing Han Ideographs

Block                                    Range         Comment
CJK Unified Ideographs                   4E00–9FFF     Common
CJK Unified Ideographs Extension A       3400–4DBF     Rare
CJK Unified Ideographs Extension B       20000–2A6DF   Rare, historic
CJK Unified Ideographs Extension C       2A700–2B73F   Rare, historic
CJK Unified Ideographs Extension D       2B740–2B81F   Uncommon, some in current use
CJK Compatibility Ideographs             F900–FAFF     Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement  2F800–2FA1F   Unifiable variants

You could use it like this:
preg_replace('/[^\x{4E00}-\x{9FFF}]+/u', '', $string);
or
preg_replace('/\P{Han}+/u', '', $string);
where \P is the negation of \p, so \P{Han} matches any character that is not in the Han script.
See here for all the Unicode scripts.
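
To make the difference between the two patterns concrete, here is a minimal sketch. The helper names keepBasicCjk and keepHan and the sample string are made up for illustration; only preg_replace and the two patterns come from the answer above. It assumes PHP 7.0 or later with PCRE built with Unicode support, and a source file saved as UTF-8.

```php
<?php
// Hypothetical helpers for illustration; not part of any library.

/** Keep only characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). */
function keepBasicCjk(string $text): string
{
    // \x{...} is the PCRE code point escape; it needs the /u modifier.
    // preg_replace returns null on error (e.g. invalid UTF-8), so fall back to ''.
    return preg_replace('/[^\x{4E00}-\x{9FFF}]+/u', '', $text) ?? '';
}

/** Keep only characters whose Unicode script is Han, extension blocks included. */
function keepHan(string $text): string
{
    // \P{Han} matches any character that is NOT in the Han script.
    return preg_replace('/\P{Han}+/u', '', $text) ?? '';
}

// U+3400 is in Extension A and U+20000 is in Extension B.
$sample = 'Hello 世界, 㐀 and 𠀀!';

echo keepBasicCjk($sample), PHP_EOL; // 世界
echo keepHan($sample), PHP_EOL;      // 世界㐀𠀀
```

The range-based pattern only covers the block you spell out, so the Extension A and Extension B characters are stripped. The script-based pattern keeps them, which is usually what you want unless you need to restrict the input to the common block on purpose.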