regex - Detect strings with non English characters in Python

Question

Welcome To Ask or Share your Answers For Others

regex - Detect strings with non English characters in Python

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

regex - Detect strings with non English characters in Python

I have some strings that have a mix of English and none English letters. For example:

w='_1991_??_??2'

How can I recognize these types of string using Regex or any other fast method in Python?

I prefer not to compare letters of the string one by one with a list of letters, but to do this in one shot and quickly.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:13:34+0000

You can just check whether the string can be encoded only with ASCII characters (which are Latin alphabet + some other characters). If it can not be encoded, then it has the characters from some other alphabet.

Note the comment # -*- coding: ..... It should be there at the top of the python file (otherwise you would receive some error about encoding)

# -*- coding: utf-8 -*-
def isEnglish(s):
    try:
        s.encode(encoding='utf-8').decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True

assert not isEnglish('slabiky, ale li?í se podle vyznamu')
assert isEnglish('English')
assert not isEnglish('?? ???????? ?? ?????? ??')
assert not isEnglish('how about this one : 通 asf?')
assert isEnglish('?fd4))45s&')

Categories

regex - Detect strings with non English characters in Python

regex - Detect strings with non English characters in Python

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags