Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

javascript - Is there a way to check if a string in JS is one single emoji?

The question is simple: I have a string str; how do I check whether str is one single emoji and nothing else?

Additionally, I would prefer not to use another library.

Match "??", "????♂?", "3️⃣" but not "??a", "??", "????"

I'm having trouble finding a solution but here are some things I've tried so far:


Attempted Solution 1 - Playing around with lengths and the ... operator

I learned that many emojis occupy more than one UTF-16 code unit, some even 4 or more, and that we can measure this via the string's length property:

console.log("??".length); // 2
console.log("???".length); // 3
console.log("????♂?".length); // 6

Then I found out that the ... (spread) operator takes this into account and splits the string into an array of whole characters, so I could compare the resulting array's length with the string's length and detect whether they differ.

const str = "????♂?";
if (str.length !== [...str].length) {
  // is emoji?
} else {
  // is not emoji
}

But this also flags non-emoji characters outside the Basic Multilingual Plane, whose length is 2 as well. Plus, some emojis were still getting separated in a weird way.
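To make the limits of this length comparison concrete, here is a small sketch. The helper name unitCounts and the sample strings are assumptions chosen for illustration; they are written with escapes so their exact contents are unambiguous.

```javascript
// Compare UTF-16 code units (str.length) with code points ([...str].length).
// unitCounts is a hypothetical helper, not part of any API.
function unitCounts(str) {
  return { codeUnits: str.length, codePoints: [...str].length };
}

const thumbsUp = unitCounts("\u{1F44D}");                // emoji outside the BMP
const keycap3 = unitCounts("3\uFE0F\u20E3");             // emoji made of three BMP code points
const ideograph = unitCounts("\u{20BB7}");               // non-emoji CJK character outside the BMP
const zwj = unitCounts("\u{1F937}\u200D\u2642\uFE0F");   // one emoji built as a ZWJ sequence

console.log(thumbsUp);  // { codeUnits: 2, codePoints: 1 }
console.log(keycap3);   // { codeUnits: 3, codePoints: 3 } -> the comparison says "not emoji"
console.log(ideograph); // { codeUnits: 2, codePoints: 1 } -> the comparison says "emoji"
console.log(zwj);       // { codeUnits: 5, codePoints: 4 } -> spread splits one emoji into 4 parts
```

The spread operator iterates by code point, not by what a user perceives as one character, which is why sequences joined by U+200D or followed by U+FE0F still come apart.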


Attempted Solution 2 - Regular expressions

Of course regex would be a thing to look into but I've yet to find a viable solution.

This answer's regex (\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff]) works perfectly fine to detect whether a string has any emojis, but applied to my situation it produces a lot of problems.

Here are my tests:

Part A - Without start/end-of-string anchors (^ and $)

  • 2A.1 str.match(regex) is very inconsistent: it breaks some emojis apart and returns otherwise unusable results.

    I don't see a way to find out from the result whether the string contains non-emoji characters or more than one emoji:

let regex = /(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/;

console.log("5️⃣".match(regex)); // [ '⃣', '⃣', index: 2, input: '5️⃣' ]
console.log("??".match(regex)); // [ '??', '??', index: 0, input: '??' ]
console.log("??????".match(regex)); // [ '??', '??', index: 0, input: '??????' ]
console.log("a?".match(regex)); // [ '?', '?', index: 1, input: 'a?' ]

  • 2A.2 regex.test(str) returns true whenever the string contains an emoji anywhere, which isn't the behaviour I'm looking for:

let regex = /(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/;

console.log(regex.test("5️⃣")); // true - correct
console.log(regex.test("a")); // false - correct
console.log(regex.test("??????")); // true - should be false
console.log(regex.test("hello ?!")); // true - should be false

Part B - With start/end-of-string anchors (^ and $)

  • 2B.1 str.match(regex) returns null on certain emojis for some reason.

    I have no clue why, but I assume it is related to why str.match(regex) breaks these emojis apart in Part A:

let regex = /^(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])$/;

console.log("5️⃣".match(regex)); // null
console.log("??".match(regex)); // [ '??', '??', index: 0, input: '??' ]
console.log("???".match(regex)); // null
console.log("?".match(regex)); // [ '?', '?', index: 0, input: '?' ]
console.log("????".match(regex)); // null

  • 2B.2 regex.test(str) returns false for the same emojis for which str.match(regex) returns null:

let regex = /^(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])$/;

console.log(regex.test("5️⃣")); // false - should be true
console.log(regex.test("??")); // true - correct
console.log(regex.test("???")); // false - should be true
console.log(regex.test("?")); // true - correct
console.log(regex.test("????")); // false - correct
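
One likely explanation for the null and false results in Part B: every alternative in this pattern consumes exactly one BMP character or one surrogate pair, so an emoji made of several code points cannot fit between ^ and $. The sketch below illustrates this with sample strings that are assumptions picked for the demonstration, not the exact strings from the tests above.

```javascript
// Same anchored pattern as in Part B, stored under a different name.
const anchored = /^(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])$/;

// One surrogate pair (U+1F44D): fits the \ud83d[...] alternative exactly.
const singlePair = anchored.test("\u{1F44D}");

// Keycap 5 = "5" + U+FE0F + U+20E3: the leading digit matches no alternative.
const keycap = anchored.test("5\uFE0F\u20E3");

// Heart + variation selector: U+2764 matches, but U+FE0F is left over before $.
const heartWithSelector = anchored.test("\u2764\uFE0F");

console.log(singlePair, keycap, heartWithSelector); // true false false
```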

Part C - Other regular expressions

  • I found this one (used below with ^ and $ added), but it gives similar inconsistencies, although not the same ones:

let regex = /^(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|[\ud83c[\ude01\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c[\ude32\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|[\ud83c[\ude50\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])$/g;

console.log(regex.test("5️⃣")); // true - correct
console.log(regex.test("??")); // false - should be true
console.log(regex.test("???")); // false - should be true
console.log(regex.test("?")); // true - correct
console.log(regex.test("????")); // false - correct

  • Also, this breaks horribly (the result of the second test changes based on the first test?):

let regex = /^(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|[\ud83c[\ude01\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c[\ude32\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|[\ud83c[\ude50\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])$/g;

console.log(regex.test("????♂?")); // false
console.log(regex.test("?")); // true

// Separate run with a fresh regex:
let regex = /^(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|[\ud83c[\ude01\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c[\ude32\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|[\ud83c[\ude50\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])$/g;

console.log(regex.test("?")); // true
console.log(regex.test("?")); // false
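
The changing results are consistent with how the g flag behaves: a global regex stores a lastIndex after each successful test() and starts the next search from there, even on a different string. The following minimal sketch uses a hypothetical one-character pattern (globalRe and plainRe are names chosen here) to show the effect in isolation.

```javascript
// A global regex keeps state between test() calls through lastIndex.
const globalRe = /^\u2b50$/g;               // matches the single character U+2B50

const first = globalRe.test("\u2b50");      // true; lastIndex becomes 1
const indexAfterFirst = globalRe.lastIndex; // 1
const second = globalRe.test("\u2b50");     // false; search starts at index 1, then lastIndex resets to 0
const third = globalRe.test("\u2b50");      // true again

// Without the g flag there is no carried-over state.
const plainRe = /^\u2b50$/;
const stable = [plainRe.test("\u2b50"), plainRe.test("\u2b50")];

console.log(first, indexAfterFirst, second, third, stable); // true 1 false true [ true, true ]
```

Dropping the g flag (or resetting regex.lastIndex = 0 before each call) should make repeated test() calls independent of each other.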

Is there a way around all this emoji/unicode/regex mess?

Are libraries/APIs the only way?

How do they do it?

  Asked by luxluxdev; translated from Stack Overflow.


1 Answer


First of all, thanks for providing so much detail in your question.

As for the problem itself, I would recommend looking at solutions that are either well tested or used by many other people.

  1. This emoji-regex library has over 10k downloads on npm, and its source code is open on GitHub.

But I know that it is still a library: it has its own issues and sometimes bugs, and maybe you would like to have full control over this part of the code.

  2. This rather simple regex has been tested on over 4000 currently used emojis.

    It is easy to use and extend.

That's what I can offer. It still might not be enough for your exact case, so simply experiment with it and adapt it to your needs.
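
As a starting point for such experiments, here is a minimal library-free sketch based on Unicode property escapes (the u flag). The function name isSingleEmoji and the pattern itself are assumptions made for illustration, not the regex or library linked above. It aims to cover flags, keycaps, skin tones, and ZWJ sequences, but not tag sequences such as subdivision flags, and \p{Extended_Pictographic} also accepts some symbols that are not rendered as emoji.

```javascript
// One pictographic code point, optionally followed by a variation
// selector (U+FE0F) or a skin tone modifier (U+1F3FB to U+1F3FF).
const element = String.raw`\p{Extended_Pictographic}(?:\uFE0F|[\u{1F3FB}-\u{1F3FF}])?`;

const singleEmoji = new RegExp(
  "^(?:" +
    String.raw`[\u{1F1E6}-\u{1F1FF}]{2}` + "|" +        // flag: two regional indicators
    String.raw`[0-9#*]\uFE0F?\u20E3` + "|" +            // keycap, e.g. "3" + U+FE0F + U+20E3
    element + String.raw`(?:\u200D` + element + ")*" +  // elements joined by ZWJ (U+200D)
  ")$",
  "u" // no g flag, so test() keeps no state between calls
);

// Hypothetical helper: true when str is exactly one emoji sequence.
function isSingleEmoji(str) {
  return singleEmoji.test(str);
}

console.log(isSingleEmoji("\u{1F44D}"));                           // true
console.log(isSingleEmoji("3\uFE0F\u20E3"));                       // true
console.log(isSingleEmoji("\u{1F926}\u{1F3FE}\u200D\u2642\uFE0F")); // true
console.log(isSingleEmoji("\u{1F44D}a"));                          // false
console.log(isSingleEmoji("\u{1F44D}\u{1F44D}"));                  // false
console.log(isSingleEmoji("3"));                                   // false
```

Extended_Pictographic is used instead of \p{Emoji} because the latter also matches plain digits, "#" and "*", which would make a bare "3" count as an emoji.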

