I am trying to solve this interview question.
After given clearly definition of UTF-8 format. ex: 1-byte :
0b0xxxxxxx 2- bytes:.... Asked to write a function to validate whether
the input is valid UTF-8. Input will be string/byte array, output
should be yes/no.
I have two possible approaches.
First, if the input is a string, since UTF-8 is at most 4 byte, after we remove the first two characters "0b", we can use Integer.parseInt(s) to check if the rest of the string is at the range 0 to 10FFFF. Moreover, it is better to check if the length of the string is a multiple of 8 and if the input string contains all 0s and 1s first. So I will have to go through the string twice and the complexity will be O(n).
Second, if the input is a byte array (we can also use this method if the input is a string), we check if each 1-byte element is in the correct range. If the input is a string, first check the length of the string is a multiple of 8 then check each 8-character substring is in the range.
I know there are couple solutions on how to check a string using Java libraries, but my question is how I should implement the function based on the question.
Thanks a lot.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…