If you apply utf8_encode()
to an already UTF-8 string, it will return garbled UTF-8 output.
I made a function that addresses all this issues. It′s called Encoding::toUTF8()
.
You don't need to know what the encoding of your strings is. It can be Latin1 (ISO 8859-1), Windows-1252 or UTF-8, or the string can have a mix of them. Encoding::toUTF8()
will convert everything to UTF-8.
I did it because a service was giving me a feed of data all messed up, mixing UTF-8 and Latin1 in the same string.
Usage:
require_once('Encoding.php');
use ForceUTF8Encoding; // It's namespaced now.
$utf8_string = Encoding::toUTF8($utf8_or_latin1_or_mixed_string);
$latin1_string = Encoding::toLatin1($utf8_or_latin1_or_mixed_string);
Download:
https://github.com/neitanod/forceutf8
I've included another function, Encoding::fixUFT8()
, which will fix every UTF-8 string that looks garbled.
Usage:
require_once('Encoding.php');
use ForceUTF8Encoding; // It's namespaced now.
$utf8_string = Encoding::fixUTF8($garbled_utf8_string);
Examples:
echo Encoding::fixUTF8("F??d??ration Camerounaise de Football");
echo Encoding::fixUTF8("F???d???ration Camerounaise de Football");
echo Encoding::fixUTF8("F?????d?????ration Camerounaise de Football");
echo Encoding::fixUTF8("F???dération Camerounaise de Football");
will output:
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
I've transformed the function (forceUTF8
) into a family of static functions on a class called Encoding
. The new function is Encoding::toUTF8()
.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…