I'm using R 3.1.1 on 32-bit Windows 7. I'm having a lot of trouble reading some text files on which I want to perform textual analysis. According to Notepad++, the files are encoded as "UCS-2 Little Endian". (grepWin, a tool whose name says it all, reports the file as "Unicode".)
The problem is that I can't seem to read the file, even when I specify that encoding. (The characters belong to the standard Spanish Latin set, e.g. ñ, á, ó, and should be handled easily by CP1252 or something similar.)
> Sys.getlocale()
[1] "LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONETARY=Spanish_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252"
> readLines("filename.txt")
[1] "ÿþE" "" "" "" "" ...
> readLines("filename.txt",encoding="UTF-8")
[1] "\xff\xfeE" "" "" "" "" ...
> readLines("filename.txt",encoding="UCS2LE")
[1] "ÿþE" "" "" "" "" "" "" ...
> readLines("filename.txt",encoding="UCS2")
[1] "ÿþE" "" "" "" "" ...
Any ideas?
Thanks!!
edit: the "UTF-16", "UTF-16LE" and "UTF-16BE" encodings fail similarly.
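For context on why every attempt above shows the same leading bytes: the `encoding` argument of `readLines` only marks the strings it returns as being in a given encoding, it does not re-encode the input, so the 0xFF 0xFE byte-order mark and the two-byte characters come through undecoded. One possible workaround, sketched below, is to read the raw bytes and decode them explicitly with `iconv`. The sample file, its contents, and the names `sample_path`, `original`, and `lines` are made up for illustration and stand in for filename.txt; the sketch also assumes the platform's iconv recognises the name "UTF-16LE".

```r
# Hypothetical stand-in for filename.txt: two lines written as UTF-16LE with a BOM.
sample_path <- tempfile(fileext = ".txt")
original <- c("El ni\u00f1o", "canci\u00f3n")
payload <- iconv(paste(original, collapse = "\r\n"),
                 from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), payload), sample_path)

# Read the file as raw bytes so nothing is interpreted in the native encoding.
raw_bytes <- readBin(sample_path, what = "raw",
                     n = file.info(sample_path)$size)

# Decode the bytes explicitly; iconv() accepts a list of raw vectors.
decoded <- iconv(list(raw_bytes), from = "UTF-16LE", to = "UTF-8")

# Drop the byte-order mark (U+FEFF) in case the converter kept it.
decoded <- sub("^\ufeff", "", decoded)

# Split on Windows or Unix line endings to get one element per line.
lines <- strsplit(decoded, "\r?\n")[[1]]
print(lines)
```

Another option, which may be more convenient, is to hand `readLines` a connection that does the re-encoding, along the lines of `readLines(file("filename.txt", encoding = "UCS-2LE"))`; R's documentation for `file` describes "UCS-2LE" as the appropriate value for Windows "Unicode" text files. I have not verified either approach against the actual file, so treat both as starting points.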