I'm trying to read the following UTF-8 encoded file in R, but whenever I read it, the unicode characters are not encoded correctly:
The script I'm using to process the file is as follows:
defaultEncoding <- "UTF8"
detalheVotacaoMunicipioZonaTypes <- c("character", "character", "factor", "factor", "factor", "factor", "factor",
"factor", "factor", "factor", "factor", "factor", "numeric",
"numeric", "numeric", "numeric", "numeric", "numeric",
"numeric", "numeric", "numeric", "numeric", "numeric",
"numeric", "character", "character")
readDetalheVotacaoMunicipioZona <- function( fileName ) {
fileConnection = file(fileName,encoding=defaultEncoding)
contents <- readChar(fileConnection, file.info(fileName)$size)
close(fileConnection)
contents <- gsub('"', "", contents)
columnNames <- c("data_geracao", "hora_geracao", "ano_eleicao", "num_turno", "descricao_eleicao", "sigla_uf", "sigla_ue",
"codigo_municipio", "nome_municipio", "numero_zona", "codigo_cargo", "descricao_cargo", "qtd_aptos",
"qtd_secoes", "qtd_secoes_agregadas", "qtd_aptos_tot", "qtd_secoes_tot", "qtd_comparecimento",
"qtd_abstencoes", "qtd_votos_nominais", "qtd_votos_brancos", "qtd_votos_nulos", "qtd_votos_legenda",
"qtd_votos_anulados", "data_ult_totalizacao", "hora_ult_totalizacao")
read.csv(text=contents,
colClasses=detalheVotacaoMunicipioZonaTypes,
sep=";",
col.names=columnNames,
fileEncoding=defaultEncoding,
header=FALSE)
}
I read the file sending in the UTF-8 encoding, remove all quotes (even numbers are quoted, so I need to clean them up) and then feed the contents to read.csv
. It reads and processes the file correctly but it seems like it's not using the encoding information I'm giving it.
What should I do to make it use UTF-8 to read this file?
I'm using RStudio on OSX if it makes any difference.
See Question&Answers more detail:
os