r - 'Embedded nul in string' when importing large CSV (8 GB) with fread()

Question

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a large CSV file (8.1 GB) that I'm trying to wrangle into R. I created the CSV using Python's csvkit in2csv, converted from a .txt file, but somehow the conversion led to null characters showing up in the file. I'm now getting this error when importing:

Error in fread("file.csv", nrows = 100) : embedded nul in string: '?trecd_zipc'

I can import small chunks just fine with read.csv, but only because it accepts UTF-16 input via the fileEncoding argument.

test <- read.csv("file.csv", nrows=100, fileEncoding="UTF-16LE")

I don't dare try to import an 8 GB file with read.csv, though.

I then tried the solution offered here, which uses sed s/\0//g file.csv > file2.csv to strip the nulls. The command ran without complaint and produced a new 8 GB CSV file, but I received a nearly identical error:

Error in fread("file2.csv", nrows = 100) : embedded nul in string: '?trecd_zipc,post_zi

So, that didn't work. I'm stumped at this point. Considering the size of the file, I can't use read.csv on the whole thing, and I'm not sure how to get rid of the nulls in the original CSV. I'm not even sure how the file got encoded as UTF-16. Any suggestions or advice would be greatly appreciated at this point.

Edit: I'm on a Windows machine.

1 Answer

深蓝 · Answer 1 · 2021-10-23T20:07:12+0000

If you're on Linux or macOS, try this:

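library(data.table)

file <- "file.csv"
tt <- tempfile()  # or tempfile(tmpdir="/dev/shm") on Linux to keep the copy in RAM
# tr -d '\000' deletes every NUL byte; the backslash is doubled so that R passes \000 to the shell
system(paste0("tr < ", shQuote(file), " -d '\\000' > ", shQuote(tt)))
fread(tt)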
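
Since the question mentions Windows, where tr may not be available, the same byte-stripping idea can probably be done in base R alone. The following is only a sketch: strip_nul_bytes and chunk_bytes are names made up here for illustration, not part of data.table. It streams the file in binary chunks and drops every 0x00 byte, so the 8 GB file never has to fit in memory. Note that this only gives sensible text when the UTF-16 content is plain ASCII, and if the file starts with a byte-order mark, those two bytes will remain at the start of the first column name.

```r
# Hypothetical helper: copy src to dest, dropping every NUL (0x00) byte.
# chunk_bytes is an assumed tuning knob; 64 MB per read is an arbitrary choice.
strip_nul_bytes <- function(src, dest, chunk_bytes = 64 * 1024^2) {
  in_con <- file(src, "rb")
  on.exit(close(in_con), add = TRUE)
  out_con <- file(dest, "wb")
  on.exit(close(out_con), add = TRUE)

  repeat {
    bytes <- readBin(in_con, what = "raw", n = chunk_bytes)
    if (length(bytes) == 0L) break          # end of file reached
    writeBin(bytes[bytes != as.raw(0)], out_con)
  }
  invisible(dest)
}

# Usage (assumes data.table is installed and file.csv exists):
# library(data.table)
# tt <- tempfile(fileext = ".csv")
# strip_nul_bytes("file.csv", tt)
# dt <- fread(tt)
```

Working on raw bytes rather than text avoids R's own restriction on NUL characters inside strings, which is the same restriction behind the fread error.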
