Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to read a CSV line with "?

A trivial CSV line can be split with the string split function. But some lines contain fields wrapped in ", e.g.

"good,morning", 100, 300, "1998,5,3"

so splitting directly on commas does not solve the problem.

My solution is to first split the line on , and then rejoin the pieces that have a " at the beginning or end of the string.

What's the best practice for this problem?

I am interested if there's a Python or F# code snippet for this.

EDIT: I am more interested in the implementation detail, rather than using a library.


1 Answer


There's a csv module in Python, which handles this.
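For reference, a minimal sketch of the library route, using the sample line from the question. The variable names are only illustrative. Note that the sample has spaces after the commas, so skipinitialspace=True is passed here; without it, a quote that follows a space is not treated as an opening quote.

```python
import csv

line = '"good,morning", 100, 300, "1998,5,3"'

# csv.reader accepts any iterable of strings, so a one-element list works.
# skipinitialspace=True ignores the space that follows each comma.
reader = csv.reader([line], skipinitialspace=True)
row = next(reader)
print(row)  # ['good,morning', '100', '300', '1998,5,3']
```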

Edit: This task falls into the "build a lexer" category. The standard way to do such tasks is to build a state machine (or use a lexer library/framework that will do it for you).

The state machine for this task would probably only need two states:

  • Initial state: every character except comma, newline and quote is read as part of the current field (exception: leading and trailing spaces are dropped). A comma is the field separator and a newline is the record separator. When it encounters an opening quote it goes into the
  • read-quoted-field state: every character except a quote (including comma and newline) is treated as part of the field. A quote followed by another quote is an escaped quote and yields a single literal quote; a quote not followed by a quote ends the quoted field and returns to the initial state.
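The two states above could be sketched in Python roughly as follows. This is a sketch under some assumptions of mine, not a complete CSV implementation: the name parse_csv and its helper are hypothetical, whitespace right after a closing quote is ignored, and malformed input (such as a quote in the middle of an unquoted field) is kept literally rather than rejected.

```python
def parse_csv(text):
    """Parse CSV text into a list of records, each a list of field strings."""
    records, record, buf = [], [], []
    quoted = False      # False = initial state, True = read-quoted-field state
    had_quotes = False  # True once the current field has used quotes

    def finish_field():
        nonlocal had_quotes
        value = "".join(buf)
        # Unquoted fields lose leading/trailing spaces; quoted ones are kept as-is.
        record.append(value if had_quotes else value.strip())
        buf.clear()
        had_quotes = False

    i = 0
    while i < len(text):
        ch = text[i]
        if quoted:
            if ch != '"':
                buf.append(ch)                # commas and newlines included
            elif text[i + 1:i + 2] == '"':
                buf.append('"')               # doubled quote = one literal quote
                i += 1
            else:
                quoted = False                # closing quote: back to initial state
        elif ch == '"' and not "".join(buf).strip():
            buf.clear()                       # drop spaces before the opening quote
            quoted = had_quotes = True
        elif ch == ",":
            finish_field()
        elif ch == "\n":
            finish_field()
            records.append(record)
            record = []
        elif not (had_quotes and ch in " \t\r"):
            buf.append(ch)                    # ordinary character of an unquoted field
        i += 1

    if buf or record or had_quotes:           # last record without trailing newline
        finish_field()
        records.append(record)
    return records


print(parse_csv('"good,morning", 100, 300, "1998,5,3"'))
```

On the line from the question this gives one record with the four fields good,morning / 100 / 300 / 1998,5,3. A single pass with an explicit state flag keeps the escaped-quote rule in one place, which is the part that split-based approaches tend to get wrong.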

By the way, your concatenating solution will break on "Field1","Field2" or "Field1"",""Field2".
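To make the failure concrete, here is one possible reading of the split-and-rejoin approach. The function naive_split is hypothetical; the question does not show actual code, so other variants may fail on different inputs (including the first example above). It handles simple quoted fields, but it returns two fields for "Field1"",""Field2", which is really a single field containing Field1","Field2.

```python
def naive_split(line):
    """Split on commas, then rejoin pieces that look like parts of one quoted field."""
    fields = []
    pending = None  # text of a quoted field collected so far, or None
    for piece in line.split(","):
        if pending is not None:
            pending += "," + piece
            if piece.endswith('"'):
                fields.append(pending[1:-1])
                pending = None
        elif piece.startswith('"') and len(piece) > 1 and piece.endswith('"'):
            fields.append(piece[1:-1])   # looks like a complete quoted field
        elif piece.startswith('"'):
            pending = piece              # quoted field continues in later pieces
        else:
            fields.append(piece)
    if pending is not None:
        fields.append(pending)           # unterminated quote: keep the raw text
    return fields


# Works: commas inside quotes are rejoined.
print(naive_split('"good,morning",100,"1998,5,3"'))

# Breaks: the escaped quotes make each piece look like a complete field,
# so one field is reported as two.
print(naive_split('"Field1"",""Field2"'))
```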

