[filed under Programming] Python csv NULL BYTE fail re regex split

My problem is that I have a csv file that Python can't read because it has NULL bytes in it. Python has a built-in csv reader, but it chokes and dies on this file :-(  (booo)
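For the NULL byte problem on its own, one possible workaround is to strip the NUL bytes before the csv reader ever sees the data. This is only a sketch: the helper name read_csv_without_nulls and the utf-8 default are made up for the example, and it assumes the NUL bytes are junk rather than part of something like a UTF-16 encoding.

```python
import csv
import io


def read_csv_without_nulls(raw_bytes, encoding="utf-8"):
    """Drop NUL bytes from raw CSV data, then parse it with the csv module.

    Hypothetical helper: assumes the NUL bytes carry no meaning.
    """
    cleaned = raw_bytes.replace(b"\x00", b"")
    text = cleaned.decode(encoding)
    # newline="" leaves line endings alone so csv.reader can handle them itself.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = read_csv_without_nulls(b'1,2,"3,\x004,5",6\r\na,\x00b\r\n')
print(rows)
```

For a real file the bytes would come from open(path, "rb").read(), which should be fine at 18 meg.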

Problem 2: line.split(',') also chokes because some lines look like this:

1,2,"3,4,5",6

which has to split into 4 items. Ack. I thought I was going to have to write a little finite state machine (which I like doing), but my csv file is 18 MB, so a character-by-character FSM in Python might be slow.
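For reference, the little state machine could look something like the sketch below. It is not what I ended up using, the name split_fsm is made up, and it only tracks one bit of state (inside quotes or not), so it does not try to handle escaped quotes or fields that span several lines.

```python
def split_fsm(line):
    """Split one line on commas that are not inside double quotes.

    Hypothetical helper: quotes are kept in the output fields.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in line:
        if ch == '"':
            # Toggle the state on every quote character and keep the quote.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ',' and not in_quotes:
            # A comma outside quotes ends the current field.
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    # Whatever is left after the last comma is the final field.
    fields.append(''.join(current))
    return fields


print(split_fsm('1,2,"3,4,5",6'))
```

Whether this is actually too slow for 18 meg is something I would have to time rather than guess.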

After some searches I came across this funky bit of regex that I made into even funkier Python:

>>> import re
>>> subject = """1,2,3,"4444","55,55,,,,55",99"""
>>> splitter = re.compile(r',(?=(?:[^"]*"[^"]*")*(?![^"]*"))')
>>> splitter.split(subject)
['1', '2', '3', '"4444"', '"55,55,,,,55"', '99']

:-)

wtf? I'm not sure what is going on. I got lost at "Assert that it is impossible to match the regex below starting at this point (negative lookahead)". Hmmm. Yes. Magic.
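As far as I can tell, the trick is that a comma only counts as a separator when an even number of double quotes follows it, which means the comma is not inside a quoted field. Here is the same pattern spelled out with re.VERBOSE so each piece gets a comment. The comments are my reading of it, and it assumes the quotes on a line are balanced and never escaped.

```python
import re

splitter = re.compile(r'''
    ,                            # a literal comma ...
    (?=                          # ... only if what follows (without consuming it) is:
        (?: [^"]* " [^"]* " )*   #   zero or more complete pairs of quotes,
        (?! [^"]* " )            #   and then no further quote anywhere after that
    )
''', re.VERBOSE)

subject = '1,2,3,"4444","55,55,,,,55",99'
fields = splitter.split(subject)
print(fields)

# The quotes stay on the fields; strip them afterwards if they are not wanted.
print([field.strip('"') for field in fields])
```

A comma inside "55,55,,,,55" has one quote left after it, so the negative lookahead finds that stray quote and the comma is rejected.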

2nd of July, 2008@3:57:27 PM