Re: [PATCHES] allow CSV quote in NULL

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] allow CSV quote in NULL
Date: 2007-07-27 18:09:52
Message-ID: 20070727180951.GK4887@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

* Andrew Dunstan (andrew(at)dunslane(dot)net) wrote:
> This looks too clever by half, to me. Someone facing the problem you are
> facing would have to dig quite deep to find the solution you're promoting.

Oddly enough, it was one of the first things I tried when I discovered
it wasn't just realizing that ,"", for an integer column meant NULL (and
instead was complaining loudly that you can't convert an empty string
into an integer). It's also pretty clear, to me at least, to say
"put the exact string that shows up between the delimiters here
that you want treated as a NULL" rather than "well, if it's a column
which is quoted then you have to jump through these hoops and tell PG
about each one, but if it's not quoted you have to do this", etc, etc.

> A much better way IMNSHO would be to add an extra FORCE switch. On input,
> FORCE NOT NULL says to treat an unquoted null as the literal value rather
> than as a null field for the columns named. The reverse would be to tell it
> to treat a quoted null as null rather than as the literal value, for the
> named columns. Perhaps that should just be "FORCE NULL columnlist". It
> would be more explicit and at the same time would only apply to the named
> columns, rather than discarding totally the ability to distinguish between
> null and not null values.

I don't see that it needs to be 'more explicit', that's just silly.
Either the user indicated they want it, or they didn't. What you're
suggesting adds in a bunch of, imv, unnecessary complication and ends up
making the resulting code that much bigger and uglier for not much gain.

I'm honestly not a big fan of the "columnlist" approach that's been
taken with the options. While I understand the desire to seperate the
parsing from the typing, making the users essentially do that association
for us by way of making them specify how to handle each column explicitly
is worse than just accepting that different types may need to be handled
in different ways.

We could instead flip it around and force the users to specify, for
each column, what, exactly, should be done for that column by having
them specify a regexp for that column. The regexp would implicitly have
the delimiter on each side of it and we'd just step through the string
matching as far as we can for each column. Then it's nice and explicit
for everyone but probably not much fun to use.

> This should probably be discussed on -hackers, anyway.

As a small, unobtrusive patch, I felt it didn't need a long discussion
about what everyone's CSV files look like and how "that just shouldn't
be done" or "that's just not sane."

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stuart 2007-07-27 18:22:04 ascii() for utf8
Previous Message Gregory Stark 2007-07-27 17:17:21 Re: [HACKERS] Document and/or remove unreachable code in tuptoaster.c from varvarlena patch

Browse pgsql-patches by date

  From Date Subject
Next Message Andrew Dunstan 2007-07-27 18:26:26 Re: [PATCHES] allow CSV quote in NULL
Previous Message Gregory Stark 2007-07-27 17:17:21 Re: [HACKERS] Document and/or remove unreachable code in tuptoaster.c from varvarlena patch