Re: Patch: FORCE_NULL option for copy COPY in CSV mode

From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Date: 2013-10-09 17:25:56
Message-ID: 20131009172556.GB13993@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 09, 2013 at 09:52:29AM -0400, Andrew Dunstan wrote:
>
> On 10/09/2013 09:22 AM, Amit Kapila wrote:
> >On Wed, Oct 9, 2013 at 9:15 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>On Tue, Oct 8, 2013 at 6:03 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> >>>On 10/07/2013 11:34 PM, Amit Kapila wrote:
> >>>>On Tue, Oct 8, 2013 at 12:55 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
> >>>>wrote:
> >>>>>On 10/07/2013 03:06 PM, Robert Haas wrote:
> >>>>>>
> >>>>>>>Also if your use case is to treat empty strings as NULL (as per above
> >>>>>>>documentation), can't it be handled with "WITH NULL AS" option.
> >>>>>>>For example, something like:
> >>>>>>>
> >>>>>>>postgres=# COPY testnull FROM stdin with CSV NULL AS E'';
> >>>>>>>Enter data to be copied followed by a newline.
> >>>>>>>End with a backslash and a period on a line by itself.
> >>>>>>>>>50,
> >>>>>>>>>\.
> >>>>>>>postgres=# select * from testnull;
> >>>>>>> a | b
> >>>>>>>----+------
> >>>>>>> 50 | NULL
> >>>>>>>(1 row)
> >>>>>>Good point. If this patch is just implementing something that can
> >>>>>>already be done with another syntax, we don't need it.
> >>>>>>
> >>>>>Isn't the point of this option to allow a *quoted* empty string to be
> >>>>>forced
> >>>>>to NULL? If so, this is not testing the same case - in fact the COPY
> >>>>>command
> >>>>>above just makes explicit the default CSV NULL setting anyway.
> >>>>I am really not sure if all the purpose of patch can be achieved by
> >>>>existing syntax, neither it is explained clearly.
> >>>>However the proposal hasn't discussed why it's not good idea to extend
> >>>>some similar syntax "COPY .. NULL" which is used to replace string
> >>>>with NULL's?
> >>>>Description of NULL says: "Specifies the string that represents a null
> >>>>value."
> >>>>Now why can't this syntax be extended to support quoted empty string
> >>>>if it's not supported currently?
> >>>>I have not checked completely, If it's difficult or not possible to
> >>>>support in existing syntax, then even it add's more value to introduce
> >>>>new syntax.
> >>>>
> >>>>By asking above question, I doesn't mean that we should not go for the
> >>>>new proposed syntax, rather it's to know and understand the benefit of
> >>>>new syntax, also it helps during CF review for reviewer's if the
> >>>>proposal involves new syntax and that's discussed previously.
> >>>>
> >>>Quite apart from any other consideration, this suggestion is inferior to
> >>>what's proposed in that it's an all or nothing deal, while the patch allows
> >>>you to specify the behaviour very explicitly on a per column basis. I can
> >>>well imagine wanting to be able to force a quoted empty string to null for
> >>>numeric fields but not for text.
> > Okay, but can't it be done by extending current syntax such as
> > NULL FOR <col1, col2> AS ""- which would mean it will replace
> >corresponding column's values as NULL if they contain empty string.
> >
> >>>The basic principal of our CSV processing is that we don't ever turn a NULL
> >>>into something quoted and we don't ever turn something quoted into NULL.
> >>>That's what lets us round-trip test just about every combination of options.
> >>>I'm only going to be happy violating that, as this patch does, in a very
> >>>explicit and controlled way.
> >>Will this option allow only quoted empty string to be NULL or will
> >>handle without quoted empty string as well?
> > I had checked patch and it seems for both quoted and unquoted empty
> >string, it will replace it with NULL.
> > Now here it's bit different from FORCE_NOT_NULL, because already
> >for un-quoted empty strings we have a way to replace them with NULL.
> >
> > I think having 2 different syntax for replacing empty strings (one
> >for quoted and another for un-quoted) as NULL's might not be best way
> >to accomplish
> > this feature.
> >
> >
>
>
> I really don't know what you're saying here.
>
> Here is the situation we have today (assuming the default null
> marker of empty-string):
>
> default: empty-string -> null, quoted-empty-string ->
> emptystring
> with force_not_null: empty-string -> emptystring,
> quoted-empty-string -> emptystring
>
> and the proposal would add to that:
>
> with force-null: empty-string -> null, quoted-empty-string -> null
>
> So it appears to be quite on all fours with the way force_not_null
> works now, it just does the reverse.
>
> I don't see at all that your suggested alternative has any
> advantages over what's been written. If you can say "NULL FOR (foo)
> as '""' how will you specify the null for some other column(s)?

Idea:

NULL FOR (foo,bar,baz,blurf) AS '""', NULL FOR (quux,fleeg) AS ...,

> Are we going to have multiple such clauses? It looks like a real
> mess.

Is that not part of what parsers ordinarily do?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-10-09 17:34:21 Re: Auto-tuning work_mem and maintenance_work_mem
Previous Message Josh Berkus 2013-10-09 17:09:15 Re: Auto-tuning work_mem and maintenance_work_mem