Re: Is it possible to set end-of-data marker for COPY statement.

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Junfeng Yang <yjerome(at)vmware(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Is it possible to set end-of-data marker for COPY statement.
Date: 2020-09-01 16:30:30
Message-ID: CAKFQuwbHq5bAbgt4h2mCyg=+nPzscc=KQiSL7uossOqUCiq6PA@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Tue, Sep 1, 2020 at 9:05 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Tue, Sep 1, 2020 at 06:14:45AM +0000, Junfeng Yang wrote:
> > Hi hackers,
> >
>
> > Data in file "/tmp/data".
> >
> > 122,as\.d,adad
> > 133,sa dad,adadad
> >
> > Then execute
> >
> > copy test from '/tmp/data' DELIMITER ',';
> >
> > An end-of-copy marker corrupt error will be raised.
>
> This is the first I am hearing of this. The problem is that the system
> can't decide if \. is escaping a delimiter, or the end-of-copy marker.
> I think we need to just disable period as a delimiter. I don't think
> there is enough demand to allow the end-of-data marker to be
> configurable.
>
> Interestingly, you can use period as a delimiter if you are copying from
> a file that doesn't need an end-of-data marker and you never need to
> escape the delimiter, but that seems like too rare a use case to allow
> period to be supported as a delimiter.
>

Something isn't right here because the rules for end-of-copy are explicit
that the \. must appear on a line all by itself. That isn't the case with
the shown test data.

The system should do one of two things with that input (it seems option 2
is the one we've chosen):

One, see that the character following the backslash is not an action
character and just treat the backslash as data.
Two, complain that the character following the backslash is not a valid
action character.

The system is reporting an error; it is just trying to be helpful. Seeing
the period, it incorrectly reports that the error has something to do with
the end-of-copy marker, when in reality all that can be said is that "a
period in this location is not valid". (The exception is a command that
uses DELIMITER '<period>': there the period is valid and \. means a
literal period, since it's not alone on a line.)

The only limitation our definition of end-of-copy imposes is that a single
column input file cannot contain a record that is only a period. It does
not impose a limitation on which delimiters are valid.
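
To make the rule concrete, here is a minimal Python sketch of the
end-of-copy test as described above: only a line consisting solely of \.
ends the data, and a \. embedded in a longer line does not. This is a toy
model for illustration, not PostgreSQL's parser; the function names
classify_line and split_at_marker are hypothetical.

```python
END_OF_DATA = "\\."  # a backslash followed by a period


def classify_line(line: str) -> str:
    """Classify one input line under the rule described in this message.

    Toy model only: a line is the end-of-copy marker solely when the
    whole line is backslash-period. Anything else is treated as data,
    including lines that merely contain backslash-period.
    """
    if line == END_OF_DATA:
        return "end-of-data"
    return "data"


def split_at_marker(lines):
    """Return the data lines that precede the first end-of-copy marker."""
    rows = []
    for line in lines:
        if classify_line(line) == "end-of-data":
            break
        rows.append(line)
    return rows


# The rows from the original report: neither line is the marker,
# because the backslash-period is not alone on its line.
sample = ["122,as\\.d,adad", "133,sa dad,adadad"]
print([classify_line(line) for line in sample])
```

Under this model both sample lines are classified as data, so whatever is
wrong with the first row, it is not a question of the end-of-copy marker.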

David J.
