Re: Fast COPY FROM based on batch insert

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, tanghy(dot)fnst(at)fujitsu(dot)com, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, houzj(dot)fnst(at)fujitsu(dot)com
Subject: Re: Fast COPY FROM based on batch insert
Date: 2022-08-22 08:44:27
Message-ID: CAPmGK15r9RjMUnpavRUh7jXag9S+MEvNSSkevnPj+y3QxTzWpQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 15, 2022 at 2:29 PM Andrey Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 8/9/22 16:44, Etsuro Fujita wrote:
> >>> -1 foo
> >>> 1 bar
> >>> \.
> > ERROR: new row for relation "t1" violates check constraint "t1_f1positive"
> > DETAIL: Failing row contains (-1, foo).
> > CONTEXT: remote SQL command: INSERT INTO public.t1(f1, f2) VALUES
> > ($1, $2), ($3, $4)
> > COPY ft1, line 3
> >
> > In single-insert mode the error context information is correct, but in
> > batch-insert mode it isn’t (i.e., the line number isn’t correct).
> >
> > The error occurs on the remote side, so I'm not sure if there is a
> > simple fix. What I came up with is to just suppress error context
> > information other than the relation name, like the attached. What do
> > you think about that?

> I've spent a lot of effort on this problem too. Your solution has a
> rationale and looks fine.
> I only think we should add a bit of info to the error report to
> make it clear why we don't point to a specific line here. For example:
> 'COPY %s (buffered)'
> or
> 'COPY FOREIGN TABLE %s'
>
> or, if we saved a MultiInsertBuffer pointer instead of the
> relname_only field, we could add min/max linenos to the report:
> 'COPY %s, line between %llu and %llu'

I think the latter is more consistent with the existing error context
information when in CopyMultiInsertBufferFlush(). Actually, I thought
of this too, and I think it would be useful when the COPY FROM command
is executed on a foreign table. My concern, however, is the case where
the command is executed on a partitioned table containing foreign
partitions; in that case the input data would not always be sorted in
partition order, so the line range for the foreign partition where the
error occurred might contain many lines with rows from other
partitions, which I think makes the range information less useful.
Maybe I'm worrying too much about that, though.
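
To illustrate the concern about unsorted input, here is a rough sketch
in Python. It is not PostgreSQL code: PartitionBuffer, buffer_rows and
foreign_lines_in_range are hypothetical names, and the model ignores
when buffers are actually flushed. It only shows that a per-partition
min/max line range can cover lines routed to other partitions.

```python
# Illustrative model only; all names are hypothetical, not PostgreSQL APIs.
from dataclasses import dataclass, field


@dataclass
class PartitionBuffer:
    """Input line numbers of the rows buffered for one partition."""
    linenos: list = field(default_factory=list)

    def add(self, lineno):
        self.linenos.append(lineno)

    def line_range(self):
        # The min/max pair that a "line between X and Y" message would report.
        return min(self.linenos), max(self.linenos)


def buffer_rows(rows):
    """Group (lineno, partition) pairs into one buffer per partition."""
    buffers = {}
    for lineno, part in rows:
        buffers.setdefault(part, PartitionBuffer()).add(lineno)
    return buffers


def foreign_lines_in_range(rows, part):
    """Count lines inside part's reported range that went to other partitions."""
    lo, hi = buffer_rows(rows)[part].line_range()
    return sum(1 for lineno, p in rows if lo <= lineno <= hi and p != part)


# Input lines 1..6, interleaved across two partitions (not in partition order).
rows = [(1, "p1"), (2, "p2"), (3, "p2"), (4, "p2"), (5, "p1"), (6, "p2")]

lo, hi = buffer_rows(rows)["p1"].line_range()
print(f"COPY p1, line between {lo} and {hi}")  # COPY p1, line between 1 and 5
print(foreign_lines_in_range(rows, "p1"))      # 3
```

In this toy input the reported range for p1 is lines 1 to 5, but three
of those five lines hold rows for p2, so the range narrows the search
much less than it appears to. If the input were sorted in partition
order, the count would be zero.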

Best regards,
Etsuro Fujita
