Re: BUG #8173: Inserting heap tuples in bulk in COPY patch returns wrong line on failure 999 out of 1000 times.

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: lalbin(at)fhcrc(dot)org
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8173: Inserting heap tuples in bulk in COPY patch returns wrong line on failure 999 out of 1000 times.
Date: 2013-05-23 11:56:35
Message-ID: 519E03F3.70306@vmware.com
Lists: pgsql-bugs

On 21.05.2013 18:20, lalbin(at)fhcrc(dot)org wrote:
> The following bug has been logged on the website:
>
> Bug reference: 8173
> Logged by: Lloyd Albin
> Email address: lalbin(at)fhcrc(dot)org
> PostgreSQL version: 9.2.4
> Operating system: SUSE Linux (64-bit)
> Description:
>
> During testing for our 9.2 upgrade, we found that the error messages we were
> expecting did not match what was given by the program. Looking over the
> release notes from our current version, 9.0.12, through 9.2.4, which we are
> testing, I believe I have tracked down the issue to "Improve COPY
> performance by adding tuples to the heap in batches". When I looked at the
> patch code in
> http://www.postgresql.org/message-id/4E708759.40206@enterprisedb.com I found
> that you are inserting 1000 rows at a time. The problem is that on failure,
> the reported line is either line 1000 or the last line read, whichever comes
> first. This can be confusing, as you will see in the demo code below.
>
> CREATE TABLE public.table1 (
>   key INTEGER,
>   PRIMARY KEY (key)
> );
>
> Create a CSV file with a single column of data, numbered from 1 to 1008.
>
> Make two copies of the file and name them csv_test.csv and csv_test2.csv.
>
> Edit csv_test.csv and change the entry 1000 to 500.
>
> Edit csv_test2.csv and change the entry 900 to 500.
>
> On 9.0.12 Server
>
> COPY public.table1 FROM 'csv_test.csv';
>
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
>
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 900: "500"
>
> Both times the context gave us the correct information.
>
> Now try the same thing on 9.2.4 Server
>
> COPY public.table1 FROM 'csv_test.csv';
>
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
>
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 1000: "1000"
>
> As you can see, the second test returned the last line of the batch of
> tuples being written, not the line that actually failed.
>
> Make a copy of csv_test2.csv and name it csv_test3.csv.
> Edit csv_test3.csv and remove all entries after 994.
>
> COPY public.table1 FROM 'csv_test3.csv';
>
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 995: ""
>
> If you are writing fewer than 1000 lines, it will return the line after
> the last line, with a value of "".

Hmm, yeah, it's quite self-evident what's happening: the server reports
the last line that was *read*, not the line where the error happened.
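
(As an aside, the test files above can be generated without hand-editing,
e.g. with generate_series. This is just a sketch: the /tmp paths are
arbitrary, and server-side COPY ... TO needs superuser privileges, so use
psql's \copy instead if you don't have them.)

-- csv_test.csv: 1..1008, with line 1000 changed to 500
COPY (SELECT CASE WHEN i = 1000 THEN 500 ELSE i END
      FROM generate_series(1, 1008) AS i)
TO '/tmp/csv_test.csv';

-- csv_test2.csv: 1..1008, with line 900 changed to 500
COPY (SELECT CASE WHEN i = 900 THEN 500 ELSE i END
      FROM generate_series(1, 1008) AS i)
TO '/tmp/csv_test2.csv';

-- csv_test3.csv: like csv_test2.csv, but truncated after line 994
COPY (SELECT CASE WHEN i = 900 THEN 500 ELSE i END
      FROM generate_series(1, 994) AS i)
TO '/tmp/csv_test3.csv';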

Committed a fix for this. Unfortunately, we only keep the most recently
read line buffered in text form, so with the fix you'll get the correct
line number, but not the content of that line:

postgres=# copy foo from '/tmp/foo';
ERROR: duplicate key value violates unique constraint "foo_pkey"
DETAIL: Key (id)=(4500) already exists.
CONTEXT: COPY foo, line 4500
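
(That output can be reproduced with, for example, a 5000-line file in
which an early line already contains the value 4500; the data set below
is assumed, not the exact one behind the output above.)

CREATE TABLE foo (id integer PRIMARY KEY);

-- 1..5000, with line 400 changed to 4500; line 4500 then duplicates it
COPY (SELECT CASE WHEN i = 400 THEN 4500 ELSE i END
      FROM generate_series(1, 5000) AS i)
TO '/tmp/foo';

COPY foo FROM '/tmp/foo';
-- with the fix this reports line 4500, the line that actually failed,
-- even though COPY had already read past it when the batch was flushed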

Thanks for the report!

- Heikki

