From: Nicolas Grilly <nicolas(at)gardentechno(dot)com>
To: John R Pierce <pierce(at)hogranch(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why "copy ... from stdio" does not return immediately when reading invalid data?
Date: 2011-02-07 14:55:24
Message-ID: AANLkTimca1JWHAf1XGRtNAzMG4fNKqjyXN0qDjmA5Qdr@mail.gmail.com
Lists: pgsql-general, pgsql-hackers
I have analyzed the PostgreSQL protocol using Wireshark (an open source
packet analyzer), and I observed that the PostgreSQL backend, while doing a
COPY ... FROM STDIN, reports errors as soon as possible (especially errors
related to invalid data).
Therefore, the "late" reporting of errors while doing a COPY ... FROM STDIN
is not a limitation of the underlying protocol; it is a limitation (or a
design choice) of the libpq library.
It looks like this is a well-known issue, because it is listed on the todo
list:
http://wiki.postgresql.org/wiki/Todo#COPY
It was also discussed before:
http://archives.postgresql.org/pgsql-hackers/2008-04/msg01169.php
Do you think it is possible to change that behavior, or work around it?
While reading the libpq source code, I noticed that the function pqParseInput3
(file fe-protocol3.c) ignores error responses while the connection is
in PGASYNC_COPY_IN state. Maybe we can make a special case for the COPY FROM
subprotocol and handle errors early, in order to make them available to
PQgetResult? Is it feasible in a simple way, or is it a bad idea?
Regards,
Nicolas Grilly
On Wed, Feb 2, 2011 at 20:06, John R Pierce <pierce(at)hogranch(dot)com> wrote:
> On 02/02/11 10:20 AM, Nicolas Grilly wrote:
>
>> Is the copy protocol (aka PQputCopyData and PQputCopyEnd) designed to send
>> gigabytes of data with just one "copy ... from stdio" query, and is there a
>> way to be notified of a potential error before calling PQputCopyEnd? Or do I
>> have to send my data in small chunks (for example batch of 10000 rows),
>> issue a PQputCopyEnd, check for errors, and continue with the next chunk?
>>
>
> I would batch the data, maybe 1000 lines or even 100 lines at a time if
> these errors are at all frequent. put the errored batches in an exception
> list or something so you can sort them out later.
>
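For what it's worth, here is a minimal sketch of the batching workaround John describes, using libpq's copy functions. The table name, column layout, connection string, and batch contents are all hypothetical, and error handling is reduced to printing the message and setting the failed batch aside; with current libpq behavior, a data error only surfaces from PQgetResult after PQputCopyEnd, which is exactly why small batches help:

```c
/* Batched COPY ... FROM STDIN using libpq.
 * Sketch only: assumes a table "my_table(id integer, name text)" and a
 * reachable server; adjust the conninfo string and SQL as needed. */
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/* Send one batch of rows in COPY text format; returns 0 on success. */
static int copy_batch(PGconn *conn, const char *rows)
{
    PGresult *res = PQexec(conn, "COPY my_table FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN) {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);

    if (PQputCopyData(conn, rows, (int) strlen(rows)) != 1 ||
        PQputCopyEnd(conn, NULL) != 1) {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        return -1;
    }

    /* Invalid data is only reported here, after PQputCopyEnd --
     * the late reporting discussed in this thread. */
    int ok = 0;
    while ((res = PQgetResult(conn)) != NULL) {
        if (PQresultStatus(res) != PGRES_COMMAND_OK) {
            fprintf(stderr, "batch rejected: %s", PQresultErrorMessage(res));
            ok = -1;
        }
        PQclear(res);
    }
    return ok;
}

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");  /* hypothetical conninfo */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* Two tiny batches; in practice each would hold ~100-1000 rows. */
    const char *batches[] = {
        "1\talice\n2\tbob\n",
        "3\tnot-an-integer\toops\n",  /* malformed row: batch is rejected */
    };
    for (int i = 0; i < 2; i++) {
        if (copy_batch(conn, batches[i]) != 0)
            fprintf(stderr, "keeping batch %d for later inspection\n", i);
    }

    PQfinish(conn);
    return 0;
}
```

Only the batch containing the malformed row is rejected; the first batch commits independently, so the errored batches can be collected and sorted out later as suggested.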