From: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-07-02 16:57:21 |
Message-ID: | f55afd63-8d64-e8df-0d54-13c116b8d23e@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 28.06.2019 16:12, Alvaro Herrera wrote:
>> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> Or even just return it as a row. CopyBoth is relatively widely supported
>>> these days.
>> i think generating warning about it also sufficiently meet its propose of
>> notifying user about skipped record with existing logging facility
>> and we use it for similar propose in other place too. The different
>> i see is the number of warning that can be generated
> Warnings seem useless for this purpose. I'm with Andres: returning rows
> would make this a fine feature. If the user wants the rows in a table
> as Andrew suggests, she can use wrap the whole thing in an insert.
I agree with previous commentators that returning rows will make this
feature more versatile. Though, having a possibility to simply skip
conflicting/malformed rows is worth of doing from my perspective.
However, pushing every single skipped row to the client as a separated
WARNING will be too much for a bulk import. So maybe just overall stats
about skipped rows number will be enough?
Also, I would prefer having an option to ignore all errors, e.g. with
option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
a number of future errors if you are playing with some badly structured
data, while always setting it to 100500k looks ugly.
Anyway, below are some issues with existing code after a brief review of
the patch:
1) Calculation of processed rows isn't correct (I've checked). You do it
in two places, and
- processed++;
+ if (!cstate->error_limit)
+ processed++;
is never incremented if ERROR_LIMIT is specified and no errors
occurred/no constraints exist, so the result will always be 0. However,
if primary column with constraints exists, then processed is calculated
correctly, since another code path is used:
+ if (specConflict)
+ {
+ ...
+ }
+ else
+ processed++;
I would prefer this calculation in a single place (as it was before
patch) for simplicity and in order to avoid such problems.
2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT
is specified and was exceeded, which doesn't seem to be correct, does it?
- if (resultRelInfo->ri_NumIndices > 0)
+ if (resultRelInfo->ri_NumIndices > 0 &&
cstate->error_limit == 0)
recheckIndexes = ExecInsertIndexTuples(myslot,
3) Trailing whitespaces added to error messages and tests for some reason:
+ ereport(WARNING,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("skipping \"%s\" --- missing data
for column \"%s\" ",
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing data for column \"%s\" ",
-ERROR: missing data for column "e"
+ERROR: missing data for column "e"
CONTEXT: COPY x, line 1: "2000 230 23 23"
-ERROR: missing data for column "e"
+ERROR: missing data for column "e"
CONTEXT: COPY x, line 1: "2001 231 \N \N"
Otherwise, the patch applies/compiles cleanly and regression tests are
passed.
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
From | Date | Subject | |
---|---|---|---|
Next Message | David Fetter | 2019-07-02 18:06:18 | Re: New EXPLAIN option: ALL |
Previous Message | Tom Lane | 2019-07-02 15:47:32 | Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.) |