Re: GSOC'17 project introduction: Parallel COPY execution with errors handling

From: Alex K <kondratov(dot)aleksey(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Anastasia Lubennikova <lubennikovaAV(at)gmail(dot)com>, pg(at)heroku(dot)com
Subject: Re: GSOC'17 project introduction: Parallel COPY execution with errors handling
Date: 2017-06-07 19:34:13
Message-ID: CADfU8Wy3ZSSdkWpBsW-rkXR3OSg+LsWiGP3PEEN8dapiA7-x-w@mail.gmail.com
Lists: pgsql-hackers

Hi pgsql-hackers,

Thank you again for all these replies. I have started working on this project
and have learned a lot over the last month, so here are some new thoughts about
error handling in COPY. I decided to stick to the same thread, since it has a
neutral subject.

(1) One of my mentors, Alvaro Herrera, suggested that I take a look at UPSERT.
It would be good to achieve the same behavior as ON CONFLICT DO NOTHING when
COPY actually inserts tuples and error handling is turned on. This could also
reduce the number of failed subtransactions and the XID consumption, while
still ignoring some common errors such as unique index violations.

Adding full support for ON CONFLICT DO NOTHING/UPDATE to COPY seems to be a
large separate task and is out of the current project's scope, but maybe there
is a relatively simple way to insert tuples internally with ON CONFLICT DO
NOTHING semantics? I have added Peter Geoghegan to cc, as I understand he is
the major contributor to UPSERT in PostgreSQL. It would be great if he could
answer this question.

(2) Otherwise, I am still going to use subtransactions via
BeginInternalSubTransaction and PG_TRY / PG_CATCH with
ReleaseCurrentSubTransaction / RollbackAndReleaseCurrentSubTransaction.
To minimize XID consumption I will try to insert tuples in batches and
pre-validate them as much as possible (as was suggested earlier in this thread).
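
The effect of batching on the number of subtransactions can be sketched with a
toy model. This is not PostgreSQL code and does not call
BeginInternalSubTransaction or the other functions named above. The Row and
CopyStats types, both function names, and the "roll back the batch and replay
it row by row" fallback are assumptions made only for illustration, and the
subtransaction count is only a rough proxy for XID consumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an input row: valid == false means the
 * insert would raise an error (e.g. a unique index violation). */
typedef struct Row
{
    int  id;
    bool valid;
} Row;

/* Counters for one simulated COPY run. */
typedef struct CopyStats
{
    size_t subxacts;   /* subtransactions started (rough proxy for XIDs) */
    size_t inserted;   /* rows that ended up inserted */
    size_t skipped;    /* rows dropped because of an error */
} CopyStats;

/* Baseline: every row gets its own subtransaction. */
static CopyStats
copy_per_row(const Row *rows, size_t n)
{
    CopyStats st = {0, 0, 0};

    for (size_t i = 0; i < n; i++)
    {
        st.subxacts++;
        if (rows[i].valid)
            st.inserted++;
        else
            st.skipped++;
    }
    return st;
}

/* Batched: one subtransaction per batch; a batch that hits an error is
 * treated as rolled back and is replayed row by row (assumed fallback). */
static CopyStats
copy_batched(const Row *rows, size_t n, size_t batch_size)
{
    CopyStats st = {0, 0, 0};

    assert(batch_size > 0);

    for (size_t start = 0; start < n; start += batch_size)
    {
        size_t len = (n - start < batch_size) ? n - start : batch_size;
        bool   failed = false;

        st.subxacts++;          /* the batch-level subtransaction */
        for (size_t i = 0; i < len; i++)
        {
            if (!rows[start + i].valid)
            {
                failed = true;  /* first error aborts the whole batch */
                break;
            }
        }

        if (!failed)
        {
            st.inserted += len;
            continue;
        }

        /* Batch rolled back: fall back to the per-row path for it. */
        CopyStats retry = copy_per_row(rows + start, len);

        st.subxacts += retry.subxacts;
        st.inserted += retry.inserted;
        st.skipped  += retry.skipped;
    }
    return st;
}
```

In this model, 1000 rows with 4 invalid rows falling into 4 different batches
of 100 need 410 subtransactions (10 batch attempts plus 4 x 100 replays),
against 1000 for the per-row baseline. Pre-validating rows before the insert
would reduce the number of failed batches further.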

Alexey
