Re: Parallel copy

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ants Aasma <ants(at)cybertec(dot)at>, vignesh C <vignesh21(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-05-14 06:52:10
Message-ID: CAFiTN-sSN6ZM+2LKo5imaxhosPu461u9v9ZcTTq1AiLqRvrWTw@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 14, 2020 at 11:48 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, May 14, 2020 at 12:39 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Tue, May 12, 2020 at 1:01 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > I don't understand why we need to do something special for combo CIDs
> > > if they are not generated during this operation?
> >
> > Hmm. Well I guess if they're not being generated then we don't need to
> > do anything about them, but I still think we should try to work around
> > having to disable parallelism for a table which is referenced by
> > foreign keys.
> >
>
> Okay, just to be clear, we want to allow parallelism for a table that
> has foreign keys. Basically, a parallel copy should work while
> loading data into tables having FK references.
>
> To support that, we need to consider a few things.
> a. Currently, we increment the command counter each time we take a key
> share lock on a tuple during trigger execution. I am really not sure
> if this is required during Copy command execution or we can just
> increment it once for the copy. If we need to increment the command
> counter just once for copy command then for the parallel copy we can
> ensure that we do it just once at the end of the parallel copy but if
> not then we might need some special handling.
>
> b. Another point is that after inserting rows we record CTIDs of the
> tuples in the event queue and then once all tuples are processed we
> call FK trigger for each CTID. Now, with parallelism, the FK checks
> will be processed once the worker processed one chunk. I don't see
> any problem with it but still, this will be a bit different from what
> we do in serial case. Do you see any problem with this?

IMHO, it should not be a problem, because even without parallelism we
trigger the foreign key check only when we detect EOF and end of data
from STDIN. With parallel workers too, a worker will assume that it
has completed all of its work, and can go for the foreign key check,
only after the leader receives EOF and end of data from STDIN.

The only difference is that each worker does not wait for all the
data (from all workers) to be inserted before checking the
constraint. Moreover, we are not supporting external triggers with
the parallel copy; otherwise, we might have to worry that those
triggers could do something on the primary table before we check the
constraint. I am not sure if there are any other factors that I am
missing.
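
To make the comparison concrete, here is a rough toy model in Python.
It is only a sketch, not PostgreSQL code: the function names, the use
of a set for the referenced table, and the fixed chunk size are all
assumptions made for illustration. It shows that checking each chunk
as soon as it is inserted finds the same violations as queueing
everything and checking once at the end, provided nothing changes the
referenced table while the copy is running.

```python
# Toy model only: a set stands in for the referenced (primary) table and
# a list for the rows being copied. None of these names are PostgreSQL
# internals; they are hypothetical and exist just for this illustration.

def fk_check(queued_rows, parent_keys):
    """Return the queued rows whose key is missing from the parent."""
    return [row for row in queued_rows if row not in parent_keys]


def serial_copy(rows, parent_keys):
    """Queue every inserted row, then check all of them once at EOF."""
    event_queue = []
    for row in rows:
        event_queue.append(row)  # "insert" the row and remember it
    return fk_check(event_queue, parent_keys)


def chunked_copy(rows, parent_keys, chunk_size):
    """Check each chunk right after it is inserted, without waiting
    for the other chunks (as a hypothetical worker would)."""
    violations = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        violations.extend(fk_check(chunk, parent_keys))
    return violations


parent = {1, 2, 3}
child_rows = [1, 2, 9, 3, 3, 7]

# With an unchanging parent table, both orderings report the same rows.
assert serial_copy(child_rows, parent) == chunked_copy(child_rows, parent, 2)
print(chunked_copy(child_rows, parent, 2))
```

The equality holds here only because the parent set never changes
during the load, which is the role played above by not supporting
external triggers. The model runs the chunks one after another and
says nothing about locking or command counter handling.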

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
