Re: Parallel copy

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Ants Aasma <ants(at)cybertec(dot)at>, vignesh C <vignesh21(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-05-15 04:19:19
Message-ID: CAA4eK1+aorRNY1DkXRszHsPVjXTjkxe5CZZTetSBAhDEwYr4CQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 15, 2020 at 1:51 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, May 14, 2020 at 2:18 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > To support that, we need to consider a few things.
> > a. Currently, we increment the command counter each time we take a key
> > share lock on a tuple during trigger execution. I am really not sure
> > if this is required during Copy command execution or we can just
> > increment it once for the copy. If we need to increment the command
> > counter just once for copy command then for the parallel copy we can
> > ensure that we do it just once at the end of the parallel copy but if
> > not then we might need some special handling.
>
> My sense is that it would be a lot more sensible to do it at the
> *beginning* of the parallel operation. Once we do it once, we
> shouldn't ever do it again; that's how it works now. Deferring it
> until later seems much more likely to break things.
>

AFAIU, we always increment the command counter after executing the
command. Why do we want to do it differently here?
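For illustration, a minimal sketch of "once at the end" in the leader
(the function name parallel_copy_finish is made up; only
CommandCounterIncrement() and ExitParallelMode() are existing APIs).
Note that CommandCounterIncrement() errors out while still in parallel
mode, so the single increment would have to happen after the leader
exits parallel mode:

#include "postgres.h"
#include "access/xact.h"

static void
parallel_copy_finish(void)
{
	/* ... wait for all workers to finish their chunks ... */

	ExitParallelMode();

	/*
	 * A single increment for the whole COPY makes its inserts visible
	 * to subsequent commands, rather than one increment per KEY SHARE
	 * lock taken during trigger execution.
	 */
	CommandCounterIncrement();
}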

> > b. Another point is that after inserting rows we record CTIDs of the
> > tuples in the event queue and then once all tuples are processed we
> > call FK trigger for each CTID. Now, with parallelism, the FK checks
> > will be processed once the worker processed one chunk. I don't see
> > any problem with it but still, this will be a bit different from what
> > we do in serial case. Do you see any problem with this?
>
> I think there could be some problems here. For instance, suppose that
> there are two entries for different workers for the same CTID.
>

First, let me clarify: the CTIDs I used in my email are for the
table into which the insertion is happening, which is the FK table.
So, in such a case, we can't have the same CTID queued for different
workers. Basically, we use the CTID to fetch the row from the FK
table later and form a query to lock (in KEY SHARE mode) the
corresponding tuple in the PK table. Now, it is possible that two
different workers try to lock the same row of the PK table. It is
not clear to me what problem group locking could cause in this case,
because these are non-conflicting locks. Can you please elaborate a
bit more?
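To make the locking concrete, here is a rough fragment (the variable
name ri_check_query and the pk_table/pk_col names are placeholders)
mirroring the shape of the query ri_triggers.c builds for an FK
check. Each worker would run this for the PK row referenced by a
queued CTID, with $1 bound to the FK value fetched from that row;
FOR KEY SHARE locks taken by two workers on the same PK row do not
conflict with each other:

static const char *ri_check_query =
	"SELECT 1 FROM ONLY pk_table x "
	"WHERE pk_col = $1 FOR KEY SHARE OF x";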

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
