Re: Parallel copy

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Ants Aasma <ants(at)cybertec(dot)at>, vignesh C <vignesh21(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-05-14 20:20:54
Message-ID: CA+TgmobupWCSSz3qbnHXKCd-onDtKN_3n2QPisPB=v9zZQK-8g@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 14, 2020 at 2:18 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> To support that, we need to consider a few things.
> a. Currently, we increment the command counter each time we take a key
> share lock on a tuple during trigger execution. I am really not sure
> whether this is required during COPY command execution or whether we
> can just increment it once for the whole copy. If we need to increment
> the command counter just once for the COPY command, then for parallel
> copy we can ensure that we do it just once at the end of the parallel
> copy, but if not, then we might need some special handling.

My sense is that it would be a lot more sensible to do it at the
*beginning* of the parallel operation. Once we've done it once, we
shouldn't ever do it again; that's how it works now. Deferring it
until later seems much more likely to break things.
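
Concretely, the ordering would be roughly as in the sketch below. This
is only an illustration, not code from any patch: ParallelCopyBegin()
and LaunchParallelCopyWorkers() are hypothetical names, while
CommandCounterIncrement() and EnterParallelMode() are the real entry
points in xact.c.

#include "postgres.h"
#include "access/parallel.h"
#include "access/xact.h"

/* Hypothetical sketch: one command counter increment, up front. */
static void
ParallelCopyBegin(void *pcstate)
{
    /*
     * Increment exactly once, before any workers are launched, so the
     * leader and every worker see the same command id.  Once we have
     * entered parallel mode, a later CommandCounterIncrement() that
     * actually needed to advance the counter would fail with "cannot
     * start commands during a parallel operation".
     */
    CommandCounterIncrement();

    EnterParallelMode();
    LaunchParallelCopyWorkers(pcstate);    /* hypothetical */
}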

> b. Another point is that after inserting rows we record the CTIDs of
> the tuples in the event queue, and then once all tuples are processed
> we call the FK trigger for each CTID. Now, with parallelism, the FK
> checks will be processed once a worker has processed one chunk. I
> don't see any problem with it, but still, this will be a bit
> different from what we do in the serial case. Do you see any problem
> with this?

I think there could be some problems here. For instance, suppose that
there are two entries, from different workers, for the same CTID. If
the leader were doing all the work, they'd be handled consecutively.
If they came from completely unrelated processes, locking would
serialize them. But group locking won't serialize them, so there you
have an issue, I think. Also, it's not ideal from a work-distribution
perspective: one worker could finish early and be unable to help the
others.
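
To make the group-locking point concrete: in the lock manager's
conflict check, locks held by members of the same lock group (the
leader plus its parallel workers) are treated as non-conflicting. The
following is a simplified illustration of that rule, not the real
LockCheckConflicts() from src/backend/storage/lmgr/lock.c, though
PGPROC->lockGroupLeader is the real field:

#include "postgres.h"
#include "storage/proc.h"

/*
 * Illustration only: would a lock request by 'requester' conflict with
 * one already held by 'holder', given that the two lock modes
 * themselves conflict?
 */
static bool
GroupLockWouldConflict(PGPROC *requester, PGPROC *holder)
{
    /* A backend never conflicts with itself. */
    if (requester == holder)
        return false;

    /*
     * Members of the same lock group are treated as a single lock
     * holder, so two parallel workers queueing up behind the same CTID
     * would not serialize the way two unrelated backends would.
     */
    if (requester->lockGroupLeader != NULL &&
        requester->lockGroupLeader == holder->lockGroupLeader)
        return false;

    return true;                /* unrelated processes: they serialize */
}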

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
