| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com> |
| Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: Parallel INSERT SELECT take 2 |
| Date: | 2026-05-17 11:24:29 |
| Message-ID: | 349c44dd-b27f-4fd8-a853-516549ee79d7@vondra.me |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 5/11/26 13:44, Tomas Vondra wrote:
> ...
>
>>>
>>>>> alternative idea
>>>>> ----------------
>>>>>
>>>>> What if we took a very different approach, and just made sure the INSERT
>>>>> part never runs concurrently with the SELECT? Say we fully materialize the
>>>>> SELECT result (e.g. write it into a tuplestore), finish the parallel mode, and
>>>>> only then do the INSERT?
>>>>> ...
I've been wondering about aspects of this approach, particularly about
item (b) in my message, i.e. why we start/end the parallel mode
ExecutePlan and not in the Gather node. Even if acquiring the XID before
entering the parallel mode works, it's a bit annoying.
So I asked about it on our Discord, and there was a little bit
discussion, with the explanation that it's difficult to predict how
exactly will the nodes interleave, and with doing this in Gather the
nodes above might observe various values depending on the plan. At least
that's how I understand/interpret the discussion. Of course, with the
materialization this would not be an issue (I think).
But Tom also mentioned one additional issue for parallelizing DML this
way, and that's EvalPlanQual (EPQ), i.e. the mechanism we use to recheck
concurrently updated rows in READ COMMITTED isolation mode.
I don't see EPQ mentioned in these threads about parallelizing INSERT +
SELECT at all. I suppose that's because EPQ is an issue only for
UPDATE/DELETE queries, not for INSERT. Maybe everyone was aware of it,
which is why the effort was limited to INSERT, not sure?
The question is - is this acceptable? Did people expect to eventually
extend the INSERT solution to UPDATE/DELETE in the future? Because the
materialization does not really allow that, I think. Is it worth it, or
would it be something we regret later? Not sure.
In a way, users can already do exactly what the materialization does by
CREATE UNLOGGED TABLE tmp AS SELECT ...;
INSERT INTO t SELECT * FROM tmp;
and they could even do something like this for UPDATE/DELETE, except
that it (of course) won't do the EPQ recheck against relations in the
original query.
I'm not sure if we need to do something about EPQ. If it's acceptable to
have parallel SELECT only for INSERT, then we don't need to do anything.
In that case it's maybe a bit wrong to use "PARALLEL DML" in the syntax.
If we need to eventually support all DML types, then I guess we'd need
to come up with some way to do EPQ. A couple random ideas about that:
a) It's probably not feasible to do EPQ rechecks with parallel plans, at
least not exactly the way we do them now. The EPQ injects tuples into
various scan nodes, and it's unclear how to do that with parallel plans.
The leader would need to inject tuples into workers, it's not clear
which worker(s) should get the tuple, maybe the workers are not still
running at that point, etc. The EPQ state would have to be shared with
the workers somehow, etc. That seems like a substantial amount of work.
b) Perhaps it would be possible to identify plans that are somehow
"safe" to parallelize / materialize as proposed for INSERTs? Are there
plans that won't need to inject tuples into the "subquery" part of the
ModifyTable? The executor/README says In UPDATE/DELETE/MERGE, only the
target relation needs to be handled this way. so what if the target
relation is not in the subplan at all (or in some part of it)? Of
course, I'm not sure common such queries are, but in my experience it's
not uncommon to have a large ETL job that aggregates a large table then
inserts/merges the result into a different one. Also, queries with
UPDATE ... FROM ... do a join with the target table, and EPQ may need to
rerun the join, so the materialization would need to happen only on one
side of the join (I think).
c) Another option I can think of is doing the EPQ in the leader, a bit
as if the parallel plan runs with no workers. So maybe we could "force"
it to do this for EPQ rechecks? I suppose it'd need to have a completely
separate plan for this purpose, not sure of that's possible.
d) Maybe the answer to parallel DML is "don't do EPQ"? There's an
article explaining how CockroachDB does EPQ, or rather how it solved the
problem without EPQ by retrying the DML. Would be a significant change
in behavior, of course, and I'm sure it has other disadvantages.
e) AFAIK this is a problem only in READ COMMITTED. In SERIALIZABLE the
DML simply fails if there are concurrent modifications, and it's up to
the application to retry the whole thing (which for the ETL jobs would
almost never happen). In a way this is a version of (d) but with the
retry performed by the application. But it means parallelizing DML would
be simpler in SERIALIZABLE, i.e. a stricter serialization mode, which is
a bit surprising (but also not entirely).
Anyway, all of this is mostly just a brain dump. I don't have a clear
plan, or even a full understanding of how EPQ works, and the trade offs.
regards
--
Tomas Vondra
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Sami Imseih | 2026-05-17 14:34:01 | pgstat: Flush some statistics within running transactions, take 2 |
| Previous Message | Alexander Nestorov | 2026-05-17 10:28:03 | Re: [PATCH] btree_gist: add cross-type integer operator support for GiST |