Re: simplifying foreign key/RI checks

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: simplifying foreign key/RI checks
Date: 2022-05-02 11:50:10
Message-ID: CA+HiwqEdvJfAo5NALUreHUKkc5q5N6jbv_kOyn7SqSmZ44owcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 11, 2022 at 4:47 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> This one has been marked Returned with Feedback in the CF app, which
> makes sense given the discussion on -committers [1].
>
> Agree with the feedback given that it would be better to address *all*
> RI trigger check/action functions in the project of sidestepping SPI
> when doing those checks/actions, not only RI_FKey_check_ins / upd() as
> the current patch does. I guess that will require thinking a little
> bit harder about how to modularize the new implementation so that the
> various trigger functions don't end up with their own bespoke
> check/action implementations.
>
> I'll think about that, also consider what Corey proposed in [2], and
> try to reformulate this for v16.

I've been thinking about this and wondering if the SPI overhead is too
big in the other cases (cases where it is the FK table that is to be
scanned) that it makes sense to replace the actual planner (invoked
via SPI) by a hard-coded mini-planner for the task of figuring out the
best way to scan the FK table for a given PK row affected by the main
query. Planner's involvement seems necessary in those cases, because
the choice of how to scan the FK table is not as clear cut as how to
scan the PK table.

ISTM, the SPI overhead consists mainly of performing GetCachedPlan()
and executor setup/shutdown, which can seem substantial when compared
to the core task of scanning the PK/FK table, and does add up over
many rows affected by the main query, as seen by the over 2x speedup
for the PK table case gained by shaving it off with the proposed patch
[1]. In the other cases, the mini-planner will need some cycles of
its own, even though maybe not as many as by the use of SPI, so the
speedup might be less impressive.

Other than coming up with an acceptable implementation for the
mini-planner (maybe we have an example in plan_cluster_use_sort() to
ape), one more challenge is to figure out a way to implement the
CASCADE/SET trigger routines. For those, we might need to introduce
restricted forms of ExecUpdate(), ExecDelete() that can be called
directly, that is, without a full-fledged plan. Not having to worry
about those things does seem like a benefit of just continuing to use
the SPI in those cases.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1]
drop table pk, fk;
create table pk (a int primary key);
create table fk (a int references pk);
insert into pk select generate_series(1, 1000000);
insert into fk select i%1000000+1 from generate_series(1, 10000000) i;

Time for the last statement:

HEAD: 67566.845 ms (01:07.567)

Patched: 26759.627 ms (00:26.760)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Godfrin, Philippe E 2022-05-02 12:45:28 limiting collected query text length in pg_stat_statements
Previous Message Alvaro Herrera 2022-05-02 11:44:15 Re: bogus: logical replication rows/cols combinations