Re: More efficient RI checks - take 2

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: More efficient RI checks - take 2
Date: 2020-04-28 14:31:54
Message-ID: 20200428143154.GX13712@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Thu, Apr 23, 2020 at 10:35 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I think we're failing to communicate here. I agree that if the goal
> > is simply to re-implement what the RI triggers currently do --- that
> > is, retail one-row-at-a-time checks --- then we could probably dispense
> > with all the parser/planner/executor overhead and directly implement
> > an indexscan using an API at about the level genam.c provides.
> > (The issue of whether it's okay to require an index to be available is
> > annoying, but we could always fall back to the old ways if one is not.)
> >
> > However, what I thought this thread was about was switching to
> > statement-level RI checking. At that point, what we're talking
> > about is performing a join involving a not-known-in-advance number
> > of tuples on each side. If you think you can hard-wire the choice
> > of join technology and have it work well all the time, I'm going to
> > say with complete confidence that you are wrong. The planner spends
> > huge amounts of effort on that and still doesn't always get it right
> > ... but it does better than a hard-wired choice would do.
>
> Oh, yeah. If we're talking about that, then getting by without using
> the planner doesn't seem feasible. Sorry, I guess I didn't read the
> thread carefully enough.

Yeah, I had been thinking about what we might do with the existing
row-level RI checks too. If we're able to get statement-level checking
without much impact on the single-row-statement case, then that's
certainly interesting, although it sure feels like we're ending up with
a lot left on the table.

> As you say, perhaps there's room for both things, but also as you say,
> it's not obvious how to decide intelligently between them.

The single-row case seems pretty clear and also seems common enough that
it'd be worth paying the cost to figure out if it's a single-row
statement or not.

Perhaps we start with row-level for the first row, implemented directly
using an index lookup, and when we hit some threshold (maybe even just
"more than one") switch to using the transient table and queueing
the rest to check at the end.
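A minimal C sketch of that hybrid strategy, offered only as an illustration under stated assumptions: every name here (`PkIndex`, `RiState`, `ri_check_row`, `ri_check_end_of_statement`) is hypothetical and is not PostgreSQL's trigger, SPI, or genam.c API. A sorted int array stands in for the unique index on the referenced table, a growable array stands in for the transient table, and `force_per_row` stands in for the GUC discussed below. The end-of-statement step hard-wires a merge join purely to keep the sketch self-contained; per Tom's point, a real implementation would hand that join to the planner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the referenced (PK) table's unique index:
 * a key array sorted in ascending order. */
typedef struct {
    const int *keys;
    size_t     nkeys;
} PkIndex;

/* Hypothetical per-statement state for the hybrid RI check. */
typedef struct {
    size_t threshold;     /* rows checked retail before switching to queueing */
    bool   force_per_row; /* stands in for a GUC forcing the single-row path */
    size_t rows_seen;
    int   *queue;         /* stands in for the transient table */
    size_t nqueued;
    size_t cap;
    size_t violations;    /* FK values with no matching PK */
} RiState;

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Retail check: one probe of the "index" per row (binary search here). */
static bool pk_index_probe(const PkIndex *idx, int key) {
    return bsearch(&key, idx->keys, idx->nkeys, sizeof(int), cmp_int) != NULL;
}

/* Called once per FK row. Rows up to the threshold are probed immediately;
 * later rows are queued for the end-of-statement check.
 * Returns false only if growing the queue fails. */
static bool ri_check_row(RiState *st, const PkIndex *idx, int fk) {
    assert(st != NULL && idx != NULL);
    st->rows_seen++;
    if (st->force_per_row || st->rows_seen <= st->threshold) {
        if (!pk_index_probe(idx, fk))
            st->violations++;
        return true;
    }
    if (st->nqueued == st->cap) {
        size_t ncap = st->cap ? st->cap * 2 : 16;
        int *q = realloc(st->queue, ncap * sizeof(int));
        if (q == NULL)
            return false;
        st->queue = q;
        st->cap = ncap;
    }
    st->queue[st->nqueued++] = fk;
    return true;
}

/* End of statement: check all queued rows in one pass. Here that is a
 * fixed merge join (sort the queued keys, then walk both sorted lists);
 * this fixed choice is exactly what the planner would otherwise make. */
static void ri_check_end_of_statement(RiState *st, const PkIndex *idx) {
    if (st->nqueued > 0)
        qsort(st->queue, st->nqueued, sizeof(int), cmp_int);
    size_t j = 0;
    for (size_t i = 0; i < st->nqueued; i++) {
        while (j < idx->nkeys && idx->keys[j] < st->queue[i])
            j++;
        if (j == idx->nkeys || idx->keys[j] != st->queue[i])
            st->violations++;
    }
    free(st->queue);
    st->queue = NULL;
    st->nqueued = st->cap = 0;
}
```

With `threshold = 1`, a single-row statement never touches the queue, which is the property Stephen wants to protect; setting `force_per_row` keeps every row on the index-probe path regardless of statement size.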

What bothers me the most about this approach (though, to be clear, I
think we should still pursue it) is the risk that we might end up
picking a spectacularly bad plan that ends up taking a great deal more
time than the index-probe based approach we almost always have today.
If we limit that impact to only cases where more than one row is
involved, then that's certainly better (though maybe we'll need a GUC
for this anyway? If we had the single-row approach plus the
statement-level one, presumably the GUC would just make us always take
the single-row method, so it hopefully wouldn't be too grotty to have).

Thanks,

Stephen
