Costing elided SubqueryScans more nearly correctly

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Costing elided SubqueryScans more nearly correctly
Date: 2022-05-04 22:32:00
Message-ID: 2581077.1651703520@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In [1] I complained about how SubqueryScans that get deleted from
a plan tree by setrefs.c nonetheless contribute cost increments
that might cause the planner to make odd choices. That turned
out not to be the proximate cause of that particular issue, but
it still seems like it might be a good idea to do something about
it. Here's a little patch to improve matters.

It turns out to be hard to predict perfectly whether setrefs.c will
remove a SubqueryScan, because createplan.c plays some games with
moving tlist evaluations around and/or inserting "physical"
(i.e. trivial) tlists, which can falsify any earlier estimate of
whether a SubqueryScan is trivial. I'm not especially interested in
redesigning those mechanisms, so the best we can hope for is an
approximate determination. (Those same behaviors also make a lot of
other path cost estimates a bit incorrect, so it doesn't seem like
we need to feel too awful about not getting SubqueryScan perfect.)

Given that ground rule, however, it's not very hard to determine
whether a SubqueryScanPath looks like it will be trivial and change
its costing accordingly. The attached draft patch does that.

I instrumented the code in setrefs.c, and found that during the
core regression tests this patch estimates correctly in 2103
places while guessing wrongly in 54, so that seems like a pretty
good step forward.

Perhaps I overcomplicated matters by making the callers responsible
for determining triviality of the paths' targets. We could just
make cost_subqueryscan check that for itself (using logic similar
to what I wrote in set_subquery_pathlist), but that'd result in
duplicative calculations anytime we make more than one Path for a
subquery. On the other hand, said calculations wouldn't be that
expensive, so perhaps a more localized patch would be better.

I also notice that setrefs.c can elide Append and MergeAppend nodes
too in some cases, but AFAICS costing of those node types doesn't
take that into account.

Anyway, I'll stick this in the queue for v16.

regards, tom lane

[1] https://www.postgresql.org/message-id/328872.1651247595%40sss.pgh.pa.us

Attachment Content-Type Size
cost-elided-subqueries-better-1.patch text/x-diff 14.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-05-04 23:02:57 Re: Costing elided SubqueryScans more nearly correctly
Previous Message David G. Johnston 2022-05-04 21:52:38 Re: SQL/JSON: FOR ORDINALITY bug