| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com> |
| Cc: | jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tomas Vondra <tomas(at)vondra(dot)me>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, hs(at)cybertec(dot)at, Jeff Davis <pgsql(at)j-davis(dot)com> |
| Subject: | Re: Is there value in having optimizer stats for joins/foreignkeys? |
| Date: | 2026-05-21 20:25:13 |
| Message-ID: | 24247.1779395113@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com> writes:
> Here's v7, another attempt to fix the unstable tests.
Hi Alexandra,
I signed up for an in-person review of this at PGConf.dev, but
the schedule doesn't seem to be working in favor of making that
happen. If you see this and happen to run into me in the
hallway, I'm happy to chat, but in any case here are my
rather-hasty review notes.
I think it's okay if v1 only handles 2-way joins, as long as the
catalog representation is prepared for more. Restricting to
cases where we can do index-based sampling seems fine too.
Those things could be relaxed later if it seems worthwhile,
but we'd have a creditable feature even without.
I didn't read the sampling code in any detail. I think you will
need to put more thought into what is user-friendly behavior
in case the required index doesn't exist or doesn't have the
right properties. (I think the tests for that might not be
strong enough, either.)
I think you could simplify some code noticeably if you included the
anchor rel's OID as the first element of stxjoinrels[]. Yeah,
it'd be redundant with stxrelid, but so what? It's not like
pg_statistic_ext rows are narrow enough that anyone would notice
the extra 4 bytes. I think this would simplify some of the
relationships within the data structures, too, eg all varnos in
the expressions could be considered to reference stxjoinrels[].
I don't love stxkeyrefs[]. I wonder if it's time to throw away
stxkeys[], represent all the target columns as regular expression
trees in stxexprs, and then special-case columns that are simple
Vars where appropriate at execution.
(In the same vein, I dislike the grammar's separation of plain
columns from expressions; I'd like to replace stats_params
with expr_list and sort it all out later. But perhaps that's
material for a separate patch.)
We will need to put more thought into permissions: I don't think
requiring all the tables to have the same owner is workable.
(What happens if someone tries to ALTER OWNER later?) However,
if they don't all have the same owner, there are potential security
problems, so the right restriction is not obvious. This is not
necessary to solve now; there are bigger questions to worry about.
But we'll need an answer before it's committable.
It's not too soon to write some user-facing documentation.
CREATE STATISTICS man page obviously needs attention, but
also the discussion of extended stats in perform.sgml.
And catalogs.sgml. I find that writing that sort of stuff
helps to clarify where one's design is weak.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Mohamed ALi | 2026-05-21 20:51:58 | Re: [PATCH] Add NESTED_STATEMENTS option to EXPLAIN |
| Previous Message | Isaac Morland | 2026-05-21 18:45:43 | Re: Rename Postgres 19 to Postgres 26 (year-based)? |