| From: | Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com> |
|---|---|
| To: | jian he <jian(dot)universality(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Tomas Vondra <tomas(at)vondra(dot)me>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, hs(at)cybertec(dot)at, Jeff Davis <pgsql(at)j-davis(dot)com> |
| Subject: | Re: Is there value in having optimizer stats for joins/foreignkeys? |
| Date: | 2026-05-13 17:30:56 |
| Message-ID: | CAK98qZ2mMJ3Nvx9U1S9N9Put1bb=iOgy8vFdp=rfjfta4wJ2AQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Jian,
Thanks for reviewing! At the current stage, I'm more interested in
reviews around the overall design and the feasibility of the current
approach, as I mentioned earlier:
On Tue, Apr 28, 2026 at 10:59 PM Alexandra Wang <
alexandra(dot)wang(dot)oss(at)gmail(dot)com> wrote:
> I'd appreciate feedback on:
> - The catalog design (stxjoinrels, stxkeyrefs, stxjoinconds)
> - The index-based sampling implementation
> - The planner integration (clausesel.c, costsize.c)
> - What should be in scope for v1 vs deferred
I will incorporate your review feedback in the next revision, but for
now, I'd like to focus the discussion on the above areas before
polishing the existing patch.
On Wed, May 13, 2026 at 8:16 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> On Wed, May 6, 2026 at 8:19 PM Alexandra Wang
> <alexandra(dot)wang(dot)oss(at)gmail(dot)com> wrote:
> >
> > Here's the rebased patch, it only needed a catalog version bump.
> >
>
> No need to update src/include/catalog/catversion.h during dev cycle.
I needed to bump the catalog version because CI broke without that
change.
> + if (!IsA(lfirst(l), OpExpr))
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("join statistics require a single equijoin condition per pair
> of tables")));
> In the error message, should we replace "equijoin" with "equality join"?
> IMHO, "equijoin" is kind of informal wording.
OK. I'll update the error message in the next revision.
> -- Unintended error case1
> DROP TABLE IF EXISTS t1;
> CREATE TABLE t1 (id INTEGER PRIMARY KEY, val TEXT NOT NULL);
> CREATE STATISTICS x (mcv) ON t1.val FROM t1 JOIN t1 as t1s ON (t1.id =
t1s.id);
> alter table t1 alter column id set data type int8;
> ERROR: could not open relation with OID 0
>
> -- Unintended error case2
> DROP TABLE IF EXISTS t1, t2;
> CREATE TABLE t1 (id INTEGER PRIMARY KEY, val TEXT NOT NULL);
> CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER NOT NULL);
> CREATE STATISTICS xx (mcv) ON t1.val FROM t2 JOIN t1 ON (t2.t1_id = t1.id
);
> alter table t1 alter column id set data type int8;
> ERROR: missing FROM-clause entry for table "t1"
> LINE 1: alter table t1 alter column id set data type int8;
> ^
>
> -- Unintended error case3
> DROP TABLE IF EXISTS t1, t2;
> CREATE TABLE t1 (id INTEGER PRIMARY KEY, val TEXT NOT NULL);
> CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER generated always
as (1));
> CREATE STATISTICS xx (mcv) ON t1.val FROM t2 JOIN t1 ON (t2.t1_id = t1.id
);
> alter table t2 alter column t1_id set expression as (2);
> ERROR: missing FROM-clause entry for table "t1"
> LINE 1: alter table t2 alter column t1_id set expression as (2);
> ^
>
> -- Unintended error case4
> DROP TABLE IF EXISTS t1, t2;
> CREATE TABLE t1 (id INTEGER PRIMARY KEY, val TEXT NOT NULL);
> CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER generated always
as (1));
> CREATE STATISTICS xx (mcv) ON t1.val FROM t2 JOIN t1 ON (t2 = t1);
> ERROR: cache lookup failed for attribute 0 of relation 18388
>
> In src/test/regress/sql/stats_ext_crossrel.sql, we need to add test cases
for
> ALTER COLUMN SET DATA TYPE and ALTER COLUMN SET EXPRESSION on columns
> involved in join statistics.
> These ALTER TABLE operations will internally recreate the affected join
stats.
>
> src/test/regress/sql/stats_ext_crossrel.sql currently lacks tests for
equality
> joins where the join qual contains whole-row variable references.This is
> important because whole-row variables raise the question of whether join
> statistics should be recreated when any column in the relation is
modified via
> ALTER COLUMN SET DATA TYPE or ALTER COLUMN SET EXPRESSION.
Good call! I'll make sure to include these edge cases in the next
revision.
Best,
Alex
--
Alexandra Wang
EDB: https://www.enterprisedb.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Dmitry Dolgov | 2026-05-13 18:08:02 | Re: Add ssl_(supported|shared)_groups to sslinfo |
| Previous Message | Robert Haas | 2026-05-13 17:16:45 | Re: [PATCH] Add pg_current_vxact_id() function to expose virtual transaction IDs |