From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jorge Campins <jrcampins(at)gmail(dot)com> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: information_schema query not working in PostgreSQL version 17 |
Date: | 2025-05-28 02:11:42 |
Message-ID: | 1375422.1748398302@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Jorge Campins <jrcampins(at)gmail(dot)com> writes:
> You will find attached 2 sql files:
> - select-for-version-16-or-lower.sql
> - select-for-version-17
> Both queries return the same result; I've been using the
> select-for-version-16-or-under query since a few versions ago. Since
> upgrading to version 17, it doesn't work with a database with many defined
> constraints. By many, I mean:
I spent some time poking at this. I didn't try to make a database
with as many foreign keys as you have, but there's already a pretty
noticeable runtime difference for this query in the regression
database, with sixty-some foreign keys. The plan that's being chosen
is different and a lot worse in 17, as a result of underestimates
of the number of rows returned by the joins. It looks like the
proximate cause of that is that v16 always estimates the number of
rows produced by _pg_expandarray() as 1000, while v17 produces a
substantially more accurate estimate (about 1, in my test case)
thanks to commits 58054de2d and 9391f7152. In v16, that enormous
overestimate accidentally compensates for underestimates elsewhere
and keeps it from choosing bad join methods. In v17, not so much.
I don't think there are any near-term fixes available on our side.
The information_schema views are so messy that they are just hard
to plan accurately: there is a substantial impedance mismatch between
our catalogs and the SQL model, and that's tough to deal with.
One thing that's worth calling out here is that the SQL standard
is of the opinion that constraint_schema plus constraint_name is a
unique identifier. But in Postgres it is not: in our implementation
constraint names are only unique within a table, so that you need
schema name plus table name plus constraint name to be sure you are
identifying a single constraint. This makes use of
information_schema.referential_constraints really quite hazardous,
as you can't join it to other views reliably. If you have faithfully
adhered to the standard's model and not used conflicting constraint
names in different tables, it's fine ... but with no enforcement of
that at the system level, do you want to rely on it?
Keeping that in mind, I experimented with
JOIN
information_schema.key_column_usage AS kcu
ON kcu.constraint_name = tc.constraint_name
+ AND kcu.table_name = tc.table_name
AND kcu.table_schema = tc.table_schema
Of course that only fixes one of the three joins, but interestingly
it did improve the plan for me -- I wonder if it does anything for
you?
Anyway, between the performance issues and this fundamental
correctness issue, I think you should forget about using
information_schema for this query and instead write it to
look directly at the Postgres catalogs.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | PG Bug reporting form | 2025-05-28 03:30:48 | BUG #18938: Logical replication failure in 16.9: "invalid memory alloc request size 1372786672" |
Previous Message | Michael Paquier | 2025-05-28 00:15:33 | Re: SIMILAR TO expressions translate wildcards where they shouldn't |