Fixing the btree_gist inet mess

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Fixing the btree_gist inet mess
Date: 2025-08-01 18:17:43
Message-ID: 2483812.1754072263@sss.pgh.pa.us
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

As we've known for years[1][2][3], contrib/btree_gist's opclasses
for inet/cidr columns are fundamentally broken: they rely on the
planner's convert_network_to_scalar() function, which was only
ever intended to give approximate results, so you get the wrong
answers in edge cases. There isn't anything that can be done
about that without breaking on-disk compatibility for such indexes,
so we haven't tried. What we did do some time ago was to implement
a hopefully-correct, in-core gist network_ops opclass to replace the
btree_gist opclasses. But people are still using the btree_gist
opclasses, because those are marked default and the in-core opclass
isn't.

It's past time to move this problem along and try to get out of the
business of encouraging use of known-broken code. I propose that
for v19, we should flip the opcdefault status so that network_ops is
marked default and the btree_gist opclasses are not. This will be
enough to ensure that network_ops is used unless the user explicitly
specifies to do differently. I don't think we should go further than
that yet (ie, not actively disable the btree_gist code) for a couple
of reasons: (1) this step is messy enough already, and (2) given the
current situation, the in-core network_ops opclass may be less well
tested than one would like. So I don't think we have enough evidence
to decide that we can summarily force everyone onto it; broken or not,
there haven't been that many complaints about btree_gist's opclasses.

Having done this, the effects of a plain pg_dump from v18- and restore
into v19+ will be to recreate GiST indexes on inet/cidr columns using
network_ops even if they were previously using btree_gist. That will
happen because in v18-, those opclasses were marked opcdefault and
pg_dump intentionally omits the explicit opclass specification in that
case. So that works the way we want.

pg_upgrade is more of a problem, because its invocation of pg_dump
will also omit the explicit opclass specification, resulting in the
new server thinking that the index uses network_ops while the on-disk
data isn't compatible with that. We can't really change that pg_dump
behavior, because that aspect is managed inside the old server's
pg_get_indexdef() function. The only solution I can see is for
pg_upgrade to refuse to upgrade indexes that use those opclasses.
We can tell users to replace them with network_ops indexes before
upgrading --- that's possible in 9.4 and later, so it should be
a good enough answer for almost everybody.

The attached draft patch implements these ideas and seems to do
the right things in testing. It's worth remarking on the way
that I did the "mark the btree_gist opclasses not-default" part:
I hacked up DefineOpClass() to ignore the DEFAULT specification if
the opclass being created has the right name and input data type.
That certainly has a foul odor about it, but the alternatives seem
worse. We can't simply add a btree_gist update step to remove
the DEFAULT setting, because btree_gist--1.2.sql will already have
failed as a consequence of trying to create a default opclass when
there already is one. Modifying btree_gist--1.2.sql to remove the
DEFAULT markings might be safe, but it goes against our longstanding
rule that extension scripts don't change once shipped, and I'm not
entirely sure that there aren't bad consequences if we break that
rule. (I did go as far as to add a comment to it about what will
really happen.) Moreover, even if we were willing to risk changing
btree_gist--1.2.sql, that's not enough: pg_upgrade would still fail,
because it dumps extensions by content, and what it will see in the
old installation is btree_gist opclasses that are marked default.
So hacking up DefineOpClass() can solve both the
normal-extension-install case and the pg_upgrade case for not a lot
of code, and I'm not seeing another way that's better.

There are a couple of loose ends still to be dealt with. We need
to say something about this in btree-gist.sgml, but I've not
attempted to write that text yet. Also, I expect that
cross-version-upgrade testing will spit up on the inet/cidr indexes
created by btree_gist's regression tests. There's probably
nothing that can be done about the latter except to teach
AdjustUpgrade.pm to drop those indexes from the old installation.

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/201010112055.o9BKtZf7011251%40wwwmaster.postgresql.org
[2] https://www.postgresql.org/message-id/flat/7891efc1-8378-2cf2-617b-4143848ec895%40proxel.se
[3] https://www.postgresql.org/message-id/flat/19000-2525470d200672ab%40postgresql.org

Attachment Content-Type Size
v1-0001-Mark-GiST-network_ops-opcdefault-and-btree_gist-s.patch text/x-diff 11.2 KB

Browse pgsql-hackers by date

  From Date Subject
Previous Message Sami Imseih 2025-08-01 18:15:24 Re: Improve LWLock tranche name visibility across backends