Isn't gist_page_items broken for opclasses with compress methods?

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Isn't gist_page_items broken for opclasses with compress methods?
Date: 2022-02-17 18:55:32
Message-ID: 70c9e416-e765-db10-b288-acc992678c20@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've been looking at a report of a crash an in-place upgrade [1],
related to a GiST index, and I tried to use the new GiST support added
to pageinspect in 756ab29124 (so it's in 14). My knowledge of GiST is
somewhat limited, but after struggling with it for a while I wonder if
it's actually correct for GiST indexes with "compress" methods.

Consider a simple example with ltree:

CREATE TABLE test (a ltree);
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');

SELECT * FROM test;

a
------------------------------------
Top.Collections.Pictures.Astronomy
(1 row)

CREATE INDEX ON test USING gist (a);

so far everything seems fine, but let's use the pageinspect stuff.

SELECT * FROM gist_page_items(get_raw_page('test_a_idx', 0),
'test_a_idx');

WARNING: problem in alloc set ExprContext: detected write past chunk
end in block 0x1b1b110, chunk 0x1b1d2c0
WARNING: problem in alloc set ExprContext: bad single-chunk 0x1b1d358
in block 0x1b1b110
WARNING: problem in alloc set ExprContext: found inconsistent memory
block 0x1b1b110

itemoffset | ctid | itemlen | dead | keys
------------+-------+---------+------+--------
1 | (0,1) | 96 | f | (a)=()
(1 row)

Well, that's not great, right? I know pageinspect may misbehave if you
supply incorrect info or for corrupted pages, but that's not what's
happening here - the heap/index are fine, we have the right descriptor
and all of that. Yet we corrupt memory, and generate bogus result (I
mean, the keys clear are not empty).

The simple truth is this simply assumes the GiST index simply stores the
input data as is, and we can happily call output on the procedure, so we
read stuff from the index and call ltree_out() on it. But that's
nonsense, because what the index actually stores is ltree_gist, which is
essentially "ltree" with extra 8B header. Which (of course) utterly
confuses ltree_out, producing nonsense result.

The comment before BuildIndexValueDescription(), which is used to print
the keys, says this:

* The passed-in values/nulls arrays are the "raw" input to the
* index AM, e.g. results of FormIndexDatum --- this is not
* necessarily what is stored in the index, but it's what the
* user perceives to be stored.

Which I think simply means it's fine to call BuildIndexValueDescription
on the pass to index AMs, not on the stuff we read from the index.
Because it's up to the AM to transform it arbitrarily.

In the regression tests it seemingly works, but I believe that's merely
due to using a Point. gist/point_ops defines gist_point_compress, but
that merely builds BOX, which is two points next to each other. So we
simply print the first one ...

I've tried various other GiST opclasses (e.g. btree_git) and all of that
prints just bogus values.

Not sure what to do about this. BuildIndexValueDescription seems like
the wrong place to deal with this, and it was intended for entirely
different purpose (essentially just error messages).

So perhaps pageinspect adding a variant that knows how to decompress the
value? Or just not printing the keys, because the current state seems a
bit useless and confusing.

regards

https://www.postgresql.org/message-id/17406-71e02820ae79bb40%40postgresql.org

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2022-02-17 19:02:51 Re: support for CREATE MODULE
Previous Message Finnerty, Jim 2022-02-17 18:55:06 Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?