Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives
Date: 2019-01-14 22:37:23
Message-ID: CAH2-WzmAeq2b5WkLUcS5YobGft61bhc9SCmPaFTinnUDZAEL6g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 14, 2019 at 1:46 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I would have said that the assumption is that a fixed source tuple
> will generate identical index entries. The problem with that is that
> my idea of what constitutes a fixed input now seems to have been
> faulty. I didn't think that the executor could mutate TOAST state in a
> way that made this kind of inconsistency possible.

The source tuple (by which I mean the mgd.bib_refs heap tuple) is a
HEAP_HASEXTERNAL tuple. If I update it to make a particularly long
text field NULL (UPDATE mgd.bib_refs SET abstract = NULL), and then
run "INSERT INTO bug SELECT * FROM mgd.bib_refs", amcheck stops
complaining that an entry is missing from the index on "bug.title",
even though the "abstract" field has nothing to do with that index.

The source of the inconsistency here must be the external datum
handling within heap_prepare_insert():

/*
 * If the new tuple is too big for storage or contains already toasted
 * out-of-line attributes from some other relation, invoke the toaster.
 */
if (relation->rd_rel->relkind != RELKIND_RELATION &&
    relation->rd_rel->relkind != RELKIND_MATVIEW)
{
    /* toast table entries should never be recursively toasted */
    Assert(!HeapTupleHasExternal(tup));
    return tup;
}
else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
    return toast_insert_or_update(relation, tup, NULL, options);
else
    return tup;
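Stripped of PostgreSQL's types, the branch above reduces to a small decision function. The following is a hypothetical model, not backend code: FakeTuple, FakeRelKind, InsertAction and prepare_insert_action() are invented for illustration, and FAKE_TOAST_TUPLE_THRESHOLD is the value TOAST_TUPLE_THRESHOLD works out to with the default 8kB BLCKSZ (computed by hand, so treat it as illustrative). It shows that a tuple carrying external attributes goes through the toaster even when it is small.

```c
#include <stdbool.h>
#include <stddef.h>

/* Approximately TOAST_TUPLE_THRESHOLD for an 8kB BLCKSZ (illustrative). */
#define FAKE_TOAST_TUPLE_THRESHOLD 2032

typedef enum
{
    FAKE_RELKIND_RELATION,
    FAKE_RELKIND_MATVIEW,
    FAKE_RELKIND_TOASTVALUE
} FakeRelKind;

/* Hypothetical stand-in for a heap tuple: only the fields the branch reads. */
typedef struct
{
    bool   has_external;    /* models HeapTupleHasExternal(tup) */
    size_t t_len;           /* models tup->t_len */
} FakeTuple;

typedef enum
{
    STORE_AS_IS,
    INVOKE_TOASTER
} InsertAction;

/* Mirrors the if / else if / else structure quoted from heap_prepare_insert(). */
static InsertAction
prepare_insert_action(FakeRelKind relkind, const FakeTuple *tup)
{
    if (relkind != FAKE_RELKIND_RELATION && relkind != FAKE_RELKIND_MATVIEW)
        return STORE_AS_IS;     /* e.g. toast table entries: never re-toasted */
    else if (tup->has_external || tup->t_len > FAKE_TOAST_TUPLE_THRESHOLD)
        return INVOKE_TOASTER;  /* external pointers alone are enough */
    else
        return STORE_AS_IS;
}
```

For example, a 300-byte tuple with has_external set maps to INVOKE_TOASTER, while the same tuple without external attributes maps to STORE_AS_IS. That is why nulling out the unrelated "abstract" column (removing the tuple's external attributes) can change which code path the "bug" insert takes.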

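Even leaving that aside, I really should have spotted that
TOAST_TUPLE_THRESHOLD (which decides whether a heap tuple gets
toasted) is a different thing from TOAST_INDEX_TARGET (which
index_form_tuple() uses to decide whether to compress an index
attribute). The two are always controlled independently. Mea culpa.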
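A toy illustration of why the two thresholds matter: the heap side only starts compressing once the whole tuple is over its threshold, while index_form_tuple() judges each uncompressed varlena attribute on its own against TOAST_INDEX_TARGET (MaxHeapTupleSize / 16). The functions below are hypothetical simplifications, not PostgreSQL code; the constants are hand-computed values for the default 8kB BLCKSZ and 8-byte MAXALIGN, and the storage-strategy check (attstorage) is omitted. The point is only that the two decisions are made independently, so the same value can end up with different representations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hand-computed for an 8kB BLCKSZ; treat as illustrative. */
#define APPROX_TOAST_TUPLE_THRESHOLD 2032   /* TOAST_TUPLE_THRESHOLD */
#define APPROX_TOAST_INDEX_TARGET    510    /* MaxHeapTupleSize / 16 */

/* Heap side (simplified): compression is only considered once the whole
 * tuple exceeds the heap threshold. */
static bool
heap_path_considers_compression(size_t tuple_len)
{
    return tuple_len > APPROX_TOAST_TUPLE_THRESHOLD;
}

/* Index side (simplified): each not-yet-compressed attribute is checked
 * on its own against the index target. */
static bool
index_path_compresses(size_t datum_len, bool already_compressed)
{
    return !already_compressed && datum_len > APPROX_TOAST_INDEX_TARGET;
}
```

With a hypothetical 1000-byte title inside a 1500-byte tuple, heap_path_considers_compression(1500) is false while index_path_compresses(1000, false) is true.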

The fix here must be for amcheck to normalize compressed index tuples,
both during initial fingerprinting and during subsequent probes of the
Bloom filter in bt_tuple_present_callback().
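The idea behind the proposed fix can be modeled in a few lines: whatever representation a datum arrives in, reduce it to one canonical (uncompressed) form before hashing, on both the fingerprinting side and the probing side. This is a toy sketch under stated assumptions, not amcheck's code: ToyDatum, toy_compress() (a run-length encoder standing in for real compression), toy_normalize(), and the small Bloom filter built on 64-bit FNV-1a with double hashing are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DATUM  512          /* plain inputs must stay <= MAX_DATUM / 2 */
#define BLOOM_BITS 4096
#define BLOOM_K    3

/* Hypothetical datum: either plain bytes or a toy run-length-encoded form. */
typedef struct
{
    bool          compressed;
    size_t        len;
    unsigned char bytes[MAX_DATUM];
} ToyDatum;

typedef struct
{
    uint64_t bits[BLOOM_BITS / 64];
} ToyBloom;

/* 64-bit FNV-1a with a seed folded into the offset basis. */
static uint64_t
fnv1a(const unsigned char *p, size_t n, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;

    for (size_t i = 0; i < n; i++)
    {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Toy compressor: emits (run length, byte) pairs. */
static ToyDatum
toy_compress(const ToyDatum *in)
{
    ToyDatum out = {true, 0, {0}};

    for (size_t i = 0; i < in->len;)
    {
        size_t run = 1;

        while (i + run < in->len && in->bytes[i + run] == in->bytes[i] && run < 255)
            run++;
        out.bytes[out.len++] = (unsigned char) run;
        out.bytes[out.len++] = in->bytes[i];
        i += run;
    }
    return out;
}

/* Normalization: always reduce a datum to its uncompressed bytes. */
static ToyDatum
toy_normalize(const ToyDatum *in)
{
    ToyDatum out = {false, 0, {0}};

    if (!in->compressed)
        return *in;
    for (size_t i = 0; i + 1 < in->len; i += 2)
        for (unsigned r = 0; r < in->bytes[i]; r++)
            out.bytes[out.len++] = in->bytes[i + 1];
    return out;
}

/* Bloom filter bit positions, always derived from the normalized bytes. */
static void
bloom_positions(const ToyDatum *d, size_t pos[BLOOM_K])
{
    ToyDatum n = toy_normalize(d);
    uint64_t h1 = fnv1a(n.bytes, n.len, 0);
    uint64_t h2 = fnv1a(n.bytes, n.len, 0x9e3779b97f4a7c15ULL) | 1;

    for (int i = 0; i < BLOOM_K; i++)
        pos[i] = (size_t) ((h1 + (uint64_t) i * h2) % BLOOM_BITS);
}

static void
toy_bloom_add(ToyBloom *f, const ToyDatum *d)
{
    size_t pos[BLOOM_K];

    bloom_positions(d, pos);
    for (int i = 0; i < BLOOM_K; i++)
        f->bits[pos[i] / 64] |= UINT64_C(1) << (pos[i] % 64);
}

static bool
toy_bloom_lacks(const ToyBloom *f, const ToyDatum *d)
{
    size_t pos[BLOOM_K];

    bloom_positions(d, pos);
    for (int i = 0; i < BLOOM_K; i++)
        if ((f->bits[pos[i] / 64] & (UINT64_C(1) << (pos[i] % 64))) == 0)
            return true;
    return false;
}
```

Fingerprinting a compressed datum with toy_bloom_add() and then probing with its uncompressed equivalent via toy_bloom_lacks() reports the value as present, because both sides hash the same normalized bytes. Hashing the raw representations instead would generally not match. Normalizing on both sides, rather than on only one, mirrors what the message proposes for amcheck.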

--
Peter Geoghegan
