intarray GiST index gets wrong answers for '{}' <@ anything

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: intarray GiST index gets wrong answers for '{}' <@ anything
Date: 2019-08-06 17:55:41
Message-ID: 458.1565114141@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While looking at the pending patch for faster GIN index searches
on no-key queries, I was motivated to improve contrib/intarray's
regression test to exercise the GIN_SEARCH_MODE_ALL case, because
it didn't. And then I thought well, let's try to bring the code
coverage of _int_gin.c up to something respectable, which led me
to the regression test additions shown in the attached. And I
was astonished to observe that the GiST index cases mostly got
the wrong answer for the <@ query. Sometimes they got the right
answer, but mostly not. After some digging I saw that the problem
was that there are a number of empty arrays ('{}') in the data,
and those should surely all match the WHERE a <@ '{73,23,20}'
condition, but the GiST opclasses were not reliably finding them.

The reason appears to be that the condition for descending through a
non-leaf index key for the RTContainedBy case is incorrectly optimistic:
it supposes that we only need to descend into subtrees whose union key
overlaps the query array. But this does not guarantee to find subtrees
that contain empty-array entries. Worse, such entries could be anywhere
in the tree, and because of the way that the insertion penalty is
calculated, they probably are. (We will compute a zero penalty to add
an empty array item to any subtree.) The reason it sometimes works
seems to be that GiST randomizes its insertion decisions when there are
equal penalties (cf gistchoose()), and sometimes by luck it puts all
of the empty-array entries into subtrees that the existing rule will
search.

So as far as I can see, we have little choice but to lobotomize the
RTContainedBy case and force a whole-index search. This applies to
both the gist__int_ops and gist__intbig_ops opclasses. This is
pretty awful for any applications that are depending on such queries
to be fast, but it's hard to argue with "it gets the wrong answer,
and not even reproducibly so".

In the future we might think about removing <@ from these opclasses,
or making a non-backward-compatible change to segregate empty arrays
from everything else in the index. But neither answer seems very
back-patchable, and I'm not really sure I want to put so much work
into a second-class-citizen contrib module anyway.

Comments?

regards, tom lane

Attachment Content-Type Size
fix-intarray-for-empty-array-contained-in-queries.patch text/x-diff 7.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2019-08-06 18:03:08 Re: [PATCH] Atomic pgrename on Windows
Previous Message Bruce Momjian 2019-08-06 17:55:38 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)