Re: BUG #19031: pg_trgm infinite loop on certain cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: washwithcare(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: BUG #19031: pg_trgm infinite loop on certain cases
Date: 2025-08-26 00:54:23
Message-ID: 969700.1756169663@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> When querying against a column with a gin_trgm_ops index, using <% with a
> string without any trigrams followed by a string with trigrams causes what
> appears to be an infinite loop, and the query cannot be canceled, and the
> process must be restarted in order to kill the long running query.

Thanks for the test case! AFAICT this is a bug in commit 4b754d6c1,
which introduced "excludeOnly" GIN scan keys. There is a comment
in scanGetItem that says

    /*
     * ginNewScanKey() should never mark the first key as
     * excludeOnly.
     */

However, if you look at ginNewScanKey, it's totally not concerned
about avoiding that. In this test case, the first scan key is marked
excludeOnly, and that sends scanGetItem into what seems an infinite
loop.

After reading the comments in that commit, I think what we actually
want is to require excludeOnly scan keys to appear last. The 0002
patch attached modifies ginNewScanKey to re-order the scan keys to
guarantee that, and it fixes this test case.
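
To illustrate the kind of re-ordering meant here, the following is a minimal, self-contained sketch and not the attached patch. DemoScanKey and demo_put_exclude_only_last are hypothetical names invented for this example; the real GIN scan key structure has many more fields, and the real change lives in ginNewScanKey. The sketch assumes only that each key carries an excludeOnly flag, and it moves flagged keys to the end while preserving the relative order within each group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GIN scan key; only the flag matters here. */
typedef struct DemoScanKey
{
	int		id;				/* illustrative label, not a real field */
	bool	excludeOnly;	/* key can only rule items out */
} DemoScanKey;

/*
 * Stable partition: every key that is not excludeOnly is moved in front
 * of every excludeOnly key, keeping relative order within both groups.
 * Quadratic in the worst case, which is acceptable for a handful of keys.
 */
void
demo_put_exclude_only_last(DemoScanKey *keys, size_t nkeys)
{
	size_t	nnormal = 0;	/* keys[0 .. nnormal) are non-excludeOnly */

	assert(keys != NULL || nkeys == 0);

	for (size_t i = 0; i < nkeys; i++)
	{
		if (!keys[i].excludeOnly)
		{
			DemoScanKey tmp = keys[i];

			/* shift the excludeOnly run in [nnormal, i) up by one slot */
			for (size_t j = i; j > nnormal; j--)
				keys[j] = keys[j - 1];
			keys[nnormal++] = tmp;
		}
	}
}
```

With this ordering, the first key is excludeOnly only when every key is, which is the property the scanGetItem comment appears to rely on.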

However, I don't totally understand *why* it fixes the test case.
Especially not after I noted that there's already a test case in
pg_trgm that exercises exactly this situation:

select count(*) from test_trgm where t like '%99%' and t like '%qwerty%';

If you put an Assert into ginNewScanKey that the first scan key
isn't excludeOnly (instead of the re-sort), it fails on that query.
So why do we not see an infinite loop for that test case? I don't
really understand the GIN code well enough to figure out what the
difference is.

In the meantime, the 0001 patch attached moves the
CHECK_FOR_INTERRUPTS() call in gingetbitmap to be inside the loop in
scanGetItem, so that it's able to respond to a query cancel request in
this situation. I think we'd better do that even after fixing the
present bug.
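
As a rough illustration of why the placement of the interrupt check matters, here is a self-contained simulation. It is not PostgreSQL code: demo_cancel_pending and demo_scan_get_item are hypothetical names, the flag stands in for a pending query-cancel request, and max_steps exists only so the demo terminates. When the check sits outside the loop, a loop that never produces an item never sees the request.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pending query-cancel request. */
static volatile sig_atomic_t demo_cancel_pending = 0;

/*
 * Simulates an inner loop that never finds a matching item.  A cancel
 * request "arrives" once cancel_at_step iterations have run.  Returns the
 * number of iterations executed before the loop stopped.
 */
long
demo_scan_get_item(bool check_inside_loop, long cancel_at_step, long max_steps)
{
	long	steps = 0;

	assert(max_steps >= 0);
	demo_cancel_pending = 0;

	while (steps < max_steps)	/* bound is for the demo only */
	{
		/* analogue of checking for interrupts on every iteration */
		if (check_inside_loop && demo_cancel_pending)
			break;				/* real code would abort the query here */

		steps++;
		if (steps == cancel_at_step)
			demo_cancel_pending = 1;	/* simulate the cancel arriving */
	}
	return steps;
}
```

In this sketch, demo_scan_get_item(true, 10, 1000) stops after 10 iterations, while demo_scan_get_item(false, 10, 1000) runs all 1000, which mirrors a backend that cannot be canceled.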

regards, tom lane

Attachment                              Content-Type  Size
0001-move-CFI-inside-scanGetItem.patch  text/x-diff   589 bytes
0002-put-excludeOnly-keys-last.patch    text/x-diff   2.0 KB
