From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | washwithcare(at)gmail(dot)com |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org, Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Subject: | Re: BUG #19031: pg_trgm infinite loop on certain cases |
Date: | 2025-08-26 00:54:23 |
Message-ID: | 969700.1756169663@sss.pgh.pa.us |
Lists: | pgsql-bugs |
PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> When querying against a column with a gin_trgm_ops index, using <% with a
> string without any trigrams followed by a string with trigrams causes what
> appears to be an infinite loop, and the query cannot be canceled, and the
> process must be restarted in order to kill the long running query.
Thanks for the test case! AFAICT this is a bug in 4b754d6c1
which introduced "excludeOnly" GIN scan keys. There is a comment
in scanGetItem that says
    /*
     * ginNewScanKey() should never mark the first key as
     * excludeOnly.
     */
However, if you look at ginNewScanKey, it's totally not concerned
about avoiding that. In this test case, the first scan key is marked
excludeOnly, and that sends scanGetItem into what seems an infinite
loop.
After reading the comments in that commit, I think what we actually
want is to require excludeOnly scan keys to appear last. The 0002
patch attached modifies ginNewScanKey to re-order the scan keys to
guarantee that, and it fixes this test case.
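(Illustrative aside, not the contents of the 0002 patch: a minimal standalone sketch of what "put excludeOnly keys last" could look like as a stable partition. DemoScanKey and demo_put_exclude_only_last are hypothetical names invented for this example; the real GinScanKeyData has many more fields, and the actual patch may reorder the keys differently.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GIN scan key; not the real GinScanKeyData. */
typedef struct DemoScanKey
{
    int  id;          /* position of the key in the original qual list */
    bool excludeOnly; /* key can only rule items out, never propose them */
} DemoScanKey;

/*
 * Stable partition: move all excludeOnly keys behind the other keys,
 * preserving relative order inside each group.  Returns the number of
 * keys that are not excludeOnly.
 */
static size_t
demo_put_exclude_only_last(DemoScanKey *keys, size_t nkeys)
{
    size_t nnormal = 0;

    for (size_t i = 0; i < nkeys; i++)
    {
        if (!keys[i].excludeOnly)
        {
            DemoScanKey tmp = keys[i];

            /* Shift the excludeOnly keys seen so far one slot right. */
            for (size_t j = i; j > nnormal; j--)
                keys[j] = keys[j - 1];
            keys[nnormal++] = tmp;
        }
    }

    /* The first key may be excludeOnly only if every key is. */
    assert(nnormal == 0 || !keys[0].excludeOnly);
    return nnormal;
}
```

With keys flagged (excludeOnly, normal, excludeOnly, normal), the two normal keys end up in slots 0 and 1 and the function returns 2, so a scan that drives from the first key never starts from an excludeOnly one.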
However, I don't totally understand *why* it fixes the test case.
Especially not after I noted that there's already a test case in
pg_trgm that exercises exactly this situation:
    select count(*) from test_trgm where t like '%99%' and t like '%qwerty%';
If you put an Assert into ginNewScanKey that the first scan key
isn't excludeOnly (instead of the re-sort), it fails on that query.
So why do we not see an infinite loop for that test case? I don't
understand the GIN code well enough to figure out what the
difference is.
In the meantime, the 0001 patch attached moves the
CHECK_FOR_INTERRUPTS() call in gingetbitmap to be inside the loop in
scanGetItem, so that it's able to respond to a query cancel request in
this situation. I think we'd better do that even after fixing the
present bug.
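(Illustrative aside, not the contents of the 0001 patch: a minimal standalone sketch of why the interrupt check has to sit inside the loop that might spin. demo_cancel_pending, demo_check_for_interrupts and demo_scan_get_item are hypothetical names; in PostgreSQL itself CHECK_FOR_INTERRUPTS() does not return a flag but raises an error when a cancel is pending. The sketch simulates the cancel request arriving after a fixed number of iterations.)

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag; in a real program a signal handler would set it. */
static volatile sig_atomic_t demo_cancel_pending = 0;

/* Hypothetical stand-in for an interrupt check: true means "abort now". */
static bool
demo_check_for_interrupts(void)
{
    return demo_cancel_pending != 0;
}

/*
 * A loop that never finishes on its own.  Because the check is inside
 * the loop body, a pending cancel is noticed on the next iteration.
 * Returns the number of iterations completed before the cancel.
 */
static long
demo_scan_get_item(long cancel_after)
{
    long iterations = 0;

    for (;;)
    {
        if (demo_check_for_interrupts())
            return iterations;

        iterations++;

        /* Simulate a cancel request arriving while the loop runs. */
        if (iterations == cancel_after)
            demo_cancel_pending = 1;
    }
}
```

If the check were made only by the caller, before or after demo_scan_get_item(), the loop above would never observe the flag, which matches the reported symptom of a query that cannot be canceled.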
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
0001-move-CFI-inside-scanGetItem.patch | text/x-diff | 589 bytes |
0002-put-excludeOnly-keys-last.patch | text/x-diff | 2.0 KB |