Quick Links

Re: BUG #19031: pg_trgm infinite loop on certain cases

From:	Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	washwithcare(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: BUG #19031: pg_trgm infinite loop on certain cases
Date:	2025-08-27 08:33:03
Message-ID:	CAE7r3MK19VyQU_K2EYasMaUBNBd9n82qAmTfvq+vGFr320ZM=A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

On Tue, Aug 26, 2025 at 6:16 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com> writes:
> > On Tue, Aug 26, 2025 at 3:54 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> However, I don't totally understand *why* it fixes the test case.
> >> Especially not after I noted that there's already a test case in
> >> pg_trgm that exercises exactly this situation:
> >>
> >> select count(*) from test_trgm where t like '%99%' and t like '%qwerty%';
> >>
> >> If you put an Assert into ginNewScanKey that the first scan key
> >> isn't excludeOnly (instead of the re-sort), it fails on that query.
> >> So why do we not see an infinite loop for that test case? I don't
> >> really understand the GIN code well enough to figure out what is
> >> the difference.
>
> > I debug a little bit and it looks like the reason there's no infinite
> > loop in your example is because it returns MAYBE for the first
> > 'excludeOnly' key in:
> > keyGetItem()
> > ...
> > res = key->triConsistentFn(key);
>
> Ah-hah! The thing I'd overlooked is that that regression test query
> uses a different operator: LIKE not %>. So even though the
> GinScanKey looks pretty similar, the strategy is different, leading
> gin_trgm_triconsistent to return GIN_MAYBE not GIN_FALSE when nkeys=0.
> So now I can reproduce the failure with the regression tests' table:
>
> select count(*) from test_trgm where t %> '' and t %> '%qwerty%';
>

Great!

> > I don't have an opinion on whether it's good or not to move
> > all the 'excludeOnly' keys to the end, but it seems that simply not
> > having an "excludeOnly" key as the first key is enough to fix the bug.
> > Maybe it's enough to just swap any normal key with the first one, if
> > it's "excludeOnly"?
>
> Yeah, that would be enough to get us out of this particular example.
> But I think the lesson here is that there are under-documented
> dependencies on the ordering of GinScanKeys, and I want the fix to
> make that ordering more predictable not less so. For example, after
> seeing this I have little confidence that GIN wouldn't have issues
> with an excludeOnly key that precedes the first normal key for its
> index attribute, even when there are other keys for other attributes
> appearing ahead of them in the scankey array. So I'd rather that the
> fix be based on a consistent pattern like "put excludeOnly keys after
> not-excludeOnly keys", not "let's swap the first key with some
> randomly-chosen other key".
>

Good point, thanks for the explanation. I forgot that there can be
many attributes. And I agree, the more determinism in the system, the
easier it is to work with it and the less room for bugs. OTOH it seems
from the performance POV we want to have the stricter keys to be the
first so we do less work and fail fast on the first keys. It looks
like these two rules (excludeOnly keys LAST and more restrictive keys
FIRST) are kind of in conflict with each other. I tried to do some
experiments and it's seems GIN quite sensitive to it, at least in this
artificial example:

(bug repro table is used here)

-- excludeOnly key is the middle
explain analyse
select *
from simple_case
where ('two' <% name)
and (',' <% name)
and ('and' <% name);

-- Execution Time: 406.718 ms

-- excludeOnly key in the end
explain analyse
select *
from simple_case
where ('two' <% name)
and ('and' <% name)
and (',' <% name);

-- Execution Time: 706.655 ms

With applying patch both queries show the same time (second one). So
currently the user can tune the query by defining more restrictive
keys first. With the proposed fix it looks like users will have less
freedom here. But it's only about excludeOnly keys. Don't know if we
need to worry about it?

Best regards,
Arseniy Mukhin

In response to

Re: BUG #19031: pg_trgm infinite loop on certain cases at 2025-08-26 15:16:46 from Tom Lane

Responses

Re: BUG #19031: pg_trgm infinite loop on certain cases at 2025-08-27 14:32:26 from Tom Lane

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Wojciech Szenajch	2025-08-27 09:37:20	Re: BUG #19032: In restore_command %f parameter does not support WAL partial files.
Previous Message	Thadeus Anand	2025-08-27 07:37:23	Re: [CAUTION: SUSPECT SENDER] RE: [CAUTION: SUSPECT SENDER] RE: [CAUTION: SUSPECT SENDER] RE: BUG #19029: Replication Slot size keeps increasing while logical subscription works fine