Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Adé <ade(dot)hey(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables
Date: 2020-04-02 23:18:29
Message-ID: 7653.1585869509@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

=?UTF-8?B?QWTDqQ==?= <ade(dot)hey(at)gmail(dot)com> writes:
>> It seems like what you're actually trying
>> to accomplish is to ensure that entryLoadMoreItems's "stepright" path
>> is taken, instead of re-descending the index from the root.

> What I was primarily trying to do is make sure that when entryLoadMoreItems
> is called, it loads new items that it didn't load previously, which would
> occur in special cases. But the solution to that goal does result in the
> "stepright" path being used.

Oh, hm, now I see what you mean: as things stand, it's likely to
repeatedly (and very expensively) reload the same page we were
already on, until the random dropItem() test finally accepts some
item from that page. Ick.

I think though that the fix can be a bit less messy than you have here,
because advancePast is just a local variable in entryGetItem, so we
can overwrite it without any problem. So what I want to do is just
update it to equal entry->curItem before looping around. But shoving
that assignment into the while-condition was too ugly for my taste
(and no, I didn't like your assignment there either). So I ended up
refactoring the do-loop into a for-loop with internal break conditions,
as attached.

I also made the posting-list case handle reduction in the same way,
and for good measure changed the bitmap-result case to look similar,
which caused me to notice that it had a bug too: the "continue" case
within that loop failed to reset gotitem = false as it should,
if we'd looped around after rejecting an item due to reduceResult.
As far as I can see, that would lead to returning the previously-
rejected curItem value, which isn't fatal but it's still wrong.
So I just got rid of the gotitem variable altogether; it really
wasn't helping with either clarity or correctness.

This patch also adds a couple of test cases so that we have more
code coverage in this area. The overall coverage of ginget.c
is still mighty lame, but at least we're going through some of
these lines that we weren't before.

I'm inclined to back-patch this. Given how fuzzy the definition
of gin_fuzzy_search_limit is, it seems unlikely that anyone would
be depending on the current buggy behavior.

regards, tom lane

Attachment Content-Type Size
gin-fuzzy-search-limit-fix-2.patch text/x-diff 5.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-04-02 23:30:20 Re: Datum values consistency within one query
Previous Message Paul Ramsey 2020-04-02 22:48:28 Datum values consistency within one query