Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Adé <ade(dot)hey(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables
Date: 2020-03-10 23:16:18
Message-ID: 30491.1583882178@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

=?UTF-8?B?QWTDqQ==?= <ade(dot)hey(at)gmail(dot)com> writes:
> Like the title says, using "gin_fuzzy_search_limit" degrades speed when it
> has a relatively low setting.
> ...
> Attached is SQL to test and observe this issue and also attached is a patch
> I want to eventually submit to a commitfest.

I took a brief look at this. It seems like what you're actually trying
to accomplish is to ensure that entryLoadMoreItems's "stepright" path
is taken, instead of re-descending the index from the root. Okay,
I can see why that'd be a big win, but why are you tying it to the
dropItem behavior? It should apply any time we're iterating this loop
more than once. IOW, it seems to me like the logic about when to step
right is just kinda broken, and this is a band-aid rather than a full fix.
The performance hit is worse for fuzzy-search mode because it will
iterate the loop more (relative to the amount of work done elsewhere),
but there's still a potential for wasted work.

Actually, a look at the code coverage report shows that the
not-step-right path is never taken at all in our standard regression
tests. Maybe that just says bad things about the tests' coverage, but
now I'm wondering whether we could just flush that code path altogether,
and assume that we should always step right at this point.

[ cc'ing heikki and alexander, who seem to have originated that code
at 626a120656a75bf4fe64b1d0d83c23cb38d3771a. The commit message says
it saves a lot of I/O, but I'm wondering if this report disproves that.
In any case the lack of test coverage is pretty damning. ]

While we're here, what do you think about the comment in the other
code branch just above:

/* XXX: shouldn't we apply the fuzzy search limit here? */

I'm personally inclined to suspect that the fuzzy-search business has
got lots of bugs, which haven't been noticed because (a) it's so squishily
defined that one can hardly tell whether a given result is buggy or not,
and (b) nearly nobody uses it anyway (possibly not unrelated to (a)).
As somebody who evidently is using it, you're probably more motivated
to track down bugs than the rest of us.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message movead.li@highgo.ca 2020-03-11 01:36:29 Re: Re: Asynchronous Append on postgres_fdw nodes.
Previous Message David Rowley 2020-03-10 23:05:02 Re: [PATCH] Erase the distinctClause if the result is unique by definition