[PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables

From: Adé <ade(dot)hey(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables
Date: 2020-02-02 18:06:01
Message-ID: CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello all,

Like the title says, using "gin_fuzzy_search_limit" degrades speed when it
has a relatively low setting.

What do I mean by "relatively low"? An example would be when a table with a
GIN index has many millions of rows and a particular keyword search has
1,000,000 possible results because the keyword is very common (or it's just
that the table is so supremely large that even a somewhat common keyword
appears enough to return one million results). However, you only want to
return around 100 random results from that one million, so you set
gin_fuzzy_search_limit to 100. That limit is relatively low when you look
at the ratio of the limit value to the possible results: 100 / 1,000,000 =
0.0001. You'll find the query is very slow for such a low ratio. It isn't
so slow if gin_fuzzy_search_limit is 100 but the keyword search has only a
total of 10,000 possible results (resulting in a higher ratio of 0.1).

This would explain why in the documentation it is said that "From
experience, values in the thousands (e.g., 5000 — 20000) work well". It's
not so common to have queries that return large enough result sets such
that gin_fuzzy_search_limit values between 5,000 and 20,000 would result in
low ratios and so result in the performance issue I've observed (these
gin_fuzzy_search_limit values have relatively high ratios between 0.005 and
0.02 if you have 1,000,000 results for a keyword search). However, if you
desire a lower gin_fuzzy_search_limit such as 100, while also having a
relatively larger table, you'll find this slowness issue.

I discussed this issue more and the reason for it in my original bug
report:
https://www.postgresql.org/message-id/16220-1a0a4f0cb67cafdc@postgresql.org

Attached is SQL to test and observe this issue and also attached is a patch
I want to eventually submit to a commitfest.

Best regards,
Adé

Attachment Content-Type Size
gin_fuzzy_search_limit_test.sql application/octet-stream 1.9 KB
ginget.patch application/octet-stream 1.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-02-02 18:41:34 Re: PATCH: add support for IN and @> in functional-dependency statistics use
Previous Message Daniel Gustafsson 2020-02-02 17:19:04 Re: TestLib condition for deleting temporary directories