Re: GIN improvements part2: fast scan

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GIN improvements part2: fast scan
Date: 2014-01-26 18:20:44
Message-ID: 52E551FC.7080505@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26.1.2014 17:14, Heikki Linnakangas wrote:
>
> I would actually expect it to be fairly effective for that query, so
> that's a bit surprising. I added counters to see where the calls are
> coming from, and it seems that about 80% of the calls are actually
> coming from this little the feature I explained earlier:
>
>> In addition to that, I'm using the ternary consistent function to check
>> if minItem is a match, even if we haven't loaded all the entries yet.
>> That's less important, but I think for something like "rare1 | (rare2 &
>> frequent)" it might be useful. It would allow us to skip fetching
>> 'frequent', when we already know that 'rare1' matches for the current
>> item. I'm not sure if that's worth the cycles, but it seemed like an
>> obvious thing to do, now that we have the ternary consistent function.
>
> So, that clearly isn't worth the cycles :-). At least not with an
> expensive consistent function; it might be worthwhile if we pre-build
> the truth-table, or cache the results of the consistent function.
>
> Attached is a quick patch to remove that, on top of all the other
> patches, if you want to test the effect.

Indeed, the patch significantly improved the performance. The total
runtime is almost exactly the same as on 9.3 (~22 seconds for 1000
queries). The timing chart (patched vs. 9.3) is attached.

A table with number of queries with duration ratio below some threshold
looks like this:

threshold | count | percentage
-------------------------------------
0.5 | 3 | 0.3%
0.75 | 45 | 4.5%
0.9 | 224 | 22.4%
1.0 | 667 | 66.7%
1.05 | 950 | 95.0%
1.1 | 992 | 99.2%

A ratio is just a measure of how much time it took compared to 9.3

ratio = (duration on patched HEAD) / (duration on 9.3)

The table is cumulative, e.g. values in the 0.9 row mean that for 224
queries the duration with the patches was below 90% of the duration on 9.3.

IMHO the table suggests with the last patch we're fine - majority of
queries (~66%) is faster than on 9.3, and the tail is very short. There
are just 2 queries that took more than 15% longer, compared to 9.3. And
we're talking about 20ms vs. 30ms, so chances are this is just a random
noise.

So IMHO we can go ahead, and maybe tune this a bit more in the future.

regards
Tomas

Attachment Content-Type Size
image/png 18.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2014-01-26 18:22:47 Re: [patch] Client-only installation on Windows
Previous Message Andrew Dunstan 2014-01-26 18:03:57 Re: running make check with only specified tests