Re: GIN improvements part2: fast scan

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GIN improvements part2: fast scan
Date: 2014-02-02 10:45:37
Message-ID: 52EE21D1.2080804@vmware.com
Lists: pgsql-hackers

On 01/30/2014 01:53 AM, Tomas Vondra wrote:
> (3) A file with explain plans for 4 queries suffering ~2x slowdown,
> and explain plans with 9.4 master and Heikki's patches is available
> here:
>
> http://www.fuzzy.cz/tmp/gin/queries.txt
>
> All the queries have 6 common words, and the explain plans look
> just fine to me - exactly like the plans for other queries.
>
> Two things now caught my eye. First some of these queries actually
> have words repeated - either exactly like "database & database" or
> in negated form like "!anything & anything". Second, while
> generating the queries, I use "dumb" frequency, where only exact
> matches count. I.e. "write != written" etc. But the actual number
> of hits may be much higher - for example "write" matches exactly
> just 5% documents, but using @@ it matches more than 20%.
>
> I don't know if that's the actual cause though.

Ok, here's another variant of these patches. Compared to git master, it
does three things:

1. It adds the concept of a ternary consistent function internally, with
no catalog changes. It's implemented by calling the regular boolean
consistent function "both ways".

2. It uses a binary heap to get the "next" item from the entries in a
scan. I'm pretty sure this makes sense: arguably it makes the code more
readable, and it significantly reduces the number of item pointer
comparisons for queries with a lot of entries.

3. It only performs the pre-consistent check to try skipping entries if
we don't already have the next item from the entry loaded in the array.
This is a tradeoff: you lose some of the performance gain you might get
from pre-consistent checks, but it also limits the performance loss you
might get from doing useless pre-consistent checks.
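
To illustrate point 1, here is a rough sketch of how a ternary answer can be derived from a plain boolean consistent function by calling it "both ways". It is not code from the patch: the names Tri, tri_consistent and bool_consistent are hypothetical, and enumerating every combination of the unknown inputs is an assumption about how this generalizes to more than one unknown entry.

```python
from enum import Enum
from itertools import product
from typing import Callable, Sequence


class Tri(Enum):
    FALSE = 0
    TRUE = 1
    MAYBE = 2


def tri_consistent(bool_consistent: Callable[[Sequence[bool]], bool],
                   check: Sequence[Tri]) -> Tri:
    """Derive a ternary result from a boolean consistent function.

    Hypothetical helper: every MAYBE input is tried as both False and
    True. If the boolean function gives the same answer each time, that
    answer is definite; otherwise the result is MAYBE.
    """
    maybe_idx = [i for i, v in enumerate(check) if v is Tri.MAYBE]
    seen = set()
    for combo in product([False, True], repeat=len(maybe_idx)):
        # Known inputs keep their value; unknown ones take the trial value.
        trial = [v is Tri.TRUE for v in check]
        for i, value in zip(maybe_idx, combo):
            trial[i] = value
        seen.add(bool(bool_consistent(trial)))
        if len(seen) == 2:
            return Tri.MAYBE  # answers differ, so the unknowns matter
    return Tri.TRUE if True in seen else Tri.FALSE


# An AND-style query: one entry known to be absent decides the result,
# whatever the unknown entry turns out to be.
result = tri_consistent(all, [Tri.TRUE, Tri.MAYBE, Tri.FALSE])
```

A definite FALSE is what allows an item to be skipped without fetching the unknown entries. The cost grows exponentially with the number of MAYBE inputs, so a real implementation would presumably cap it.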

So taken together, I would expect this patch to make some of the
performance gains less impressive, but also limit the loss we saw with
some of the other patches.
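
For point 2 above, a minimal sketch of the binary heap idea, assuming each entry yields its item pointers as a strictly increasing stream of (block, offset) tuples. The function name scan_items and the data layout are made up for illustration and do not mirror the structures in the patch.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

ItemPointer = Tuple[int, int]  # (block, offset), an illustrative stand-in


def scan_items(entries: List[Iterable[ItemPointer]]
               ) -> Iterator[Tuple[ItemPointer, List[int]]]:
    """Yield each distinct item in order, with the entries that contain it.

    The heap holds one (item, entry index) pair per non-exhausted entry,
    so finding the smallest "next" item costs O(log n) comparisons
    instead of a linear pass over all entries.
    """
    iters = [iter(e) for e in entries]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heap.append((first, idx))
    heapq.heapify(heap)

    while heap:
        item = heap[0][0]
        matching = []
        # Pop every entry currently positioned on this item and advance it.
        while heap and heap[0][0] == item:
            _, idx = heapq.heappop(heap)
            matching.append(idx)
            nxt = next(iters[idx], None)
            if nxt is not None:
                heapq.heappush(heap, (nxt, idx))
        yield item, matching


entries = [
    [(1, 1), (1, 3), (2, 1)],
    [(1, 3), (3, 1)],
    [(2, 1)],
]
merged = list(scan_items(entries))
```

The list of matching entry indexes is the kind of input a consistent function would then be called with for each candidate item.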

Tomas, could you run your test suite with this patch, please?

- Heikki

Attachment Content-Type Size
gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch text/x-diff 26.5 KB
