Re: GIN fast insert

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN fast insert
Date: 2009-02-24 21:35:30
Message-ID: 603c8f070902241335i575269a8ydccf01043644250f@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 24, 2009 at 2:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> On the other hand, Teodor showed a typical use case and a very
>> substantial performance gain:
>
> Yeah.  Whatever we do here is a tradeoff (and whether Robert likes it
> or not, reliability and code maintainability weigh heavily in the
> tradeoff).

I have no problem with reliability or code maintainability and I'm not
sure what I said that would give that impression. If the consensus of
the group is that the performance loss from dropping index scans is
not important, then I'm fine with that, especially if that consensus
is reached with a clear understanding of what that performance loss
is likely to be. To me, a 2x slowdown on a two-table anti-join seems
pretty bad, but I just work here. Perhaps nobody else
thinks that a semi-join or anti-join against a GIN index is a
plausible use case (like, find all of the words from the following
list that do not appear in any document)?

If everyone agrees that we don't care about that case (or about
ORDER-BY-without-LIMIT, which is certainly less compelling), then go
ahead and remove it. I have no horse in this race other than having
been asked to review the patch, which I did.

On the other hand, if a significant number of people think that it
might be a bad idea to make that case significantly worse, then some
redesign work is called for, and that may mean the patch needs to get
bumped.

My own opinion is that it is better to decide on the right design and
then figure out which release that design can go into than it is to
start by deciding this has to go into 8.4 and then figuring out what
can be done in that period of time. I don't think there is any
question that making GIN continue to support both index scans and
bitmap index scans will make the code more complex, but how bad will
it be? So far we've ruled out using the planner to prevent index
scans when the pending list is long (because it's not reliable) and
cleaning up the pending list during insert when needed (because it
won't work with Hot Standby). We haven't decided what WILL work,
apart from ripping out index scans altogether, so to some degree we're
comparing against an unknown.

>> I wonder how many people really use GIN with non-bitmap scans for some
>> benefit? And even if the benefit exists, does the planner have a way to
>> identify those cases reliably, or does it have to be done manually?
>
> A relevant point there is that most of the estimator functions for
> GIN-amenable operators are just smoke and mirrors; so if the planner
> is making a good choice between indexscan and bitmapscan at all, it's
> mostly luck.  This might get better someday, but not in 8.4.

Based on the limited testing I've done thus far, it appears to pick an
index scan for small numbers of rows and a bitmap index scan for
larger numbers of rows. Index scans have lower startup costs, which
can be valuable if you only need to scan part of the index (as in the
semi- and anti-join cases). I haven't done enough testing to see if
there is any benefit when scanning the whole index and only returning
a few tuples.
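
To make the startup-cost point concrete, here is a toy Python sketch.
It is not PostgreSQL code and not part of the testing described above.
It assumes a hypothetical in-memory inverted index (INDEX) standing in
for GIN posting lists, and counts how many index entries each strategy
touches for the "words that appear in no document" anti-join. The
names FetchCounter, index_scan and bitmap_scan are made up for
illustration.

```python
from typing import Dict, Iterator, List, Set

# Hypothetical inverted index: word -> document ids containing it.
INDEX: Dict[str, List[int]] = {
    "postgres": [1, 2, 3, 4, 5],
    "gin": [2, 4],
    "index": [1, 3, 5],
}


class FetchCounter:
    """Counts how many index entries a scan has touched."""

    def __init__(self) -> None:
        self.fetched = 0


def index_scan(index: Dict[str, List[int]], word: str,
               counter: FetchCounter) -> Iterator[int]:
    """Yield matches one at a time, so the caller can stop early."""
    for doc_id in index.get(word, []):
        counter.fetched += 1
        yield doc_id


def bitmap_scan(index: Dict[str, List[int]], word: str,
                counter: FetchCounter) -> Set[int]:
    """Collect every match before returning anything to the caller."""
    bitmap: Set[int] = set()
    for doc_id in index.get(word, []):
        counter.fetched += 1
        bitmap.add(doc_id)
    return bitmap


def anti_join_with_index_scan(index: Dict[str, List[int]], words: List[str],
                              counter: FetchCounter) -> List[str]:
    # One match is enough to reject a word, so only the first entry is read.
    return [w for w in words
            if next(index_scan(index, w, counter), None) is None]


def anti_join_with_bitmap_scan(index: Dict[str, List[int]], words: List[str],
                               counter: FetchCounter) -> List[str]:
    # The whole posting list is read before the emptiness test can run.
    return [w for w in words if not bitmap_scan(index, w, counter)]


if __name__ == "__main__":
    words = ["postgres", "gin", "index", "btree"]
    lazy, eager = FetchCounter(), FetchCounter()
    print(anti_join_with_index_scan(INDEX, words, lazy), lazy.fetched)
    print(anti_join_with_bitmap_scan(INDEX, words, eager), eager.fetched)
```

Both strategies return the same words, but the lazy scan reads one
entry per matching word while the eager one reads every entry. The
real cost model has many more terms, so treat this only as a picture
of why early termination favors the index scan.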

...Robert
