Re: GIN fast insert

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN fast insert
Date: 2009-02-11 03:38:47
Message-ID: 126.1234323527@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I think this is related to the problems with gincostestimate() that
> Tom Lane was complaining about here:
> http://archives.postgresql.org/message-id/20441.1234209245@sss.pgh.pa.us

> I am not 100% sure I'm understanding this correctly, but I think the
> reason why gincostestimate() is so desperate to avoid index scans when
> the pending list is long is because it knows that scanFastInsert()
> will blow up if an index scan is actually attempted because of the
> aforementioned TIDBitmap problem. This seems unacceptably fragile.

Yipes. If that's really the reason then I agree, it's a nonstarter.

> I think this code needs to be somehow rewritten to make things degrade
> gracefully when the pending list is long - I'm not sure what the best
> way to do that is. Inventing a new data structure to store TIDs that
> is never lossy seems like it might work, but you'd have to think about
> what to do if it got too big.

What would be wrong with letting it degrade to lossy? I suppose the
reason it's trying to avoid that is to avoid having to recheck all the
rows on that page when it comes time to do the index insertion; but
surely having to do that is better than having arbitrary, unpredictable
failure conditions.
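
A minimal sketch of what "degrade to lossy" could look like for a set of TIDs, assuming a fixed budget of exact entries. Every name here (ToyTidSet, toy_add, toy_probe, the MAX_* limits) is hypothetical and chosen for illustration; this is not the actual TIDBitmap or GIN code. When the budget is exceeded, the page holding the most exact offsets is collapsed to a page-level entry, and a probe on that page reports that a recheck is needed instead of failing:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAGES      8   /* toy capacity of the page table (hypothetical) */
#define MAX_EXACT_TIDS 4   /* toy budget of exact (page, offset) entries */
#define MAX_OFFSETS    32  /* offsets per page representable in the bitmask */

typedef struct {
    uint32_t page;
    bool     lossy;    /* true: only the page is remembered, rows need recheck */
    uint32_t offsets;  /* bitmask of exact offsets, meaningful only if !lossy */
} PageEntry;

typedef struct {
    PageEntry pages[MAX_PAGES];
    size_t    npages;
    size_t    nexact;  /* total exact offsets currently stored */
} ToyTidSet;

typedef enum { TOY_NO_MATCH, TOY_EXACT, TOY_RECHECK } ToyProbe;

void toy_init(ToyTidSet *s) { s->npages = 0; s->nexact = 0; }

static int popcount32(uint32_t v) {
    int n = 0;
    while (v) { v &= v - 1; n++; }
    return n;
}

/* Collapse the exact page with the most offsets into a lossy page entry. */
static void lossify_one(ToyTidSet *s) {
    PageEntry *victim = NULL;
    int best = 0;
    for (size_t i = 0; i < s->npages; i++) {
        if (s->pages[i].lossy) continue;
        int c = popcount32(s->pages[i].offsets);
        if (c > best) { best = c; victim = &s->pages[i]; }
    }
    if (victim) {
        s->nexact -= (size_t) best;
        victim->lossy = true;
        victim->offsets = 0;
    }
}

/* Add a TID. Returns false only if the toy page table itself is full
 * or the offset does not fit in the bitmask. */
bool toy_add(ToyTidSet *s, uint32_t page, unsigned offset) {
    if (offset >= MAX_OFFSETS) return false;
    PageEntry *e = NULL;
    for (size_t i = 0; i < s->npages; i++)
        if (s->pages[i].page == page) { e = &s->pages[i]; break; }
    if (e == NULL) {
        if (s->npages == MAX_PAGES) return false;
        e = &s->pages[s->npages++];
        e->page = page; e->lossy = false; e->offsets = 0;
    }
    if (e->lossy) return true;             /* page already covers this TID */
    uint32_t bit = UINT32_C(1) << offset;
    if (e->offsets & bit) return true;     /* already present */
    e->offsets |= bit;
    s->nexact++;
    while (s->nexact > MAX_EXACT_TIDS)     /* over budget: trade precision */
        lossify_one(s);
    return true;
}

/* Report whether a TID is absent, definitely present, or must be rechecked. */
ToyProbe toy_probe(const ToyTidSet *s, uint32_t page, unsigned offset) {
    if (offset >= MAX_OFFSETS) return TOY_NO_MATCH;
    for (size_t i = 0; i < s->npages; i++) {
        if (s->pages[i].page != page) continue;
        if (s->pages[i].lossy) return TOY_RECHECK;
        return (s->pages[i].offsets & (UINT32_C(1) << offset))
                   ? TOY_EXACT : TOY_NO_MATCH;
    }
    return TOY_NO_MATCH;
}
```

The cost of going lossy shows up only in toy_probe: a caller that gets TOY_RECHECK has to re-examine every row on that page, which is the extra work described above, but the structure never hits a hard failure just because the budget of exact entries ran out.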

It strikes me that part of the issue here is that the behavior of this
code is much better adapted to the bitmap-scan API than the traditional
indexscan API. Since GIN doesn't support ordered scan anyway, I wonder
whether it wouldn't be more sensible to simply allow it to not offer
the traditional API. It should be easy to make the planner ignore plain
indexscan plans for an AM that didn't support them.
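
As a rough illustration of the planner-side idea, here is a sketch that assumes each access method advertises which scan APIs it implements. The ToyIndexAm struct, its two flags, and toy_candidate_paths are hypothetical names invented for this example and do not correspond to the real planner or access-method catalog:

```c
#include <stdbool.h>

/* Hypothetical description of an index access method's capabilities. */
typedef struct {
    const char *name;
    bool has_gettuple;   /* assumed flag: offers the tuple-at-a-time scan API */
    bool has_getbitmap;  /* assumed flag: offers the bitmap-scan API */
} ToyIndexAm;

enum {
    TOY_PATH_PLAIN  = 1 << 0,  /* plain index scan */
    TOY_PATH_BITMAP = 1 << 1   /* bitmap index scan */
};

/* Return the set of scan path kinds worth generating for this AM.
 * A plain index scan is simply never proposed when the AM lacks it. */
unsigned toy_candidate_paths(const ToyIndexAm *am) {
    unsigned paths = 0;
    if (am->has_gettuple)
        paths |= TOY_PATH_PLAIN;
    if (am->has_getbitmap)
        paths |= TOY_PATH_BITMAP;
    return paths;
}
```

With this shape, an AM that sets only has_getbitmap never has a plain indexscan path considered, so no cost-estimate trickery is needed to steer the planner away from one.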

regards, tom lane
