GIN fast insert

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: GIN fast insert
Date: 2009-02-11 02:59:54
Message-ID: 603c8f070902101859j91fb78eg7e0228afe8f2fd36@mail.gmail.com
Lists: pgsql-hackers

Jeff Davis asked me about two weeks ago if I'd be willing to review the
GIN fast insert patch, but I didn't actually get a chance to read
through it in detail until tonight. I can't say I really know anything
about GIN (though I did take this opportunity to RTM), so apologies in
advance if my comments here are totally off base.

My basic impression of this code is that it's trying unnaturally hard
to avoid creating a lossy TIDBitmap, which seems pretty odd,
considering that the whole point of TIDBitmap is, AFAICS, to degrade
to lossy mode when the alternative is eating too much memory. I'm
particularly dismayed by this hunk:

+     if ( ntids == NULL && tbm && tbm_has_lossy(tbm) )
+         ereport(ERROR,
+                 (errcode(ERRCODE_OUT_OF_MEMORY),
+                  errmsg("not enough memory to store result of pending list or VACUUM table"),
+                  errhint("Increase the \"work_mem\" parameter.")));

The definition of work_mem is the amount of memory that a hash table
can use before spilling to disk, NOT the amount of memory a hash table
can consume before arbitrarily failing. It's intended to be a soft
limit which people can tune to get the best performance out of their
system, not a hard limit that interrupts work. Using the limit this
way will encourage people to set work_mem too high so that things
don't crash, but that in turn will potentially lead to other bad
behavior (like swapping). I see that there's already one other place
in the GIN code that uses work_mem this way, but I don't think that's
a good reason to add another one - I don't see any other place in the
system that behaves this way, outside of GIN.
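Just to illustrate the contrast, here's a minimal sketch (mine, not
code from the patch) of how the rest of the executor treats work_mem
when building a bitmap: the bitmap is created with work_mem as a
budget, and tidbitmap.c responds to overflow by lossifying pages
rather than erroring out. See nodeBitmapIndexscan.c and tidbitmap.c
for the real thing; the function name here is made up.

    #include "postgres.h"
    #include "miscadmin.h"           /* for work_mem */
    #include "nodes/tidbitmap.h"

    /*
     * Sketch only: size the bitmap with work_mem as a soft budget.  If
     * the exact per-page bitmaps would exceed that budget, tidbitmap.c
     * degrades the largest pages to lossy (page-granularity) entries
     * and the heap scan rechecks the quals - it does not throw an
     * error.
     */
    static TIDBitmap *
    build_result_bitmap(void)
    {
        /* work_mem is in kB; tbm_create() takes a byte count */
        TIDBitmap  *tbm = tbm_create(work_mem * 1024L);

        /* ... index AM adds heap TIDs here; overflow => lossy, not ERROR ... */

        return tbm;
    }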

I think this is related to the problems with gincostestimate() that
Tom Lane was complaining about here:

http://archives.postgresql.org/message-id/20441.1234209245@sss.pgh.pa.us

I am not 100% sure I'm understanding this correctly, but I think the
reason gincostestimate() is so desperate to avoid index scans when the
pending list is long is that it knows scanFastInsert() will blow up,
because of the aforementioned TIDBitmap problem, if an index scan is
actually attempted. This seems unacceptably fragile. Faster insert
performance is nice, but not if it means that my indices suddenly
start switching themselves off (which is bad) or, in the presence of
cached plans, erroring out (which is worse).

I think this code needs to be somehow rewritten to make things degrade
gracefully when the pending list is long - I'm not sure what the best
way to do that is. Inventing a new data structure to store TIDs that
is never lossy seems like it might work, but you'd have to think about
what to do if it got too big. I think it's probably best to just go
ahead and let it get arbitrarily long (since the only other option is
failing) and leave work_mem out of it. Instead, add a reloption that
controls the maximum length of the pending list.
This is easy to tune: if you make it bigger, insert performance
improves, but queries eat more memory. If you make it smaller, insert
performance gets worse, but you bound the memory that queries use more
tightly.
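To make that concrete, here's a very rough sketch of what the
per-index knob could look like, following the same rd_options pattern
that fillfactor uses today. The struct, macro, option name, units, and
default are all hypothetical; this is just to show the shape of the
thing:

    #include "postgres.h"
    #include "utils/rel.h"

    /*
     * Hypothetical reloption sketch (names and units made up): cap the
     * pending list per index, read through rd_options the same way
     * fillfactor is.
     */
    typedef struct GinOptions
    {
        int32   vl_len_;                /* varlena header, don't touch */
        int     pendingListLimit;       /* max pending-list size, in pages */
    } GinOptions;

    #define GIN_DEFAULT_PENDING_LIST_LIMIT  256     /* arbitrary default */

    #define GinGetPendingListLimit(relation) \
        ((relation)->rd_options ? \
         ((GinOptions *) (relation)->rd_options)->pendingListLimit : \
         GIN_DEFAULT_PENDING_LIST_LIMIT)

The insert path would consult GinGetPendingListLimit() to decide when
to flush the pending list into the main index, so a scan always has a
known bound on how much it can be asked to pull into memory.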

The other problem with using work_mem is that it can vary between
sessions, so one session happily stuffs a lot of data into the pending
list and then another session can't scan the index because it has a
lower work_mem setting. Using a reloption avoids that problem by
making the whole thing symmetric, plus it gives you fine-grained
control over the behavior on an index-by-index basis.

...Robert
