Re: GIN fast insert

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN fast insert
Date: 2009-02-23 09:56:05
Message-ID: 1235382965.16176.72.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Thu, 2009-02-19 at 22:43 -0500, Robert Haas wrote:
> > I don't see a problems here, because indexes in postgres don't
> depend on any
> > transaction's ids or modes as heap depends. WAL-logger works without
> that
> > knowledge too. May be I missed something here or don't understand.
> >
> > Although heap's pages could be changed by setting commit-status bits
> on
> > tuple even in read-only transaction, but that changes are not
> WAL-logged.
> > That is correct for index's page too.
>
> It would be helpful if Heikki or Simon could jump in here, but my
> understanding is that cleaning up the pending list is a read-write
> operation. I don't think we can do that on a hot standby server.

>From reading the docs with the patch the pending list is merged into the
main index when a VACUUM is performed. I (think I) can see that
additions to the pending list are WAL logged, so that will work in Hot
Standby. I also see ginEntryInsert() calls during the move from pending
list to main index which means that is WAL logged also. AFAICS this
*could* work during Hot Standby mode.
Best check here http://wiki.postgresql.org/wiki/Hot_Standby#Usage
rather than attempt to read the patch.
Teodor, can you confirm
* we WAL log the insert into the pending list
* we WAL log the move from the pending list to the main index
* that we maintain the pending list correctly during redo so that it can
be accessed by index scans

The main thing with Hot Standby is that we can't do any writes. So a
pending list cannot change solely because of a gingettuple call on the
*standby*. But that's easy to disable. If all the inserts happened on
the primary node and all the reads happened on the standby, then pending
list would never be cleaned up if the cleanup is triggered only by read.
I would suggest that we trigger cleanup by read at threshold size X and
trigger cleanup by insert at threshold size 5X. That avoids the strange
case mentioned, but generally ensures only reads trigger cleanup. (But
why do we want that??)

I found many parts of the patch and docs quite confusing because of the
way things are named. For me, this is a deferred or delayed insert
technique to allow batching. I would prefer if everything used one
description, rather than "fast", "pending", "delayed" etc.

Personally, I see ginInsertCleanup() as a scheduled task unrelated to
vacuum. Making the deferred tasks happen at vacuum time is just a
convenient way of having a background task occur regularly. That's OK
for now, but I would like to be able to request a background task
without having to hook into AV.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2009-02-23 10:03:22 pgsql: Add quotes to message
Previous Message Heikki Linnakangas 2009-02-23 09:31:52 Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery.