Re: Reducing relation locking overhead

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing relation locking overhead
Date: 2005-12-11 18:33:57
Message-ID: 1134326038.3567.50.camel@localhost.localdomain
Lists: pgsql-hackers

On a fine day, Sat, 2005-12-10 at 21:07, Tom Lane wrote:
> Hannu Krosing <hannu(at)skype(dot)net> writes:
> > How do you plan to determine "any rows not already present in the index"
> > without explicitly remembering the start and end snapshots of existing
> > CREATE INDEX (SNAP1 and SNAP2 in my proposal)?
>
> I was thinking in terms of actually looking into the index to see if the
> particular TID is present or not. You could use snapshots to optimize
> this by avoiding index probes for tuples that must be present, which
> hopefully will be most of 'em. Also you need a snapshot to detect
> tuples that are new enough that they certainly will be indexed by their
> inserting transaction, so that you don't have a race condition between
> an active inserter and the REINDEX. (I think this is possible but maybe
> I missed something.)

So would it be possible to do the whole indexing while another long
transaction runs, possibly at the SERIALIZABLE isolation level,
inserting tuples all the way?
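The probe-avoidance scheme quoted above could be sketched roughly like
this (plain Python, not PostgreSQL code; the function name is mine, xids
are simplified to bare integers with no wraparound, and the two
snapshots are reduced to single xid horizons - all assumptions on my
part):

```python
def needs_index_probe(xmin, snap1_horizon, snap2_horizon):
    """Decide whether a heap tuple's TID must be probed in the index.

    Sketch only: xids are plain ints, and each snapshot is reduced
    to one xid horizon.
    """
    # Inserted by a transaction that finished before the first
    # snapshot: the index build scan saw it, so it must be indexed.
    if xmin < snap1_horizon:
        return False
    # Inserted after the second snapshot: the inserting transaction
    # certainly knew about the index and indexed the tuple itself.
    if xmin >= snap2_horizon:
        return False
    # In between: the inserter may or may not have known about the
    # index, so we must look for the TID in the index.
    return True
```

Hopefully most tuples fall into the first two branches and never touch
the index at all.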

> That leaves you looking at just the tuples
> inserted by transactions that might or might not have known about the
> index. So yeah, you do need SNAP1 and SNAP2 but they're being used in
> a different way than the original proposal.

By taking snaps at points with no concurrent transactions I was hoping
to make things (both concept and code) simpler by reducing SNAP1 and
SNAP2 to mere transaction ids, and making sure that these and only these
tuples - the ones inserted by transactions falling between them - need
to be added to the index.
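With SNAP1 and SNAP2 reduced to bare xids taken at quiescent points, the
second pass becomes a simple xid-interval filter - a rough sketch (the
names, the tuple representation, and the half-open interval are my
assumptions, not anything from the patch):

```python
from collections import namedtuple

# Minimal stand-in for a heap tuple: its TID and inserting xid.
HeapTuple = namedtuple("HeapTuple", ["tid", "xmin"])

def second_pass_tuples(heap_tuples, xid1, xid2):
    # Exactly the tuples inserted by transactions falling between
    # the two boundary xids still need to be added to the index.
    return [t for t in heap_tuples if xid1 < t.xmin <= xid2]
```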

If it does not actually make things clearer then it may not be a good
idea to try to fit them between transactions.

Otoh, we still do need to wait - possibly a long time - for "3. Attempt
to acquire ShareLock". So why not wait also at the start, in case that
would make things clearer/simpler?

> > In the last round of discussion you pointed out that index itself can't
> > be effectively used for this in case there are lots of equal index keys.
>
> True, but if you can avoid going to the index at all for the majority of
> the tuples, I think this is tolerable.

And it seems that we could store same-valued index leaf entries in TID
order, at least during a concurrent create index/reindex, making the
index lookup cheaper.

To me, it seems a good idea to always store equal index values in TID
order, as this makes an index scan do the right thing (scan in heap
order), at least for the key=const case, even without an intermediate
bitmap index. Or are there downsides to this in the general case?
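For an in-memory model of leaf entries, the TID-ordering idea might look
like this (a sketch; the Entry type and the (block, offset) TID
representation are assumptions of mine):

```python
from collections import namedtuple

# An index leaf entry: key value plus heap TID as (block, offset).
Entry = namedtuple("Entry", ["key", "tid"])

def leaf_order(entries):
    # Sort by key, breaking ties by TID, so a scan over a run of
    # equal keys visits heap blocks in physical (sequential) order.
    return sorted(entries, key=lambda e: (e.key, e.tid))
```

A key=const scan over entries kept in this order would then fetch heap
pages roughly sequentially instead of jumping around.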

> In any case the design idea here
> seems to be "we don't care how long REINDEX takes as long as it's not
> blocking anyone".

Yes, that's the general idea.

Within reason, of course - for example, making a seqscan over the index
for each and every tuple inserted while building the first index would
probably still be too slow :)

> regards, tom lane
