"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> On 8/30/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I don't think that works --- what if the last tuple in the chain isn't
>> committed good yet? If its inserter ultimately rolls back, you've
>> indexed the wrong value.
> I am confused. How could we get ShareLock on the relation while
> there is some open transaction which has inserted a tuple in the
> table ? (Of course, I am not considering the system tables here)
Not if someone else releases lock before committing. Which I remind you
is a programming technique we use quite a lot with respect to the system
catalogs. I'm not prepared to guarantee that there is no code path in
core (much less in contrib or third-party addons) that doesn't do it on
a user table; even less that no such path will ever appear in future.
As things stand it's a pretty harmless error --- the worst consequence
I know of is that a VACUUM FULL starting right then might bleat and
refuse to do shrinking. What you propose is to turn the case into a
cause of silent index corruption.
A more robust solution would be to wait out the source transaction if
CREATE INDEX comes across an INSERT_IN_PROGRESS or DELETE_IN_PROGRESS
HOT tuple. Or you could throw an error, since it's not really likely to
happen (except maybe in a system catalog REINDEX operation). But the
way the patch approaches this at the moment is far too fragile IMHO.
If we approach it this way, we might also be able to jettison some of
the other complexity such as idxcreatexid.
>> Isn't the extra machination for C.I.C. just useless complication?
>> What's the point of avoiding creation of new broken HOT chains when
>> you still have to deal with existing ones?
> IMHO the extra step in C.I.C simplifies the index build.
If you make the change suggested above, I think you don't need to do
things differently in C.I.C.
>>> I also don't think I
>>> believe the reasoning for not indexing DELETE_IN_PROGRESS hot-updated
>>> tuples: what if the index completion commits, but the concurrent delete
>>> rolls back? Then you've got a valid tuple that's not in the index.
>> Since CREATE INDEX works with ShareLock on the relation, we can
>> safely assume that we can't find any DELETE_IN_PROGRESS tuples except
>> those deleted by our own transaction. The only exception is system
>> relation, but we don't do HOT updates on system relation.
Same issue as above: this makes correctness utterly dependent on nobody
releasing locks early.
> OK. So if I get you correctly, you are suggesting to acquire cleanup lock.
> If we don't get that, we don't to any maintenance work. Otherwise, we prune
> and repair fragmentation in one go.
Yeah, that's what I'm thinking; then there's no need to track a separate
page state where we've pruned but not defragmented.
regards, tom lane
In response to
pgsql-patches by date
|Next:||From: Andrew Dunstan||Date: 2007-08-30 18:48:02|
|Subject: Re: enum types and binary queries|
|Previous:||From: Bruce Momjian||Date: 2007-08-30 17:28:49|
|Subject: Re: pg_dump --no-tablespaces patch|