Re: HOT patch - version 14

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: HOT patch - version 14
Date: 2007-08-30 16:04:26
Message-ID: 2e78013d0708300904g10920497k5037ff8e3aa6ba3d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

On 8/30/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > You are right - a new index might mean that an existing HOT chain
> > is broken as far as the new index is concerned. The way we address
> > that is by indexing the root tuple of the chain, but the index key is
> > extracted from the last tuple in the chain. The index is marked
> "unusable"
> > for all those existing transactions which can potentially see any
> > intermediate tuples in the chain.
>
> I don't think that works --- what if the last tuple in the chain isn't
> committed good yet? If its inserter ultimately rolls back, you've
> indexed the wrong value.

I am confused. How could we get ShareLock on the relation while
there is some open transaction which has inserted a tuple in the
table ? (Of course, I am not considering the system tables here)
If the transaction which inserted the last tuple in the chain is already
aborted, we are not indexing that tuple (even if that tuple is at the
end). We would rather index the last committed-good tuple in the
chain. Running the tuples with HeapTupleSatisfiesVacuum guarantees
that. Isn't it ?

> Please see this document written by Greg Stark. He has nicely summarized
> > how CREATE INDEX and CREATE INDEX CONCURRENTLY works with HOT.
> > http://archives.postgresql.org/pgsql-patches/2007-07/msg00360.php
>
> Isn't the extra machination for C.I.C. just useless complication?
> What's the point of avoiding creation of new broken HOT chains when
> you still have to deal with existing ones?

IMHO the extra step in C.I.C simplifies the index build. The
transaction-waits
takes care of the existing broken chains and the early catalog entry for
the index helps us avoid creating new broken chains. I am not against
doing it a different way, but this is the best solution we could come up
when we worked on CIC.

>> I also don't think I
> >> believe the reasoning for not indexing DELETE_IN_PROGRESS hot-updated
> >> tuples: what if the index completion commits, but the concurrent delete
> >> rolls back? Then you've got a valid tuple that's not in the index.
>
> > Since CREATE INDEX works with ShareLock on the relation, we can
> > safely assume that we can't find any DELETE_IN_PROGRESS tuples except
> > those deleted by our own transaction. The only exception is system
> relation,
> > but we don't do HOT updates on system relation.
>
> That chain of reasoning is way too shaky for me. Essentially what
> you're saying is you'll promise not to corrupt non-system indexes.
> Nor am I really thrilled about having to disable HOT for system
> catalogs.

I am not against supporting HOT for system catalogs. But IMHO
its not a strict requirements because system catalogs are small, less
frequently updated and HOT adds little value to them. If we don't have
a generic solution which works for system and non-system tables,
thats the best we can get. We can start with non-system
tables and expand its scope later.

> The only reason to redefine MaxHeapTuplesPerPage to higher side is
> > because HOT allows us to accommodate more tuples per page because
> > of redirect-dead line pointers.
>
> No, it doesn't allow more tuples per page. It might mean there can be
> more line pointers than that on a page, but not more actual tuples.
> The question is whether there is any real use in allowing more line
> pointers than that --- the limit is already unrealistically high,
> since it assumes no data content in any of the tuples. If there is a
> rationale for it then you should invent a different constant
> MaxLinePointersPerPage or some such, but I really think there's no
> point.

I agree. I think the current limit on MaxHeapTuplesPerPage is sufficiently
large. May be we can keep it the same and tune it later if we have numbers
to prove its worthiness.

> Doubling the value is chosen as a balance between heap page
> > utilization, line pointer bloating and overhead for bitmap scans.
>
> I'd say it allows a ridiculous amount of line pointer bloat.

OK. Lets keep MaxHeapTuplesPerPage unchanged.

>> Even if it's safe, ISTM what you're mostly accomplishing there is to
> >> expend a lot of cycles while holding exclusive lock on the page, when
> >> there is good reason to think that you're blocking other people who are
> >> interested in using the page. Eliminating the separation between that
> >> and cleanup would also allow eliminating the separate "PD_FRAGMENTED"
> >> page state.
>
> > The reason we did it that way because repairing fragmentation seems
> > much more costly that pruning. Please note that we prune a single
> > chain during index fetch. Its only for heap-scans (and VACUUM) that
> > we try to prune all chains in the page. So unless we are doing
> heap-scan,
> > I am not sure if we are spending too much time holding the exclusive
> > lock. I agree we don't have any specific numbers to prove that though.
>
> If you don't have numbers proving that this extra complication has a
> benefit, I'd vote for simplifying it. The SnapshotAny case is going to
> bite other people besides you in future.

OK. So if I get you correctly, you are suggesting to acquire cleanup lock.
If we don't get that, we don't to any maintenance work. Otherwise, we prune
and repair fragmentation in one go.

A related question is: should we always prune all chains in the
page ? I guess if we are going to repair fragmentation, it would make
more sense to do that.

> Another reasoning behind separating these two steps is: pruning
> > requires exclusive lock whereas repairing fragmentation requires
> > cleanup lock.
>
> This is nonsense. Those are the same lock. If you have the former and
> not the latter, it just means that you *know* there is contention for
> the page. It seems to me that performing optional maintenance work in
> such a situation is completely wrong.
>
>
Oh yes, they are the same lock. The difference is the chances of getting
one is more than the other. But I agree with your argument about the
contention
and maintenance work. I think we can do what you are suggesting and
then fine tune it.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Andrew Dunstan 2007-08-30 16:18:38 Re: pg_dump --no-tablespaces patch
Previous Message Bruce Momjian 2007-08-30 15:56:41 Re: pg_dump --no-tablespaces patch