Re: Hash Indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-06-22 15:14:44
Message-ID: CA+Tgmoa7nOH2i_rWgNLwqqhpUVdb2WH=eGoZdsyQs4dFhHmL=g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 22, 2016 at 5:10 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> > Insertion will happen by scanning the appropriate bucket and needs to
>> > retain pin on primary bucket to ensure that concurrent split doesn't happen,
>> > otherwise split might leave this tuple unaccounted.
>>
>> What do you mean by "unaccounted"?
>
> It means that split might leave this tuple in old bucket even if it can be
> moved to new bucket. Consider a case where insertion has to add a tuple on
> some intermediate overflow bucket in the bucket chain, if we allow split
> when insertion is in progress, split might not move this newly inserted
> tuple.

OK, that's a good point.

>> > Now for deletion of tuples from bucket (N+1)/2, we need to wait for the
>> > completion of any scans that began before we finished populating bucket N+1,
>> > because otherwise we might remove tuples that they're still expecting to
>> > find in bucket (N+1)/2. The scan will always maintain a pin on primary
>> > bucket and Vacuum can take a buffer cleanup lock (cleanup lock includes
>> > Exclusive lock on bucket and wait till all the pins on buffer becomes zero)
>> > on primary bucket for the buffer. I think we can relax the requirement for
>> > vacuum to take cleanup lock (instead take Exclusive Lock on buckets where no
>> > split has happened) with the additional flag has_garbage which will be set
>> > on primary bucket, if any tuples have been moved from that bucket, however I
>> > think for squeeze phase (in this phase, we try to move the tuples from later
>> > overflow pages to earlier overflow pages in the bucket and then if there are
>> > any empty overflow pages, then we move them to kind of a free pool) of
>> > vacuum, we need a cleanup lock, otherwise scan results might get effected.
>>
>> affected, not effected.
>>
>> I think this is basically correct, although I don't find it to be as
>> clear as I think it could be. It seems very clear that any operation
>> which potentially changes the order of tuples in the bucket chain,
>> such as the squeeze phase as currently implemented, also needs to
>> exclude all concurrent scans. However, I think that it's OK for
>> vacuum to remove tuples from a given page with only an exclusive lock
>> on that particular page.
>
> How can we guarantee that it doesn't remove a tuple that is required by scan
> which is started after split-in-progress flag is set?

If the tuple is being removed by VACUUM, it is dead. We can remove
dead tuples right away, because no MVCC scan will see them. In fact,
the only snapshot that will see them is SnapshotAny, and there's no
problem with removing dead tuples while a SnapshotAny scan is in
progress. It's no different than heap_page_prune() removing tuples
that a SnapshotAny sequential scan was about to see.
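
To make the dead-tuple argument concrete, here is a toy model in Python. It is
only a sketch and not PostgreSQL code: the `HeapTuple` fields, the use of a
single integer as a snapshot, and the `oldest` horizon passed to `vacuum()` are
all simplifying assumptions. It shows that removing tuples that are dead to
every running snapshot changes nothing for an MVCC scan, and changes only what
a SnapshotAny-style scan happens to return.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HeapTuple:
    """Hypothetical tuple: inserted by xmin, deleted by xmax (None = live)."""
    key: str
    xmin: int
    xmax: Optional[int] = None


def visible(t: HeapTuple, snap: int) -> bool:
    # Simplified MVCC rule: the inserter is visible to the snapshot and the
    # deleter (if any) is not.
    return t.xmin <= snap and (t.xmax is None or t.xmax > snap)


def mvcc_scan(page, snap):
    return [t.key for t in page if visible(t, snap)]


def any_scan(page):
    # SnapshotAny-style scan: returns every tuple still physically present.
    return [t.key for t in page]


def vacuum(page, oldest):
    # Remove tuples whose deleter is visible to the oldest running snapshot,
    # i.e. tuples that no current or future MVCC snapshot can see.
    return [t for t in page if t.xmax is None or t.xmax > oldest]


page = [HeapTuple("a", 1, 3), HeapTuple("b", 2), HeapTuple("c", 4, 9)]
oldest = 5
cleaned = vacuum(page, oldest)

# Every MVCC snapshot at or after the horizon sees the same rows either way.
for snap in range(oldest, 12):
    assert mvcc_scan(page, snap) == mvcc_scan(cleaned, snap)

# Only the SnapshotAny-style scan notices that "a" is gone.
assert any_scan(page) != any_scan(cleaned)
```

The horizon matters: a snapshot older than `oldest` could still see tuple "a",
which is why the model removes only tuples dead to the oldest running snapshot.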

If the tuple is being removed because the bucket was split, it's only
a problem if the scan predates setting the split-in-progress flag.
But since your design involves out-waiting all scans currently in
progress before setting that flag, there can't be any scan in progress
that hasn't seen it. A scan that has seen the flag won't look at the
tuple in any case.
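
The split case can be sketched the same way. This is a toy model of the
argument, not the patch under discussion: the `Bucket` class, the modulo-based
bucket mapping, and `try_start_split()` returning False where the proposed
design would wait are all assumptions made for illustration.

```python
class Bucket:
    """Toy primary bucket: hash codes, a pin count, and a split flag."""

    def __init__(self, number, hashes):
        self.number = number
        self.hashes = list(hashes)
        self.pins = 0
        self.split_in_progress = False


def begin_scan(bucket):
    # A scan pins the primary bucket and records whether it saw the flag.
    bucket.pins += 1
    return bucket.split_in_progress


def end_scan(bucket):
    bucket.pins -= 1


def try_start_split(bucket):
    # Stand-in for out-waiting scans: the flag is set only once no scan
    # holds a pin. (The proposed design would wait instead of failing.)
    if bucket.pins > 0:
        return False
    bucket.split_in_progress = True
    return True


def scan(bucket, saw_flag, new_nbuckets):
    # A scan that saw the flag skips tuples that belong to the new bucket.
    if not saw_flag:
        return list(bucket.hashes)
    return [h for h in bucket.hashes if h % new_nbuckets == bucket.number]


def remove_moved(bucket, new_nbuckets):
    # Delete tuples that the split has copied to the new bucket.
    bucket.hashes = [h for h in bucket.hashes
                     if h % new_nbuckets == bucket.number]


old = Bucket(1, [1, 3, 5, 7])

early = begin_scan(old)          # scan that predates the flag
blocked = try_start_split(old)   # cannot set the flag while it is pinned
end_scan(old)
started = try_start_split(old)   # no pins left, so the flag is set

late = begin_scan(old)           # this scan sees the flag
before = scan(old, late, 4)
remove_moved(old, 4)
after = scan(old, late, 4)
end_scan(old)

assert before == after           # removal is invisible to the late scan
```

In this model the only scan that could miss a removed tuple is one that
started before the flag was set, and `try_start_split()` rules that out.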

>> (Plain text email is preferred to HTML on this mailing list.)
>>
>
> If I turn to Plain text [1], then the signature of my e-mail also changes to
> Plain text, which I don't want. Is there a way I can retain signature
> settings in Rich Text and mail content as Plain Text?

Nope, but I don't see what you are worried about. There's no HTML
content in your signature anyway except for a link, and most
mail-reading software will turn that into a hyperlink even without the
HTML.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
