Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Shubham Barai <shubhambaraiss(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
Date: 2018-03-05 03:28:38
Message-ID: CAEepm=3S+MtY+K20_yjj1h1n3vk2m7d0EYNmxZttb_01y18o8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Hmm. I notice that this calls PredicateLockPageSplit() after both
>> calls to _hash_splitbucket() (the one in _hash_finish_split() and the
>> one in _hash_expandtable()) instead of doing it inside that function,
>> and that _hash_splitbucket() unlocks bucket_nbuf before returning.
>> What if someone else accesses bucket_nbuf between
>> LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK) and
>> PredicateLockPageSplit()? Doesn't that mean that another session can
>> read a newly created page and miss a predicate lock that is about to
>> be transferred to it?
>
> Yes. I think you are primarily worried about if there is an insert on
> new bucket from another session as scans will anyway take the
> predicate lock, right?

Yeah.

>> If that is indeed a race, could it be fixed by
>> calling PredicateLockPageSplit() at the start of _hash_splitbucket()
>> instead?
>
> Yes, but I think it would be better if we call this once we are sure
> that at least one tuple from the old bucket has been transferred
> (consider if all tuples in the old bucket are dead). Apart from this,
> I think this patch has missed handling the cases where we scan the
> buckets when the split is in progress. In such cases, we scan both
> old and new bucket, so I think we need to ensure that we take
> PredicateLock on both the buckets during such scans.

Hmm. Yeah.

So, in _hash_first(), do you think we might just need this?

if (H_BUCKET_BEING_POPULATED(opaque))
{
...
old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
...
old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
+ PredicateLockPage(rel, BufferGetBlockNumber(old_buf),
scan->xs_snapshot);
TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));

That is, if you begin scanning a 'new' bucket, we remember the old
bucket and go and scan that too, so we'd better predicate-lock both up
front (or I suppose we could do it later when we visit that page, but
here it can be done in a single place).

What if we begin scanning an 'old' bucket that is being split? I
think we'll only do that for tuples that actually belong in the old
bucket after the split, so no need to double-lock? And I don't think
a split could begin while we are scanning. Do I have that right?

As for inserting, I'm not sure if any special treatment is needed, as
long as the scan code path (above) and the split code path are
correct. I'm not sure though.

I'm wondering how to test all this. I'm thinking about a program that
repeatedly creates a hash index and then slowly adds more things to it
so that buckets split (maybe using distinct keys carefully crafted to
hit the same bucket?), while concurrently hammering it with a ton of
scans and then ... somehow checking correctness...

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-03-05 03:30:28 Typo in src/backend/access/hash/README
Previous Message Etsuro Fujita 2018-03-05 03:21:29 Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly