Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Shubham Barai <shubhambaraiss(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
Date: 2018-03-10 07:40:28
Message-ID: CAA4eK1KtdCW4yd2uRapQDdY_h8+pq-Fcje_4Hr3anz6T4CfH7Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 6, 2018 at 11:26 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Mon, Mar 5, 2018 at 8:58 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> Yes, but I think it would be better if we call this once we are sure
>>> that at least one tuple from the old bucket has been transferred
>>> (consider if all tuples in the old bucket are dead). Apart from this,
>>> I think this patch has missed handling the cases where we scan the
>>> buckets when the split is in progress. In such cases, we scan both
>>> old and new bucket, so I think we need to ensure that we take
>>> PredicateLock on both the buckets during such scans.
>>
>> Hmm. Yeah.
>>
>> So, in _hash_first(), do you think we might just need this?
>>
>> if (H_BUCKET_BEING_POPULATED(opaque))
>> {
>> ...
>> old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
>> ...
>> old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
>> + PredicateLockPage(rel, BufferGetBlockNumber(old_buf),
>> scan->xs_snapshot);
>> TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));
>>
>> That is, if you begin scanning a 'new' bucket, we remember the old
>> bucket and go and scan that too, so we'd better predicate-lock both up
>> front (or I suppose we could do it later when we visit that page, but
>> here it can be done in a single place).
>>
>
> Yeah, that can work, but I am slightly worried that we might actually
> never scan the old bucket (say for queries with Limit clause) in which
> case it might give false positives for insertions in old buckets.
>

I have changed the patch to address this point by acquiring the predicate
lock in _hash_readnext, so the lock is taken only when the scan actually
moves on to the old bucket. I have also addressed the related problem of
transferring predicate locks during a split: locks are now transferred
only when at least one tuple is actually moved from the old bucket to the
new bucket.
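
To make the first point concrete, here is a rough sketch (not the exact
patch hunk) of how the lock acquisition can move into _hash_readnext();
hashso_buc_populated, hashso_buc_split and hashso_split_bucket_buf are the
existing HashScanOpaque members, and the surrounding control flow is
abbreviated:

    /*
     * In _hash_readnext(), when the current (new) bucket is exhausted and
     * a split that was in progress at the start of the scan forces us to
     * also read the old bucket, take the predicate lock only at that point.
     */
    else if (so->hashso_buc_populated && !so->hashso_buc_split)
    {
        /* continue the scan in the bucket being split (the old bucket) */
        *bufp = so->hashso_split_bucket_buf;
        Assert(BufferIsValid(*bufp));

        /*
         * Lock the old bucket's primary page only when we actually reach
         * it, so a scan that stops early (e.g. because of a LIMIT) does
         * not create spurious rw-conflicts with insertions into it.
         */
        PredicateLockPage(rel, BufferGetBlockNumber(*bufp), scan->xs_snapshot);

        LockBuffer(*bufp, BUFFER_LOCK_SHARE);
        ...
    }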

>
>> I'm wondering how to test all this. I'm thinking about a program that
>> repeatedly creates a hash index and then slowly adds more things to it
>> so that buckets split (maybe using distinct keys carefully crafted to
>> hit the same bucket?), while concurrently hammering it with a ton of
>> scans and then ... somehow checking correctness...
>>
>
> Yeah, that will generate the required errors, but not sure how to
> verify correctness. One idea could be that when the split is in
> progress, we somehow stop it in-between (say by cancel request) and
> then run targeted selects and inserts on the bucket being scanned and
> bucket being populated.
>

I have verified the bucket split scenario manually as below:

Setup
------------
create table hash_tbl(id int4, p integer);
insert into hash_tbl (id, p) select g, 10 from generate_series(1, 10) g;
Analyze hash_tbl;
create index hash_idx on hash_tbl using hash(p);

Session-1
----------------
begin isolation level serializable;
set enable_seqscan=off;
set enable_bitmapscan=off;
set enable_indexonlyscan=on;
select sum(p) from hash_tbl where p=10;
sum
-----
100
(1 row)

insert into hash_tbl (id, p) select g, 10 from generate_series(10, 1000) g;
-- Via debugger, stop in _hash_splitbucket() at line 1283:
-- LockBuffer(bucket_obuf, BUFFER_LOCK_EXCLUSIVE);

By this time the split of bucket 1 is done, but the split flag is not yet
cleared, so the predicate lock from bucket 1 has been transferred to
bucket 3 (the new bucket).
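
For reference, the transfer that has happened at this point in
_hash_splitbucket() is roughly the following (a sketch, not the exact
hunk; lock_transferred is a hypothetical local flag, while bucket_obuf and
bucket_nbuf are the existing local buffers):

    /*
     * Inside the tuple-moving loop of _hash_splitbucket(): transfer the
     * predicate locks from the old bucket's primary page to the new one
     * only once the first tuple is actually moved, so a bucket whose
     * tuples are all dead does not cause false conflicts.
     */
    if (!lock_transferred)
    {
        PredicateLockPageSplit(rel,
                               BufferGetBlockNumber(bucket_obuf),
                               BufferGetBlockNumber(bucket_nbuf));
        lock_transferred = true;
    }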

Session-2
----------------
begin isolation level serializable;
set enable_seqscan=off;
set enable_bitmapscan=off;
set enable_indexonlyscan=on;
select sum(p) from hash_tbl where p=10;
sum
-----
100
(1 row)

Session-1
--------------
Commit;

Session-2
----------
postgres=# insert into hash_tbl (id, p) select g, 10 from
generate_series(51, 60) g;
ERROR: could not serialize access due to read/write dependencies
among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.

It got a conflict while inserting into the new bucket (bucket number 3).

I think this patch now addresses all the open issues. Let me know what
you think.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
Predicate-Locking-in-hash-index_v6.patch application/octet-stream 28.4 KB
