Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Shubham Barai <shubhambaraiss(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
Date: 2018-03-02 03:57:02
Message-ID: CAEepm=1wa_wwGKAjGXego1KVooumF=m_2DZo=0W-+0JW_uCaGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 2, 2018 at 3:57 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Based on this sub-thread this patch's status of 'needs review' doesn't
> quite seem accurate and 'waiting on author' and then 'returned with
> feedback' would be more fitting?

I personally think this patch is really close to RFC. Shubham has
fulfilled the project requirement, it's a tidy and short patch, it has
tests. I think we really just need to verify that the split case
works correctly.

Hmm. I notice that this calls PredicateLockPageSplit() after both
calls to _hash_splitbucket() (the one in _hash_finish_split() and the
one in _hash_expandtable()) instead of doing it inside that function,
and that _hash_splitbucket() unlocks bucket_nbuf before returning.
What if someone else accesses bucket_nbuf between
LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK) and
PredicateLockPageSplit()? Doesn't that mean that another session can
read a newly created page and miss a predicate lock that is about to
be transferred to it? If that is indeed a race, could it be fixed by
calling PredicateLockPageSplit() at the start of _hash_splitbucket()
instead?

Could we get a few days to mull over this and Shubham's other patches?
It'd be really great to get some of these into 11.

My thought experiments about pseudo-pages and avoiding the split stuff
were not intended to get the patch kicked out. I thought for a while
that hash indexes were a special case and could benefit from
dispensing with those trickier problems. Upon further reflection, for
interesting size hash indexes pure hash value predicate tags wouldn't
be much better. Furthermore, if we do decide we want to use using x %
max_predicate_locks_per_relation to avoid having to escalate to
relation predicate locks at the cost of slightly higher collision rate
then we should consider that for the whole system (including heap page
predicate locking), not just hash indexes. Please consider those
ideas parked for now.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Kirkwood 2018-03-02 03:59:25 Re: zheap: a new storage format for PostgreSQL
Previous Message Andres Freund 2018-03-02 03:55:38 Re: WIP: BRIN multi-range indexes