Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Shubham Barai <shubhambaraiss(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
Date: 2018-02-28 04:05:42
Message-ID: CA+TgmoYMC9KYg6cgDpamr3t8DhYk=BvTyW8J8S6=mVik+-GPvg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 26, 2018 at 7:51 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Thinking about how to tune that got me thinking about a simple middle
> way we could perhaps consider...
>
> What if we just always locked pseudo page numbers using hash_value %
> max_predicate_locks_per_relation (= effectively 31 by default)? Then
> you'd have lower collision rates on small hash indexes, you'd never
> have to deal with page splits, and you'd never exceed
> max_predicate_locks_per_relation so you'd never escalate to relation
> level locks on busy systems. On the downside, you'd have eg ~3%
> chance of collision instead of a 1/hash_maxbucket chance of collision,
> so it gets a bit worse for large indexes on systems that are not busy
> enough to exceed max_predicate_locks_per_relation. You win some, you
> lose some...

Hmm, yeah, that definitely has some appeal. On the other hand,
there's a lot of daylight between locking hv % 2^32 and locking hv %
31; the former is going to potentially blow out the lock table really
fast, while the latter is potentially going to create an uncomfortable
number of false collisions. One could imagine a modulus larger than
31 and smaller than 4294967296, although I don't have a principled
suggestion for how to pick it. On the third hand, people who are
using SSI heavily may well have increased
max_predicate_locks_per_relation and with this proposal that just
kinda does what you want.
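
To make the trade-off concrete, here is a minimal sketch (not taken from the patch) of how the choice of modulus bounds both the number of distinct lock targets and the chance that two unrelated keys share one. The helper names pseudo_page and false_collision_chance are hypothetical, and the sketch assumes hash values are uniformly distributed.

```python
# Illustrative only: hypothetical helpers, assuming uniformly
# distributed 32-bit hash values.

def pseudo_page(hash_value: int, modulus: int) -> int:
    """Map a hash value to the pseudo page number that would be locked."""
    return hash_value % modulus


def false_collision_chance(modulus: int) -> float:
    """Chance that two unrelated hash values map to the same lock target."""
    return 1.0 / modulus


if __name__ == "__main__":
    for modulus in (31, 1024, 2**32):
        # At most `modulus` distinct lock targets can ever exist per index.
        print(f"modulus={modulus}: "
              f"max lock targets={modulus}, "
              f"false collision chance={false_collision_chance(modulus):.2e}")
```

With a modulus of 31 the collision chance is about 3%, matching the figure quoted above; a larger modulus lowers it at the cost of more possible lock table entries.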

I don't really know how we can judge the merits of any particular
modulus (or of committing the patch at all) without some test results
showing that it helps reduce rollbacks or increase performance or
something. Otherwise we're just guessing. It does however seem to me
that locking the hash value % (something) is better than basing the
locking on bucket or page numbers. Basing it on page numbers strictly
speaking cannot work, since the same tuple could be present in any
page in the bucket chain; you'd have to lock the page number of the
head of the bucket chain. There is however no advantage of doing that
over locking the bucket number directly. Moreover, locking the bucket
number directly forces you to worry about splits, whereas if you lock
hv % (something) then you don't have to care.
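
The point about splits can be illustrated with a small sketch. It uses a simplified model of mask-based bucket mapping in the style of linear hashing; the function names and the example values are assumptions chosen for illustration, not code from the patch. The bucket a key maps to changes when the index grows, while a lock target derived from the hash value alone does not.

```python
# Illustrative only: a simplified model of mask-based bucket mapping.
# All names and values here are assumptions for the example.

def bucket_for(hash_value: int, maxbucket: int, highmask: int, lowmask: int) -> int:
    """Return the bucket a hash value maps to for the current index size."""
    bucket = hash_value & highmask
    if bucket > maxbucket:
        # That bucket does not exist yet; fall back to the smaller mask.
        bucket &= lowmask
    return bucket


def lock_target(hash_value: int, modulus: int = 31) -> int:
    """Return a lock target that depends only on the hash value."""
    return hash_value % modulus


if __name__ == "__main__":
    hv = 6
    # Four buckets (0..3) before the index grows.
    before = bucket_for(hv, maxbucket=3, highmask=7, lowmask=3)
    # Seven buckets (0..6) after several splits.
    after = bucket_for(hv, maxbucket=6, highmask=7, lowmask=3)
    print("bucket before/after splits:", before, after)
    print("lock target before/after splits:", lock_target(hv), lock_target(hv))
```

In this model the key moves from bucket 2 to bucket 6 as the index grows, so a bucket-based lock would have to be transferred at split time, whereas the hash-based lock target stays the same.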

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
