Re: Nasty problem in hash indexes

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Nasty problem in hash indexes
Date: 2003-08-28 20:27:14
Message-ID: Pine.LNX.4.33.0308281425380.4942-100000@css120.ihs.com
Lists: pgsql-hackers
On Thu, 28 Aug 2003, Tom Lane wrote:

> I've traced through the failure reported here by Markus Kräutner:
> http://archives.postgresql.org/pgsql-hackers/2003-08/msg01132.php
> 
> What is happening is that as the UPDATE adds tuples (all with the same
> hash key value) to the table, the hash bucket being filled eventually
> requires more pages, and this results in a _hash_splitpage() operation
> (which is misnamed, it should be _hash_splitbucket).  By chance, the
> bucket that is selected to be split is the one containing the older key
> values, all of which get relocated to the new bucket.  So when control
> returns to the indexscan that is sourcing the tuples for the UPDATE,
> there are no tuples remaining in the bucket it is looking at, and it
> exits thinking it's done.
> 
> I'm not sure how many variants on this problem there might be, but
> clearly the fundamental bug is that a hash bucket split takes no account
> of preserving the state of concurrent index scans.
> 
> This is likely to be messy to fix :-(.  A brute-force solution may be
> possible by generalizing hash_adjscans so that it can update indexscans
> of our own backend for bucket-split operations; we'd have to rely on
> page locking to prevent problems against scans of other backends.  The
> locking aspect is particularly unattractive because of the possibility
> of deadlocks.  If a bucket split fails because of deadlock, we're
> probably left with a corrupt hash index.
> 
> Does anyone see a better way?
> 
> Does anyone want to vote to jettison the hash index code entirely?
> Personally I'm not eager to put a lot of work into fixing it.

I've had naught but bad experiences with hash indexes myself. Maybe toss
it and see if someone wants to reimplement it some day?

If I'm reading this right, this bug means you could do:

select * from table where field in (1,2,3,4)

where you should get, say, 100 rows, but might not get all of them? If
so, how many other bugs are lurking in the hash index code waiting to
bite?
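As a rough illustration of the failure Tom describes, here is a toy Python model, not PostgreSQL's hash access method. It assumes an identity hash function and a single bucket that doubles to two buckets on a split. ToyHashIndex, ToyScan and run_scan are hypothetical names. The scan remembers only the bucket it started in plus an offset, so when a split relocates every equal-key entry to the new bucket, the scan finds its bucket empty and stops early. That is the "fewer rows than expected" outcome asked about above.

```python
from __future__ import annotations

# Toy model only: none of these names exist in PostgreSQL's hash index code.


class ToyHashIndex:
    """Bucket number -> list of (key, row_id) entries."""

    def __init__(self) -> None:
        self.nbuckets = 1
        self.buckets: dict[int, list[tuple[int, str]]] = {0: []}

    def bucket_of(self, key: int) -> int:
        # Assumed toy hash function: identity, reduced modulo the bucket count.
        return key % self.nbuckets

    def insert(self, key: int, row_id: str) -> None:
        self.buckets[self.bucket_of(key)].append((key, row_id))

    def split(self) -> None:
        # Double the bucket count and redistribute. With one bucket this is the
        # same as splitting bucket 0; entries with equal keys always move together.
        self.nbuckets *= 2
        old = self.buckets
        self.buckets = {b: [] for b in range(self.nbuckets)}
        for entries in old.values():
            for key, row_id in entries:
                self.buckets[self.bucket_of(key)].append((key, row_id))


class ToyScan:
    """Equality scan that keeps only (bucket, offset) and ignores splits."""

    def __init__(self, index: ToyHashIndex, key: int) -> None:
        self.index = index
        self.key = key
        self.bucket = index.bucket_of(key)  # fixed when the scan starts
        self.offset = 0

    def next(self) -> str | None:
        entries = self.index.buckets[self.bucket]
        while self.offset < len(entries):
            key, row_id = entries[self.offset]
            self.offset += 1
            if key == self.key:
                return row_id
        return None  # bucket exhausted: the scan concludes it is done


def run_scan(split_midway: bool) -> list[str]:
    index = ToyHashIndex()
    for i in range(4):
        index.insert(1, f"row{i}")  # every row has the same hash key
    scan = ToyScan(index, key=1)
    seen = []
    row = scan.next()
    while row is not None:
        seen.append(row)
        if split_midway and len(seen) == 1:
            index.split()  # key 1 now maps to bucket 1; bucket 0 is emptied
        row = scan.next()
    return seen


print(run_scan(split_midway=False))  # ['row0', 'row1', 'row2', 'row3']
print(run_scan(split_midway=True))   # ['row0']
```

In the reported case the split is triggered by the UPDATE's own insertions into the bucket being scanned. The sketch forces it directly after the first row to keep the example small.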

