Re: Bug in new buffering GiST build code

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in new buffering GiST build code
Date: 2012-05-25 20:33:12
Message-ID: 4FBFEC88.8020501@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22.05.2012 01:09, Alexander Korotkov wrote:
> Hi!
>
> On Tue, May 22, 2012 at 12:56 AM, Heikki Linnakangas<
> heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
>> The management of the path stacks is a bit complicated, anyway. I'll think
>> about this some more tomorrow, maybe we can make it simpler, knowing that
>> we have to do those extra lookups.
>
> WOW! You did enormous work on exploring that!
> I just arrived from PGCon and start looking at it when find you've already
> done comprehensive research of this problem.
> On the step 5 if we've NSN in GISTBufferingInsertStack structure, we could
> detect situation of changing parent of splitted page. Using this we could
> save copy of GISTBufferingInsertStack for B2 with original parent A,
> because we know split of B to occur after creating GISTBufferingInsertStack
> but before split of A. The question is how to find this copy from C, hash?

I tested a patch that adds the extra getNodeBuffer() call after
refinding the parent, as discussed. However, I'm still getting a "failed
to-refind parent" error later in the build, so I think we're still
missing some corner case.

I think we should rewrite the way we track the parents completely.
Rather than keep a path stack attached to every node buffer, let's just
maintain a second hash table that contains the parent of every internal
node. Whenever a downlink is moved to another page, update the hash
table with the new information. That way we always have up-to-date
information about the parent of every internal node.

That's much easier to understand than the path stack structures we have
now. I think the overall memory consumption will be about the same too.
Although we need the extra hash table with one entry for every internal
node, we get rid of the path stack structs, which are also one per every
internal node at the moment.

I believe it is faster too. I added some instrumentation to the existing
gist code (with the additional getNodeBuffer() call added to fix this
bug), to measure the time spent moving right, when refinding the parent
of a page. I added gettimeofday() calls before and after moving right,
and summed the total. In my test case, the final index size was about
19GB, and the index build took 3545 seconds (59 minutes). Of that time,
580 seconds (~ 10 minutes) was spent moving right to refind parents.
That's a lot. I also printed a line whenever a refind operation had to
move right 20 pages or more. That happened 2482 times during the build,
in the worst case we moved right over 40000 pages.

Attached is a patch to replace the path stacks with a hash table. With
this patch, the index build time in my test case dropped from 59 minutes
to about 55 minutes. I don'ẗ know how representative or repeatable this
test case is, but this definitely seems very worthwhile, not only
because it fixes the bug and makes the code simpler, but also on
performance grounds.

Alexander, do you still have the test environments and data lying around
that you used for GiST buffering testing last summer? Could you rerun
some of those tests with this patch?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
use-hash-table-for-parents-1.patch text/x-diff 30.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-05-25 20:47:41 Re: patch: Use pg_mbcliplen for truncation in text-to-name conversion
Previous Message Jeff Janes 2012-05-25 20:06:05 Foreground vacuum and buffer access strategy