Re: WIP: Fast GiST index build

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Fast GiST index build
Date: 2011-08-11 10:21:29
Message-ID: CAPpHfdvhoYKUUCVaC-yFbDaE5j2QTa94cVgYuKnzX4xwWUjSTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 10, 2011 at 11:45 PM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> unloadNodeBuffers() is now dead code.
>
processEmptyingStack calls it.

LEAF_PAGES_STATS_* are unused now.

Removed.

> Should avoid calling smgrnblocks() on every tuple, the overhead of that
> could add up.

Now calling at each BUFFERING_MODE_SWITCH_CHECK_STEP(256) tuples.

> I wonder, how hard would it be to merge gistBufferingBuildPlaceToPage(**)
> with the gistplacetopage() function used in the main codepath? There's very
> little difference between them, and it would be nice to maintain just one
> function. At the very least I think there should be a comment in both along
> the lines of "NOTE: if you change this function, make sure you update XXXX
> (the other function) as well!"
>
I doubt they can be effectively merged, but will try.

> In gistbuild(), in the final emptying stage, there's this special-case
> handling for the root block before looping through the buffers in the
> buffersOnLevels lists:
>
> for (;;)
>> {
>> nodeBuffer = getNodeBuffer(gfbb,
>> &buildstate.giststate, GIST_ROOT_BLKNO,
>>
>> InvalidOffsetNumber, NULL, false);
>> if (!nodeBuffer || nodeBuffer->blocksCount <= 0)
>> break;
>> MemoryContextSwitchTo(gfbb->**context);
>> gfbb->bufferEmptyingStack = lcons(nodeBuffer,
>> gfbb->bufferEmptyingStack);
>> MemoryContextSwitchTo(**buildstate.tmpCtx);
>> processEmptyingStack(&**buildstate.giststate,
>> &insertstate);
>> }
>>
>
> What's the purpose of that? Wouldn't the loop through buffersOnLevels lists
> take care of the root node too?
>
I was worried about node buffer deletion from list while scanning that
list gistbuild(). That's why I avoided deletion from lists.
Now I've added additional check for root node in loop over list.

> The calculations in initBuffering() desperately need comments. As does the
> rest of the code too, but the heuristics in that function are particularly
> hard to understand without some explanation.

Some comments were added. I'm working on more of them.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gist_fast_build-0.13.0.patch.gz application/x-gzip 22.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2011-08-11 10:28:58 Re: WIP: Fast GiST index build
Previous Message Andres Freund 2011-08-11 10:03:52 Re: "pgstat wait timeout" warnings