Re: WIP: Fast GiST index build

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Fast GiST index build
Date: 2011-08-30 10:38:10
Message-ID: CAPpHfdu=BUTTqk-t04DrqTyWH1MHH2JPJZwsNLjbDAk8SH5EyQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 30, 2011 at 1:08 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

>
> Thanks. Meanwhile, I hacked together my own set of test scripts, and let
> them run over the weekend. I'm still running tests with ordered data, but
> here are some preliminary results:
>
> testname                    | nrows     | duration        | accesses
> ----------------------------+-----------+-----------------+----------
> points unordered auto       | 250000000 | 08:08:39.174956 | 3757848
> points unordered buffered   | 250000000 | 09:29:16.47012  | 4049832
> points unordered unbuffered | 250000000 | 03:48:10.999861 | 4564986
>
> As you can see, the results are very disappointing :-(. The buffered builds
> take a lot *longer* than unbuffered ones. I was expecting the buffering to
> be very helpful at least in these unordered tests. On the positive side, the
> buffering made index quality somewhat better (accesses column, smaller is
> better), but that's not what we're aiming at.
>
> What's going on here? This data set was large enough to not fit in RAM, the
> table was about 8.5 GB in size (and I think the index is even larger than
> that), and the box has 4GB of RAM. Does the buffering only help with even
> larger indexes that exceed the cache size even more?
>
This seems pretty strange to me. The unbuffered build time shows that there
is no IO bottleneck, which differs radically from my experiments. I'm going
to try your test script on my test setup.
For now, my only guess is that the random function is somewhat bad, so that
the unordered dataset behaves like the ordered one. Could you rerun the tests
on your test setup with the dataset generated on the backend, like this?
CREATE TABLE points AS (SELECT point(random(), random()) FROM
generate_series(1,10000000));
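
One rough way to check the guess that a generated dataset is effectively
ordered is to measure the correlation between each value and the next one in
insertion order. The following is only a minimal sketch in Python, not part of
either test setup: the function name lag1_correlation and the sample size are
hypothetical, and serial correlation of one coordinate is only a crude proxy
for how a GiST build experiences the input order.

```python
import random


def lag1_correlation(values):
    """Pearson correlation between each value and its successor.

    Close to 0 for independent values, close to 1 for sorted input.
    Hypothetical helper, used here only for illustration.
    """
    xs = values[:-1]
    ys = values[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Assumed sample: x coordinates in the order they would be inserted.
rng = random.Random(42)
unordered = [rng.random() for _ in range(10000)]
ordered = sorted(unordered)

print("unordered:", lag1_correlation(unordered))
print("ordered:  ", lag1_correlation(ordered))
```

If the x coordinates of a supposedly unordered dataset give a value far from
zero, that would be consistent with the dataset behaving like the ordered one.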

------
With best regards,
Alexander Korotkov.
