Re: WIP: Fast GiST index build

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Fast GiST index build
Date: 2011-08-30 10:43:23
Message-ID: 4E5CBECB.2070002@enterprisedb.com
Lists: pgsql-hackers

On 30.08.2011 13:38, Alexander Korotkov wrote:
> On Tue, Aug 30, 2011 at 1:08 PM, Heikki Linnakangas <
> heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
>>
>> Thanks. Meanwhile, I hacked together my own set of test scripts, and let
>> them run over the weekend. I'm still running tests with ordered data, but
>> here are some preliminary results:
>>
>>           testname           |   nrows   |    duration     | accesses
>> -----------------------------+-----------+-----------------+----------
>>  points unordered auto       | 250000000 | 08:08:39.174956 |  3757848
>>  points unordered buffered   | 250000000 | 09:29:16.47012  |  4049832
>>  points unordered unbuffered | 250000000 | 03:48:10.999861 |  4564986
>>
>> As you can see, the results are very disappointing :-(. The buffered builds
>> take a lot *longer* than unbuffered ones. I was expecting the buffering to
>> be very helpful at least in these unordered tests. On the positive side, the
>> buffering made index quality somewhat better (accesses column, smaller is
>> better), but that's not what we're aiming at.
>>
>> What's going on here? This data set was large enough to not fit in RAM, the
>> table was about 8.5 GB in size (and I think the index is even larger than
>> that), and the box has 4GB of RAM. Does the buffering only help with even
>> larger indexes that exceed the cache size even more?
>>
> This seems pretty strange to me. The time of the unbuffered index build shows
> that there is no IO bottleneck. That differs radically from my
> experiments. I'm going to try your test script on my test setup.
> For now, I can only guess that the random function is somewhat bad, so
> that the unordered dataset behaves like the ordered one.
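To put the quoted table into ratios, here is a small sketch (Python 3.9+) that converts each duration to seconds and expresses every run relative to the unbuffered build. The figures are copied from the table; the RESULTS dict and the helper names are illustrative only and are not part of any test script.

```python
from datetime import timedelta

# Durations and "accesses" copied verbatim from the quoted results table.
RESULTS = {
    "points unordered auto": ("08:08:39.174956", 3757848),
    "points unordered buffered": ("09:29:16.47012", 4049832),
    "points unordered unbuffered": ("03:48:10.999861", 4564986),
}
BASELINE = "points unordered unbuffered"


def parse_duration(text: str) -> float:
    """Convert an HH:MM:SS[.ffffff] interval string to seconds."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds)).total_seconds()


def relative_to_baseline(name: str) -> tuple[float, float]:
    """Return (time ratio, accesses ratio) of a run versus the unbuffered build."""
    duration, accesses = RESULTS[name]
    base_duration, base_accesses = RESULTS[BASELINE]
    return (parse_duration(duration) / parse_duration(base_duration),
            accesses / base_accesses)


if __name__ == "__main__":
    for name in RESULTS:
        t, a = relative_to_baseline(name)
        print(f"{name:30} time x{t:.2f}  accesses x{a:.2f}")
```

Run as a script, it shows the buffered build taking about 2.49 times as long as the unbuffered one while needing about 11% fewer accesses. The auto build takes about 2.14 times as long and needs about 18% fewer accesses.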

Oh. Doing a simple "SELECT * FROM points LIMIT 10", it looks pretty
random to me. The data should be uniformly distributed in a rectangle
from (0, 0) to (100000, 100000).

> Can you rerun the tests on your test setup, with the dataset generated on the
> backend like this?
> CREATE TABLE points AS (SELECT point(random(), random()) FROM
> generate_series(1,10000000));

Ok, I'll queue up that test after the ones I'm running now.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
