Re: Better tracking of free space during SP-GiST index build

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Emre Hasegeli <emre(at)hasegeli(dot)com>
Subject: Re: Better tracking of free space during SP-GiST index build
Date: 2016-09-25 18:33:07
Message-ID: CAF4Au4xij0eausM9gHqX0Km0=ZKv5eW2RFbCwttCMqBcEhj5UA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 24, 2016 at 11:32 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 09/22/2016 07:37 PM, Tom Lane wrote:
>>
>> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>>
>>> ... I've tried increasing the cache size to 768
>>> entries, with vast majority of them (~600) allocated to leaf pages.
>>> Sadly, this seems to only increase the CREATE INDEX duration a bit,
>>> without making the index significantly smaller (still ~120MB).
>>
>>
>> Yeah, that's in line with my results: not much further gain from a
>> larger cache. Though if you were testing with the same IRRExplorer
>> data, it's not surprising that our results would match. Would be
>> good to try some other cases...
>>
>
> Agreed, but I don't have any other data sets at hand. One possibility would
> be to generate something randomly (e.g. it's not particularly difficult to
> generate random IP addresses), but I'd much rather use some real-world data
> sets.

Tomas, I have one real dataset, which I used for testing spgist
(https://www.postgresql.org/message-id/CAF4Au4zxd2XOV0A__FU7xoHxSiwJzm1z2xhs-FFaT1DzB9ub3Q@mail.gmail.com)
Let me know if you need it.

>
>>>
>>>
>>> One thing I'd change is making the SpGistLUPCache dynamic, i.e.
>>> storing the size and lastUsedPagesMap on the meta page. That
>>> should allow us resizing the cache and tweak lastUsedPagesMap in
>>> the future.
>>
>>
>> Yeah, probably a good idea. I had thought of bumping
>> SPGIST_MAGIC_NUMBER again if we want to revisit the cache size; but
>> keeping it as a separate field won't add noticeable cost, and it
>> might save some trouble.
>>
>
> I see you plan to track only the cache size, while I proposed to track also
> the map, i.e. number of pages per category. I think that'd useful in case we
> come up with better values (e.g. more entries for leaf pages), or even
> somewhat adaptive way.
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2016-09-25 18:48:41 Re: PATCH: two slab-like memory allocators
Previous Message Vladimir Gordiychuk 2016-09-25 18:16:01 Re: Stopping logical replication protocol