Re: UUID v1 optimizations...

From: Ancoron Luciferis <ancoron(dot)luciferis(at)googlemail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: UUID v1 optimizations...
Date: 2019-07-09 22:08:34
Message-ID: 7c77ae7a-3d89-15f7-7130-e60a61dc4406@googlemail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 08/07/2019 02:26, Peter Geoghegan wrote:
> Please don't top post -- trim the your response down so that only
> still-relevant text remains.
>
> On Tue, Jun 11, 2019 at 1:27 PM Ancoron Luciferis
> <ancoron(dot)luciferis(at)googlemail(dot)com> wrote:
>> Primary key indexes after an ANALYZE:
>> table_name | bloat | index_mb | table_mb
>> -------------------+----------------+----------+----------
>> uuid_v1 | 767 MiB (49 %) | 1571.039 | 1689.195
>> uuid_v1_timestamp | 768 MiB (49 %) | 1571.039 | 1689.195
>> uuid_seq | 759 MiB (49 %) | 1562.766 | 1689.195
>> uuid_serial | 700 MiB (47 %) | 1504.047 | 1689.195
>>
>> OK, sadly no reclaim in any of them.
>
> I don't know how you got these figures, but most likely they don't
> take into account the fact that the FSM for the index has free blocks
> available. You'll only notice that if you have additional page splits
> that can recycle that space. Or, you could use pg_freespacemap to get
> some idea.

Hm, I think I've already read quite a bit about the internals of the PG
b-tree index implementation but still cannot get to the answer how I
could influence that on my end as I want to stay compatible with the
standard UUID data storage but need time sorting support.

Anyway, I've made a bit of progress in testing and now have the full
tests executing unattended with the help of a script:
https://github.com/ancoron/pg-uuid-test

I've uploaded one of the test run results here:
https://gist.github.com/ancoron/d5114b0907e8974b6808077e02f8d109

After the first mass deletion, I can now see quite some savings for
both, serial and for my new time-sorted index:
table_name | bloat | index_mb | table_mb
-------------+-----------------+----------+----------
uuid_v1 | 1500 MiB (48 %) | 3106.406 | 3378.383
uuid_serial | 800 MiB (33 %) | 2406.453 | 3378.383
uuid_v1_ext | 800 MiB (33 %) | 2406.453 | 3378.383

...but in a second case (DELETE old + INSERT new), the savings are gone
again in both cases:
table_name | bloat | index_mb | table_mb
-------------+-----------------+----------+----------
uuid_v1 | 1547 MiB (49 %) | 3153.859 | 3378.383
uuid_serial | 1402 MiB (47 %) | 3008.055 | 3378.383
uuid_v1_ext | 1403 MiB (47 %) | 3008.055 | 3378.383

So, the question for me would be: Is there any kind of data that plays
optimal with space-savings in a rolling (e.g. last X rows) scenario?

>
>> 5.) REINDEX
>> Table: uuid_v1 Time: 21549.860 ms (00:21.550)
>> Table: uuid_v1_timestamp Time: 27367.817 ms (00:27.368)
>> Table: uuid_seq Time: 19142.711 ms (00:19.143)
>> Table: uuid_serial Time: 16889.807 ms (00:16.890)
>>
>> Even in this case it looks as if my implementation is faster than
>> anything else - which I really don't get.
>
> Sorting already-sorted data is faster. CREATE INDEX is mostly a big
> sort operation in the case of B-Tree indexes.

Understood, this seems to be confirmed by my time-sorted index in the
new tests:
uuid_v1: 27632.660 ms (00:27.633)
uuid_serial: 20519.363 ms (00:20.519) x1.35
uuid_v1_ext: 23846.474 ms (00:23.846) x1.16

>
>> I might implement a different opclass for the standard UUID to enable
>> time-wise index sort order. This will naturally be very close to
>> physical order but I doubt that this is something I can tell PostgreSQL, or?
>
> PostgreSQL only knows whether or not your page splits occur in the
> rightmost page in the index -- it fills the page differently according
> to whether or not that is the case.
>

As I've implemented the new opclass and the new tests showing the
results now, I think I can say that the time-sorting behavior as opposed
to rather random really benefits the overall performance, which is what
I actually care about most.

Cheers,

Ancoron

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Ancoron Luciferis 2019-07-09 22:16:38 Re: Custom opclass for column statistics?
Previous Message Nicolas Charles 2019-07-09 15:21:34 Re: Optimizing `WHERE x IN` query