Re: Batching page logging during B-tree build

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Batching page logging during B-tree build
Date: 2020-10-09 18:08:42
Message-ID: 540584F2-A554-40C1-8F59-87AF8D623BB7@yandex-team.ru
Lists: pgsql-hackers

> On 23 Sep 2020, at 23:19, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Fri, Sep 18, 2020 at 8:39 AM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>> Here is PoC with porting that same routine to B-tree. It allows to build B-trees ~10% faster on my machine.
>
> It doesn't seem to make any difference on my machine, which has an
> NVME SSD (a Samsung 970 Pro). This is quite a fast SSD, though the
> sync time isn't exceptional. My test case is "reindex index
> pgbench_accounts_pkey", with pgbench scale 500. I thought that this
> would be a sympathetic case, since it's bottlenecked on writing the
> index, with relatively little time spent scanning and sorting in
> parallel workers.
> Can you provide a test case that is sympathetic towards the patch?
Thanks for looking into this!

I ran this test on my machine (a 2019 MacBook) at scale 10 for 20 seconds.
With the patch I consistently get about tps = 2.403440; without the patch, about tps = 1.951975.
At scale 500, with the patch:
postgres=# reindex index pgbench_accounts_pkey;
REINDEX
Time: 21577,640 ms (00:21,578)
Without the patch:
postgres=# reindex index pgbench_accounts_pkey;
REINDEX
Time: 26139,175 ms (00:26,139)
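
For reference, the relative difference in these numbers can be worked out with a few lines of Python. This is only a sketch of the arithmetic on the figures quoted above; the variable names are made up for illustration, and the decimal commas in the psql output are read as decimal points.

```python
# Timings copied from the two REINDEX runs above, in milliseconds.
# psql printed them with a decimal comma: 21577,640 ms means 21577.640 ms.
with_patch_ms = 21577.640
without_patch_ms = 26139.175

# How many times faster the patched build is, and the share of time saved.
speedup = without_patch_ms / with_patch_ms
time_saved = 1 - with_patch_ms / without_patch_ms

# tps figures from the scale 10 run.
tps_with, tps_without = 2.403440, 1.951975
tps_gain = tps_with / tps_without - 1

print(f"scale 500: {speedup:.2f}x faster, {time_saved:.1%} less time")
print(f"scale 10:  {tps_gain:.1%} more tps")
```

On these figures the scale 500 build is roughly 1.21x faster (about 17.5% less time), and the scale 10 run shows roughly 23% more tps.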

I think it's hardware-dependent; I will try it on servers.
>
> BTW, I noticed that the index build is absurdly bottlenecked on
> compressing WAL with wal_compression=on. It's almost 3x slower with
> compression turned on!

> On 24 Sep 2020, at 00:33, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> I know that we've tested different compression methods in the past,
>> but perhaps index build performance was overlooked.
>
> I am pretty sure we have known that pglz for this was much much slower
> than alternatives. I seem to recall somebody posting convincing numbers,
> but can't find them just now.

There was a thread about different compression methods [0]. It demonstrated that lz4 is 10 times faster at compression.
We have a patch that speeds up pglz compression by a factor of 1.43 [1], but I was hoping that we would go the lz4/zstd way. It now seems to me that I should actually finish that speedup patch; it's a very focused, local refactoring.

Thanks!

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/ea57b49a-ecf0-481a-a77b-631833354f7d%40postgrespro.ru#dcac101f8a73dfce98924066f6a12a13
[1] https://www.postgresql.org/message-id/flat/169163A8-C96F-4DBE-A062-7D1CECBE9E5D%40yandex-team.ru#996a194c12bacd2d093be2cb7ac54ca6
