Re: benchmark results comparing versions 15.2 and 16

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: MARK CALLAGHAN <mdcallag(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: benchmark results comparing versions 15.2 and 16
Date: 2023-05-29 16:55:18
Message-ID: CAH2-WznuVEn1BNqGZerKtertgAM4y4HRaxjceDNk71Kbjh1McQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, May 28, 2023 at 2:42 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> c6e0fe1f2 might have helped improve some of that performance, but I
> suspect there must be something else as ~3x seems much more than I'd
> expect from reducing the memory overheads. Testing versions before
> and after that commit might give a better indication.

I'm virtually certain that this is due to the change in default
collation provider, from libc to ICU. Mostly due to the fact that ICU
is capable of using abbreviated keys, and the system libc isn't
(unless you go out of your way to define TRUST_STRXFRM when building
Postgres).

Many individual test cases involving larger non-C collation text sorts
showed similar improvements back when I worked on this. Offhand, I
believe that 3x - 3.5x improvements in execution times were common
with high entropy abbreviated keys on high cardinality input columns
at that time (this was with glibc). Low cardinality inputs were more
like 2.5x.

I believe that ICU is faster than glibc in general -- even with
TRUST_STRXFRM enabled. But the TRUST_STRXFRM thing is bound to be the
most important factor here, by far.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-05-29 17:01:20 Re: abi-compliance-checker
Previous Message Masahiko Sawada 2023-05-29 14:08:41 Re: PG 16 draft release notes ready