Re: Rework of collation code, extensibility

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: Rework of collation code, extensibility
Date: 2023-01-26 23:47:13
Message-ID: 64039a2dbcba6f42ed2f32bb5f0371870a70afda.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Attached v9 and added perf numbers below.

I'm hoping to commit 0002 and 0003 soon-ish, maybe a week or two,
please let me know if you want me to hold off. (I won't commit the GUCs
unless others find them generally useful; they are included here to
more easily reproduce my performance tests.)

My primary motivation is still related to
https://commitfest.postgresql.org/41/3956/ but the combination of
cleaner code and a performance boost seems like reasonable
justification for this patch set independently.

There aren't any clear open items on this patch. Peter Eisentraut asked
me to focus this thread on the refactoring, which I've done by reducing
it to 2 patches, and I left multilib ICU up to the other thread. He
also questioned the increased line count, but I think the currently-low
line count is due to bad style. PeterG provided some review comments,
in particular when to do the tiebreaking, which I addressed.

This patch has been around for a while, so I'll take a fresh look and
see if I see risk areas, and re-run a few sanity checks. Of course more
feedback would also be welcome.

PERFORMANCE:

======
Setup:
======

base: master with v9-0001 applied (GUCs only)
refactor: master with v9-0001, v9-0002, v9-0003 applied

Note that I wasn't able to see any performance difference between the
base and master, v9-0001 just adds some GUCs to make testing easier.

glibc 2.35 ICU 70.1
gcc 11.3.0 LLVM 14.0.0

built with meson (uses -O3)

$ perl text_generator.pl 10000000 10 > /tmp/strings.utf8.txt

CREATE TABLE s (t TEXT);
COPY s FROM '/tmp/strings.utf8.txt';
VACUUM FREEZE s;
CHECKPOINT;
SET work_mem='10GB';
SET max_parallel_workers = 0;
SET max_parallel_workers_per_gather = 0;

=============
Test queries:
=============

EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "C";
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en_US";
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en-US-x-icu";

Timings are measured as the milliseconds to return the first tuple from
the Sort operator (as reported in EXPLAIN ANALYZE). Median of three
runs.

========
Results:
========

base refactor speedup

sort_abbreviated_keys=false:
C 7377 7273 1.4%
en_US 35081 35090 0.0%
en-US-x-ixu 20520 19465 5.4%

sort_abbreviated_keys=true:
C 8105 8008 1.2%
en_US 35067 34850 0.6%
en-US-x-icu 22626 21507 5.2%

===========
Conclusion:
===========

These numbers can move +/-1 percentage point, so I'd interpret anything
less than that as noise. This happens to be the first run where all the
numbers favored the refactoring patch, but it is generally consistent
with what I had seen before.

The important part is that, for ICU, it appears to be a substantial
speedup when using meson (-O3).

Also, when/if the multilib ICU support goes in, that may lose some of
these gains due to an extra indirect function call.

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachment Content-Type Size
text_generator.pl application/x-perl 515 bytes
v9-0001-Introduce-GUCs-to-control-abbreviated-keys-sort-o.patch text/x-patch 8.4 KB
v9-0002-Add-pg_strcoll-pg_strxfrm-and-variants.patch text/x-patch 42.1 KB
v9-0003-Refactor-pg_locale_t-routines.patch text/x-patch 43.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2023-01-27 00:07:29 Partition key causes problem for volatile target list query
Previous Message Peter Geoghegan 2023-01-26 23:36:52 Re: New strategies for freezing, advancing relfrozenxid early