From: | "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com> |
Subject: | Re: [PATCH] Hex-coding optimizations using SVE on ARM. |
Date: | 2025-09-04 14:55:50 |
Message-ID: | OS9PR01MB15185B278E343A9BA5F0F6AB19700A@OS9PR01MB15185.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
> I see that there was some discussion about a Neon implementation upthread,
> but I'm not sure we concluded anything. For popcount, we first added a
> Neon version before adding the SVE version, which required more complicated
> configure/runtime checks. Presumably Neon is available on more hardware
> than SVE, so that could be a good place to start here, too.
We have added the Neon versions of hex encode/decode.
Here are the microbenchmark numbers.
hex_encode - m7g.4xlarge
 Input |   Head  |  Neon
-------+---------+--------
    32 |  18.056 |  5.957
    40 |  22.127 | 10.205
    48 |  26.214 | 14.151
    64 |  33.613 |  6.164
   128 |  66.060 | 11.372
   256 | 130.225 | 18.543
   512 | 267.105 | 33.977
  1024 | 515.603 | 64.462
hex_decode - m7g.4xlarge
 Input |   Head  |   Neon
-------+---------+---------
    32 |  26.669 |   9.462
    40 |  36.320 |  19.347
    48 |  45.971 |  19.099
    64 |  58.468 |  17.648
   128 | 113.250 |  30.437
   256 | 218.743 |  56.824
   512 | 414.133 | 107.212
  1024 | 828.493 | 210.740
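For convenience, the Head/Neon ratios implied by the two tables can be derived with a few lines of Python. This is only a sketch that re-uses the numbers posted above; the dictionary and function names are made up for illustration and are not part of the patch or the benchmark harness.

```python
# Timings copied from the tables above as (Head, Neon) pairs, keyed by input size.
# Units are whatever the microbenchmark reported; only the ratio matters here.
hex_encode = {
    32: (18.056, 5.957), 40: (22.127, 10.205), 48: (26.214, 14.151),
    64: (33.613, 6.164), 128: (66.060, 11.372), 256: (130.225, 18.543),
    512: (267.105, 33.977), 1024: (515.603, 64.462),
}
hex_decode = {
    32: (26.669, 9.462), 40: (36.320, 19.347), 48: (45.971, 19.099),
    64: (58.468, 17.648), 128: (113.250, 30.437), 256: (218.743, 56.824),
    512: (414.133, 107.212), 1024: (828.493, 210.740),
}


def speedups(results):
    """Return {input size: Head time / Neon time} for one benchmark table."""
    return {size: head / neon for size, (head, neon) in results.items()}


for name, results in (("hex_encode", hex_encode), ("hex_decode", hex_decode)):
    print(name)
    for size, ratio in speedups(results).items():
        print(f"{size:>6} | {ratio:.2f}x")
```

At the largest input (1024), this works out to roughly 8x for hex_encode and a little under 4x for hex_decode.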
> Also, I'd strongly encourage you to get involved with others' patches on
> the mailing lists (e.g., reviewing, testing). Patch submissions are great,
> but this community depends on other types of participation, too. IME
> helping others with their patches also tends to incentivize others to help
> with yours.
Sure, we will try to test/review patches in areas where we have experience.
> On that note, I was hoping you could give us feedback on whether the
> improvement in PG18 made any difference at all in your real-world
> use-case, i.e. not just in a microbenchmark, but also including
> transmission of the hex-encoded values across the network to the
> client (that I assume must decode them again).
Yes, the improvement in v18 did help; see the attached perf graphs.
We used a Python script to send and receive binary data from Postgres.
For simple select queries on a bytea column, hex_encode took 42% of the
query execution time in v17. This dropped to 33% in v18, resulting in
around 18% improvement in overall query time.
The proposed patch further reduces hex_encode's share of query execution
time to 5.6%, another 25% improvement in total query time.
We observed similar improvements for insert queries on the bytea column.
hex_decode's share of query execution time decreased from 15.5% to 5.5%,
a 5-8% query-level improvement depending on which storage type is used.
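To make the round trip concrete: in bytea's hex format each byte is written as two hex digits after a `\x` prefix, so the server encodes on output and decodes on input, and the client does the reverse. The following is a minimal sketch of the client-side half in plain Python. It is not the script used for the measurements above, and the function names are hypothetical.

```python
def bytea_hex_encode(data: bytes) -> str:
    """Render bytes in bytea hex format: a backslash-x prefix plus two hex digits per byte."""
    return "\\x" + data.hex()


def bytea_hex_decode(text: str) -> bytes:
    """Parse a bytea hex-format string back into raw bytes."""
    if not text.startswith("\\x"):
        raise ValueError("expected bytea hex format starting with \\x")
    return bytes.fromhex(text[2:])


payload = bytes(range(16))
wire = bytea_hex_encode(payload)

# The text form is twice the payload size plus the two-character prefix.
assert len(wire) == 2 * len(payload) + 2
assert bytea_hex_decode(wire) == payload
print(wire)
```

Since the text form is about twice the size of the data, the encode/decode cost grows with every byte transferred, which is consistent with hex_encode and hex_decode showing up in the profiles.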
------
Chiranmoy
Attachment | Content-Type | Size |
---|---|---|
v6-0001-NEON-support-for-hex-coding.patch | application/octet-stream | 10.2 KB |
v6-0002-SVE-support-for-hex-coding.patch | application/octet-stream | 21.1 KB |
v6-0003-Regression-tests-for-SIMD-hex-coding.patch | application/octet-stream | 7.4 KB |
bytea_read_hex_encode_sve.svg | image/svg+xml | 292.8 KB |
bytea_read_hex_encode_v17.svg | image/svg+xml | 287.7 KB |
bytea_read_hex_encode_v18.svg | image/svg+xml | 255.1 KB |
bytea_write_hex_decode_sve.svg | image/svg+xml | 325.4 KB |
bytea_write_hex_decode_v18.svg | image/svg+xml | 280.4 KB |