Re: [PATCH] Hex-coding optimizations using SVE on ARM.

From: "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>
Subject: Re: [PATCH] Hex-coding optimizations using SVE on ARM.
Date: 2025-09-04 14:55:50
Message-ID: OS9PR01MB15185B278E343A9BA5F0F6AB19700A@OS9PR01MB15185.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

> I see that there was some discussion about a Neon implementation upthread,
> but I'm not sure we concluded anything. For popcount, we first added a
> Neon version before adding the SVE version, which required more complicated
> configure/runtime checks. Presumably Neon is available on more hardware
> than SVE, so that could be a good place to start here, too.

We have added the Neon versions of hex encode/decode.
Here are the microbenchmark numbers.

hex_encode - m7g.4xlarge
 Input |  Head   |  Neon
-------+---------+--------
    32 |  18.056 |  5.957
    40 |  22.127 | 10.205
    48 |  26.214 | 14.151
    64 |  33.613 |  6.164
   128 |  66.060 | 11.372
   256 | 130.225 | 18.543
   512 | 267.105 | 33.977
  1024 | 515.603 | 64.462

hex_decode - m7g.4xlarge
 Input |  Head   |  Neon
-------+---------+---------
    32 |  26.669 |   9.462
    40 |  36.320 |  19.347
    48 |  45.971 |  19.099
    64 |  58.468 |  17.648
   128 | 113.250 |  30.437
   256 | 218.743 |  56.824
   512 | 414.133 | 107.212
  1024 | 828.493 | 210.740
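
To make the two tables easier to compare, the following Python sketch computes the Head/Neon ratio for each input size. The timings are copied from the tables above; the helper name `speedups` is only illustrative, and the ratio is a plain division that makes no assumption about the units.

```python
# Timings copied from the tables above (m7g.4xlarge), in the units reported there.
# Each entry maps input size -> (Head, Neon).
HEX_ENCODE = {
    32: (18.056, 5.957),
    40: (22.127, 10.205),
    48: (26.214, 14.151),
    64: (33.613, 6.164),
    128: (66.060, 11.372),
    256: (130.225, 18.543),
    512: (267.105, 33.977),
    1024: (515.603, 64.462),
}
HEX_DECODE = {
    32: (26.669, 9.462),
    40: (36.320, 19.347),
    48: (45.971, 19.099),
    64: (58.468, 17.648),
    128: (113.250, 30.437),
    256: (218.743, 56.824),
    512: (414.133, 107.212),
    1024: (828.493, 210.740),
}


def speedups(table):
    """Return {input_size: head_time / neon_time} for one table (hypothetical helper)."""
    return {size: head / neon for size, (head, neon) in table.items()}


if __name__ == "__main__":
    for name, table in (("hex_encode", HEX_ENCODE), ("hex_decode", HEX_DECODE)):
        print(name)
        for size, ratio in speedups(table).items():
            print(f"  {size:>5}: {ratio:.2f}x")
```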

> Also, I'd strongly encourage you to get involved with others' patches on
> the mailing lists (e.g., reviewing, testing). Patch submissions are great,
> but this community depends on other types of participation, too. IME
> helping others with their patches also tends to incentivize others to help
> with yours.

Sure, we will try to test and review patches in areas where we have experience.

> On that note, I was hoping you could give us feedback on whether the
> improvement in PG18 made any difference at all in your real-world
> use-case, i.e. not just in a microbenchmark, but also including
> transmission of the hex-encoded values across the network to the
> client (that I assume must decode them again).

Yes, the improvement in v18 did help; see the attached perf graphs.
We used a Python script to send and receive binary data from Postgres.
For simple SELECT queries on a bytea column, hex_encode took 42% of the
query execution time in v17. This dropped to 33% in v18, an improvement
of around 18% in overall query time.

The proposed patch further reduces hex_encode's share of query execution
time to 5.6%, another 25% improvement in total query time.

We observed similar improvements for INSERT queries on the bytea column:
hex_decode's share decreased from 15.5% to 5.5%, a 5-8% query-level
improvement depending on which storage type is used.
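
For readers less familiar with the format being measured: bytea's hex text format is `\x` followed by two hex digits per byte, so a client that receives such a value as text has to reverse the encoding. A minimal client-side sketch in Python follows. It is not the script used for the measurements above, and the function names are hypothetical.

```python
def bytea_hex_to_bytes(text: str) -> bytes:
    """Decode bytea hex-format text (a backslash, 'x', then hex digits) into raw bytes."""
    if not text.startswith("\\x"):
        raise ValueError("expected bytea hex format starting with \\x")
    # bytes.fromhex accepts upper- and lower-case digits.
    return bytes.fromhex(text[2:])


def bytes_to_bytea_hex(data: bytes) -> str:
    """Encode raw bytes as bytea hex-format text, two lower-case digits per byte."""
    return "\\x" + data.hex()


if __name__ == "__main__":
    payload = bytes(range(16))
    encoded = bytes_to_bytea_hex(payload)
    print(encoded)
    print(bytea_hex_to_bytes(encoded) == payload)
```

The encoded text is twice the size of the binary payload plus the two-character prefix, which is why both the server-side encode and the client-side decode scale with the size of the bytea value.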

------
Chiranmoy

Attachment                                          | Content-Type             | Size
----------------------------------------------------+--------------------------+---------
v6-0001-NEON-support-for-hex-coding.patch           | application/octet-stream | 10.2 KB
v6-0002-SVE-support-for-hex-coding.patch            | application/octet-stream | 21.1 KB
v6-0003-Regression-tests-for-SIMD-hex-coding.patch  | application/octet-stream | 7.4 KB
bytea_read_hex_encode_sve.svg                       | image/svg+xml            | 292.8 KB
bytea_read_hex_encode_v17.svg                       | image/svg+xml            | 287.7 KB
bytea_read_hex_encode_v18.svg                       | image/svg+xml            | 255.1 KB
bytea_write_hex_decode_sve.svg                      | image/svg+xml            | 325.4 KB
bytea_write_hex_decode_v18.svg                      | image/svg+xml            | 280.4 KB
