| From: | Manni Wood <manni(dot)wood(at)enterprisedb(dot)com> |
|---|---|
| To: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz> |
| Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date: | 2025-11-26 14:21:46 |
| Message-ID: | CAKWEB6rLxPVtN4ffZ3CMTL518zhk_BWzzBt6ZE2oUSaErdphxA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Nov 26, 2025 at 5:51 AM KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
> Hello,
> On Wed, Nov 19, 2025 at 10:01 PM Nathan Bossart <nathandbossart(at)gmail(dot)com>
> wrote:
>
>> On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
>> > Thanks, done.
>>
>> I took a look at the v3 patches. Here are my high-level thoughts:
>>
>> + /*
>> + * Parse data and transfer into line_buf. To get benefit from inlining,
>> + * call CopyReadLineText() with the constant boolean variables.
>> + */
>> + if (cstate->simd_continue)
>> + result = CopyReadLineText(cstate, is_csv, true);
>> + else
>> + result = CopyReadLineText(cstate, is_csv, false);
>>
>> I'm curious whether this actually generates different code, and if it does,
>> if it's actually faster. We're already branching on cstate->simd_continue
>> here.
>
> I've compiled both versions with -O2 and confirmed they generate different
> code. When simd_continue is passed as a constant to CopyReadLineText, the
> compiler optimizes out the condition checks from the SIMD path.
> A small benchmark on a 1GB+ file shows the expected benefit: a performance
> improvement of around 6%.
> I've attached the assembly outputs in case someone wants to check
> something else.
>
>
> Regards,
> Ayoub Kazar
>
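To make the specialization discussed above concrete, here is a minimal sketch of the pattern, not the patch itself. The names scan_line() and read_line() are hypothetical stand-ins for CopyReadLineText() and its caller, and memchr() stands in for the SIMD loop. The point is only that each call site passes a literal true or false, so once the static inline function is inlined, the compiler is free to drop the untaken branch from each copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for CopyReadLineText(): returns the offset of the
 * first newline in buf, or len if there is none.  use_fast_path selects
 * between a library scan (standing in for the SIMD loop) and a plain
 * byte-at-a-time loop.
 */
static inline size_t
scan_line(const char *buf, size_t len, bool use_fast_path)
{
    if (use_fast_path)
    {
        const char *nl = memchr(buf, '\n', len);

        return nl ? (size_t) (nl - buf) : len;
    }

    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] == '\n')
            return i;
    }
    return len;
}

/*
 * Hypothetical caller.  The runtime flag is tested once here; each branch
 * then calls scan_line() with a compile-time constant, so an optimizing
 * compiler can emit two specialized bodies without the inner check.
 */
static size_t
read_line(const char *buf, size_t len, bool simd_continue)
{
    if (simd_continue)
        return scan_line(buf, len, true);
    else
        return scan_line(buf, len, false);
}
```

Both paths return the same offset for the same input, for example read_line("ab\ncd", 5, true) and read_line("ab\ncd", 5, false) both give 2. Whether the two copies really differ in the generated code depends on the compiler and optimization level, which is why comparing the assembly output, as done above, is the right check.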
Correction to my last post:
I also tried files that alternated lines with no special characters and
lines that were one-third special characters, hoping to force the algorithm
to keep re-checking whether it should use SIMD and so spend more time in
the try-SIMD/don't-try-SIMD housekeeping code. The text file was still 20%
faster (not 50% faster as I originally stated; that was a typo). The CSV
file was still 13% faster.
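
For anyone who wants to build a similar input, here is a rough sketch of a generator for alternating lines. It is an illustration only, not the script used for the numbers above: fill_line() and write_alternating() are hypothetical helpers, the line contents ('a' filler, one special byte in every three) are assumptions, and whether the output is valid COPY input depends on the special character chosen and on the target table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Fill out[0..width-1] with 'a'.  If special is true, every third byte is
 * replaced by special_char, so one-third of the line is "special".
 * The caller must provide room for width + 1 bytes (NUL terminator).
 */
static void
fill_line(char *out, size_t width, bool special, char special_char)
{
    for (size_t i = 0; i < width; i++)
        out[i] = (special && i % 3 == 2) ? special_char : 'a';
    out[width] = '\0';
}

/*
 * Write nlines lines to fp, alternating plain lines (even n) with lines
 * that are one-third special characters (odd n).
 * Returns 0 on success, -1 if width is too large or a write fails.
 */
static int
write_alternating(FILE *fp, size_t nlines, size_t width, char special_char)
{
    char line[256];

    if (width >= sizeof(line))
        return -1;

    for (size_t n = 0; n < nlines; n++)
    {
        fill_line(line, width, n % 2 == 1, special_char);
        if (fprintf(fp, "%s\n", line) < 0)
            return -1;
    }
    return 0;
}
```

Calling write_alternating(stdout, 4, 6, '\\') would print a plain line, then a line with a backslash in every third position, and repeat. Alternating on every line is meant to make any per-line "should I keep trying SIMD" decision flip as often as possible.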
Also, apologies for top-posting in my last e-mail.
--
-- Manni Wood EDB: https://www.enterprisedb.com