Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
To: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Mark Wong <markwkm(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2025-12-29 17:03:17
Message-ID: CAKWEB6qa4V+aU5-S_Eq=J2o09xp=3e-iLFVqimB0Zu6iq3GKdw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 24, 2025 at 9:08 AM KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:

> Hello,
> Following the same path of optimizing COPY FROM using SIMD, I found that
> COPY TO can also benefit from this.
>
> I attached a small patch that uses SIMD to skip ahead until the first
> special character is found, then falls back to scalar processing
> for that character and re-enters the SIMD path...
> There are two ways to do this:
> 1) Essentially we do SIMD until we find a special character, then continue
> on the scalar path without re-entering SIMD.
> - This gives speedups of 10% to 30%, depending on the share of special
> characters in the attribute. We don't lose anything here, since it advances
> with SIMD until it can't (using the previous scripts: 1/3 and 2/3 special
> chars).
>
> 2) Do the SIMD path, then use the scalar path when we hit a special
> character, and keep re-entering the SIMD path each time.
> - This is equivalent to the COPY FROM story: we'll need to find the same
> heuristic to use for both COPY FROM/TO to reduce the regressions (same
> regressions: around 20% to 30% with 1/3 and 2/3 special chars).
>
> Something else to note is that the scalar path for COPY TO isn't as heavy
> as the state machine in COPY FROM.
>
> So if we find the sweet spot for the heuristic, doing the same for COPY TO
> will be trivial and always beneficial.
> Attached is 0004 which is option 1 (SIMD without re-entering), 0005 is the
> second one.
>
>
> Regards,
> Ayoub
>
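
For readers following the thread, the two options quoted above can be sketched roughly as below. This is only an illustration and is not taken from the 0004 or 0005 patches: the function names (first_special, emit_scalar, escape_attr), the 16-byte chunk size, the choice of SSE2 intrinsics, and the set of special bytes (backslash, tab, newline, carriage return) are all assumptions made for the example. It also assumes GCC or Clang for __builtin_ctz, and it falls back to a plain loop when SSE2 is not available.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

#define CHUNK 16				/* assumed vector width in bytes */

/* Index of the first special byte in a 16-byte chunk, or CHUNK if none. */
static size_t
first_special(const char *p)
{
#ifdef __SSE2__
	__m128i		v = _mm_loadu_si128((const __m128i *) p);
	__m128i		m = _mm_or_si128(
		_mm_or_si128(_mm_cmpeq_epi8(v, _mm_set1_epi8('\\')),
					 _mm_cmpeq_epi8(v, _mm_set1_epi8('\t'))),
		_mm_or_si128(_mm_cmpeq_epi8(v, _mm_set1_epi8('\n')),
					 _mm_cmpeq_epi8(v, _mm_set1_epi8('\r'))));
	int			mask = _mm_movemask_epi8(m);	/* one bit per matching byte */

	return mask ? (size_t) __builtin_ctz((unsigned int) mask) : CHUNK;
#else
	for (size_t i = 0; i < CHUNK; i++)
		if (p[i] == '\\' || p[i] == '\t' || p[i] == '\n' || p[i] == '\r')
			return i;
	return CHUNK;
#endif
}

/* Scalar path: write one byte, escaped if special; returns bytes written. */
static size_t
emit_scalar(char c, char *out)
{
	char		esc;

	switch (c)
	{
		case '\\': esc = '\\'; break;
		case '\t': esc = 't'; break;
		case '\n': esc = 'n'; break;
		case '\r': esc = 'r'; break;
		default:
			out[0] = c;
			return 1;
	}
	out[0] = '\\';
	out[1] = esc;
	return 2;
}

/*
 * Escape one attribute into out, which must have room for 2 * len bytes.
 * reenter = false: option 1, stay scalar after the first special byte.
 * reenter = true:  option 2, go back to the vector path after each one.
 */
static size_t
escape_attr(const char *in, size_t len, char *out, bool reenter)
{
	size_t		i = 0;
	size_t		o = 0;
	bool		simd_ok = true;

	while (i < len)
	{
		if (simd_ok && len - i >= CHUNK)
		{
			size_t		n = first_special(in + i);

			memcpy(out + o, in + i, n); /* bulk-copy the clean prefix */
			i += n;
			o += n;
			if (n == CHUNK)
				continue;		/* whole chunk was clean, stay vectorized */
			simd_ok = reenter;	/* option 1 never comes back here */
		}
		/* in[i] is either a special byte or part of a short tail */
		o += emit_scalar(in[i], out + o);
		i++;
	}
	return o;
}
```

Both settings of reenter produce the same output; they differ only in how often the chunk scan runs. With reenter = true, an attribute with many special bytes pays for a vector scan that finds a special byte almost immediately each time, which is where a heuristic for when to re-enter would matter.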

Hello, Nazir and Ayoub!

Nazir, sorry for the late reply, I am on holiday. :-) I wanted to thank you
for the tips on using cpupower to get less variance in my test results.

Ayoub, I suppose it was inevitable the SIMD patch would work for copying
out as well as copying in!

I am back at work on 5 Jan 2026, so I will try to carve out time to test
this then, using Nazir's tips.

Happy Holidays!

-Manni
--
-- Manni Wood EDB: https://www.enterprisedb.com
