Re: Speed up COPY FROM text/CSV parsing using SIMD

From: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, Mark Wong <markwkm(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-01-17 20:46:52
Message-ID: CA+K2Ru=Ea_1CzQO1SxD30B=7fZSVb4qOymdigwdeSPnsCQzuXA@mail.gmail.com
Lists: pgsql-hackers

Hello,

On Wed, Dec 31, 2025 at 2:04 PM Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:

> Hi,
>
> On Wed, 24 Dec 2025 at 18:08, KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
> >
> > Hello,
> > Following the same path of optimizing COPY FROM using SIMD, I found that
> > COPY TO can also benefit from this.
> >
> > I attached a small patch that uses SIMD to skip data and advance until
> > the first special character is found, then falls back to scalar
> > processing for that character and re-enters the SIMD path again...
> > There are two ways to do this:
> > 1) Essentially we do SIMD until we find a special character, then
> > continue on the scalar path without re-entering SIMD again.
> > - This gives 10% to 30% speedups depending on the proportion of special
> > characters in the attribute. We don't lose anything here since it
> > advances with SIMD until it can't (using the previous scripts: 1/3, 2/3
> > special chars).
> >
> > 2) Do the SIMD path, then use the scalar path when we hit a special
> > character, and keep re-entering the SIMD path each time.
> > - This is equivalent to the COPY FROM story: we'll need to find the same
> > heuristic to use for both COPY FROM/TO to reduce the regressions (same
> > regressions: around 20% to 30% with 1/3, 2/3 special chars).
> >
> > Something else to note is that the scalar path for COPY TO isn't as
> > heavy as the state machine in COPY FROM.
> >
> > So if we find the sweet spot for the heuristic, doing the same for COPY
> > TO will be trivial and always beneficial.
> > Attached is 0004, which is option 1 (SIMD without re-entering); 0005 is
> > the second one.
>
> Patches look correct to me. I think we could move these SIMD code
> portions into a shared function to remove duplication, although that
> might have a performance impact. I have not benchmarked these patches
> yet.
>
Definitely yes.

>
> Another consideration is that these patches might need their own
> thread, though I am not completely sure about this yet.
>
My thinking was that since it uses the same infrastructure and relies on the
same ideas, and since it is an easier problem than COPY FROM, it might make
sense to keep the two in this thread and commit them together.
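
To make the two options concrete, here is a rough, self-contained sketch of
the control flow. It is not the code from 0004/0005. It assumes a simplified
escaping rule (only backslash, the delimiter, newline and carriage return are
treated as special), and it uses a portable 8-bytes-at-a-time SWAR check as a
stand-in for real vector instructions. The names chunk_has_byte,
skip_plain_prefix and escape_attr are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if any byte of v equals c (SWAR zero-byte test). */
static inline uint64_t
chunk_has_byte(uint64_t v, unsigned char c)
{
    uint64_t x = v ^ (UINT64_C(0x0101010101010101) * c);

    return (x - UINT64_C(0x0101010101010101)) & ~x &
        UINT64_C(0x8080808080808080);
}

/*
 * Hypothetical helper: number of leading bytes (a multiple of 8) that contain
 * no special character.  This plays the role of the "SIMD path".
 */
static size_t
skip_plain_prefix(const char *s, size_t len, char delim)
{
    size_t i = 0;

    while (i + sizeof(uint64_t) <= len)
    {
        uint64_t chunk;

        memcpy(&chunk, s + i, sizeof(chunk));   /* unaligned-safe load */
        if (chunk_has_byte(chunk, '\\') ||
            chunk_has_byte(chunk, (unsigned char) delim) ||
            chunk_has_byte(chunk, '\n') ||
            chunk_has_byte(chunk, '\r'))
            break;
        i += sizeof(chunk);
    }
    return i;
}

/*
 * Simplified escaping of one attribute into out, which must have room for
 * 2 * len bytes.  reenter = 0 mimics option 1 (wide path once, then scalar
 * only); reenter = 1 mimics option 2 (retry the wide path after every
 * special character).  Returns the number of bytes written.
 */
static size_t
escape_attr(const char *s, size_t len, char delim, int reenter, char *out)
{
    size_t i, o, n;

    assert(s != NULL && out != NULL);

    /* Wide path: copy the leading run that needs no escaping. */
    n = skip_plain_prefix(s, len, delim);
    memcpy(out, s, n);
    i = o = n;

    /* Scalar path for everything after the first special character. */
    while (i < len)
    {
        char c = s[i++];

        if (c == '\n')      { out[o++] = '\\'; out[o++] = 'n'; }
        else if (c == '\r') { out[o++] = '\\'; out[o++] = 'r'; }
        else if (c == '\\' || c == delim) { out[o++] = '\\'; out[o++] = c; }
        else { out[o++] = c; continue; }

        if (reenter)
        {
            /* Option 2: try the wide path again on the remainder. */
            n = skip_plain_prefix(s + i, len - i, delim);
            memcpy(out + o, s + i, n);
            i += n;
            o += n;
        }
    }
    return o;
}
```

Both settings of reenter produce the same output; they differ only in how
often the wide check is attempted. With reenter = 1 and many special
characters, most of those attempts fail on the first chunk, which is the kind
of wasted work a heuristic for option 2 would need to limit.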

Regards,
Ayoub Kazar
