Re: Speed up COPY TO text/CSV parsing using SIMD

From: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, Mark Wong <markwkm(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: Speed up COPY TO text/CSV parsing using SIMD
Date: 2026-04-02 18:07:38
Message-ID: CA+K2Ru=JK5NUEaxA77pCEer40QnV1TMxeg68Et9RL0zMZw_Jyw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 31, 2026 at 6:30 PM Nathan Bossart <nathandbossart(at)gmail(dot)com>
wrote:

> On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
> > I added a prescan loop inside the simd helpers trying to catch special
> > chars in sizeof(Vector8) characters, i measured how good is this at
> > reducing the overhead of starting simd and exiting at first vector:
> > the scalar loop is better than SIMD for one vector if it finds a special
> > character before 6th character, worst case is not a clean vector, where the
> > scalar loop needs 20 more cycles compared to SIMD.
> > This helps mitigate the case of JSON(B) in CSV format, this is why I only
> > added this for CSV case only.
>
> Interesting.
>
> > In a benchmark with 10M early SIMD exit like the JSONB case, the previous
> > 3% regression is gone.
>
> While these are nice results, I think it's best that we target v20 for this
> patch so that we have more time to benchmark and explore edge cases.
>
Thanks for the review.
Fair enough, I'll try many more cases in the upcoming weeks to make sure
we're not missing anything.
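
For anyone following along, here is a rough standalone sketch of the
prescan idea described in the quoted text. It is not the patch code:
find_first_special(), is_csv_special() and CHUNK_SIZE are hypothetical
names, CHUNK_SIZE = 16 is only an assumed stand-in for sizeof(Vector8),
and the chunked loop is plain C standing in for the real SIMD
comparison so that it compiles anywhere.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 16			/* assumed stand-in for sizeof(Vector8) */

/* Hypothetical helper: does this byte need special handling in CSV output? */
static bool
is_csv_special(char c, char delim, char quote)
{
	return c == delim || c == quote || c == '\n' || c == '\r';
}

/*
 * Return the offset of the first special byte in buf[0..len), or len if
 * there is none.
 */
static size_t
find_first_special(const char *buf, size_t len, char delim, char quote)
{
	size_t		i = 0;
	size_t		prescan_end = len < CHUNK_SIZE ? len : CHUNK_SIZE;

	/*
	 * Prescan: check the first chunk's worth of bytes one at a time, so
	 * inputs with an early special character (e.g. JSON in CSV) never pay
	 * the cost of setting up the vector loop.
	 */
	for (; i < prescan_end; i++)
	{
		if (is_csv_special(buf[i], delim, quote))
			return i;
	}

	/*
	 * Chunked phase: skip whole chunks that contain no special byte.  In
	 * real code this inner loop would be a handful of vector compares.
	 */
	while (len - i >= CHUNK_SIZE)
	{
		bool		hit = false;

		for (size_t j = 0; j < CHUNK_SIZE; j++)
			hit |= is_csv_special(buf[i + j], delim, quote);
		if (hit)
			break;
		i += CHUNK_SIZE;
	}

	/* Scalar tail: pin down the exact position, or run off the end. */
	for (; i < len; i++)
	{
		if (is_csv_special(buf[i], delim, quote))
			return i;
	}
	return len;
}
```

With a JSON-like value such as {"a":1} the prescan returns at offset 1
and the chunked loop is never entered, which is the early-exit case the
benchmark above is about.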

>
> --
> nathan

Regards,
Ayoub
