Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-02-17 05:01:21
Message-ID: CAKWEB6p-9cwDhnN4GKOFz0Yzqb7PtTLChH-+wjd-SNhrbiJuLA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 16, 2026 at 12:15 PM Nathan Bossart <nathandbossart(at)gmail(dot)com>
wrote:

> On Mon, Feb 16, 2026 at 11:04:58AM -0600, Nathan Bossart wrote:
> > On Fri, Feb 13, 2026 at 09:34:13PM -0600, Manni Wood wrote:
> >> v7-0001 + v7-0002 applied to master certainly seems promising: nice to see
> >> speed improvements across the board on both x86 and arm!
> >
> > Thanks for testing. Based on these results, I think we can abandon 0002,
> > at least for now.
>
> Have you tested small rows, i.e., less than 16 bytes per row? I'm
> wondering if that regresses at all.
>
> --
> nathan
>

I ran some tests using narrow rows that look like this:

$ head t_none.txt
BB AA
BB AA
BB AA

$ head t_none.csv
BB,AA
BB,AA
BB,AA

$ head t_escape.txt
B\\B A\\A
B\\B A\\A
B\\B A\\A

$ head t_quote.csv
"B""B","A""A"
"B""B","A""A"
"B""B","A""A"
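
For anyone who wants to build files of this shape, here is a minimal
Python sketch. It is not the script used for these tests. It assumes
every row is identical to the samples shown above, that the two fields
in the .txt files are separated by a tab (COPY's default text-format
delimiter; the archive renders it as a space), and the row count
(nrows) is a hypothetical parameter, since the message does not state
how many rows were loaded.

```python
import os
import tempfile

# Sample rows copied from the message. The tab in the text-format rows is an
# assumption: it is COPY's default delimiter for the text format.
SAMPLE_ROWS = {
    "t_none.txt": "BB\tAA",
    "t_none.csv": "BB,AA",
    "t_escape.txt": "B\\\\B\tA\\\\A",   # two literal backslashes per field
    "t_quote.csv": '"B""B","A""A"',     # doubled quotes inside quoted fields
}


def write_narrow_files(directory, nrows):
    """Write each sample row nrows times; return {filename: path}."""
    paths = {}
    for name, row in SAMPLE_ROWS.items():
        path = os.path.join(directory, name)
        with open(path, "w", newline="\n") as f:
            for _ in range(nrows):
                f.write(row + "\n")
        paths[name] = path
    return paths


if __name__ == "__main__":
    # nrows=1000 is a placeholder, not the size used in the benchmark.
    out = write_narrow_files(tempfile.mkdtemp(), nrows=1000)
    for name, path in out.items():
        print(name, os.path.getsize(path), "bytes")
```

Every row here is shorter than 16 bytes, which is the case Nathan asked
about.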

Here are the results on my x86 tower and my arm Raspberry Pi 5:

x86 NARROW master copy from
TXT : 2477.022500 ms
CSV : 2825.095500 ms
TXT with 1/3 escapes: 2620.575000 ms
CSV with 1/3 quotes: 3249.058750 ms

x86 NARROW v7-0001 copy from
TXT : 2475.659000 ms 0.055046% improvement
CSV : 2421.976750 ms 14.269208% improvement
TXT with 1/3 escapes: 2660.953750 ms -1.540836% regression
CSV with 1/3 quotes: 3255.546750 ms -0.199689% regression

x86 NARROW v7-0002 copy from
TXT : 2481.372250 ms -0.175604% regression
CSV : 2437.541250 ms 13.718271% improvement
TXT with 1/3 escapes: 2646.300000 ms -0.981655% regression
CSV with 1/3 quotes: 3202.014500 ms 1.447935% improvement

arm NARROW master copy from
TXT : 2294.270500 ms
CSV : 2085.839000 ms
TXT with 1/3 escapes: 2467.966000 ms
CSV with 1/3 quotes: 2485.533000 ms

arm NARROW v7-0001 copy from
TXT : 1982.497500 ms 13.589200% improvement
CSV : 2005.829500 ms 3.835843% improvement
TXT with 1/3 escapes: 2111.778250 ms 14.432442% improvement
CSV with 1/3 quotes: 2441.370000 ms 1.776802% improvement

arm NARROW v7-0002 copy from
TXT : 1975.982250 ms 13.873179% improvement
CSV : 2022.744000 ms 3.024922% improvement
TXT with 1/3 escapes: 2080.273000 ms 15.709009% improvement
CSV with 1/3 quotes: 2476.819000 ms 0.350589% improvement

Hope this helps!
--
-- Manni Wood EDB: https://www.enterprisedb.com
