Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-03-10 11:42:28
Message-ID: CAN55FZ3Tn2DQUq40rASjrC14EQR=FzF7ynFRsqDf8tD=N_PX9w@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Tue, 10 Mar 2026 at 05:30, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com> wrote:
>
> Here are some benchmarks showing what performance will look like for users who continue to use default_toast_compression = pglz.
>
> all compiled by meson with debugoptimized (-g -O2)
>
> arm NARROW master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT : 10055.141000 ms
> CSV : 10549.174500 ms
> TXT with 1/3 escapes: 10213.864750 ms
> CSV with 1/3 quotes: 12188.039000 ms
>
> arm NARROW master with inline with v11patch default_toast_compression = pglz
> TXT : 10070.153750 ms -0.149304% regression
> CSV : 10161.348750 ms 3.676361% improvement
> TXT with 1/3 escapes: 10618.005000 ms -3.956781% regression
> CSV with 1/3 quotes: 12279.366250 ms -0.749319% regression
>
> arm WIDE master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT : 11355.602750 ms
> CSV : 13893.110500 ms
> TXT with 1/3 escapes: 12872.690500 ms
> CSV with 1/3 quotes: 16722.262500 ms
>
> arm WIDE master with inline with v11patch default_toast_compression = pglz
> TXT : 9001.007250 ms 20.735099% improvement
> CSV : 8988.679750 ms 35.301171% improvement
> TXT with 1/3 escapes: 12191.137000 ms 5.294569% improvement
> CSV with 1/3 quotes: 16297.541500 ms 2.539854% improvement
>
>
> x86 NARROW master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT : 26243.084500 ms
> CSV : 27719.564000 ms
> TXT with 1/3 escapes: 29578.192750 ms
> CSV with 1/3 quotes: 34467.571250 ms
>
> x86 NARROW master with inline with v11patch default_toast_compression = pglz
> TXT : 26371.996750 ms -0.491224% regression
> CSV : 26137.186500 ms 5.708522% improvement
> TXT with 1/3 escapes: 28080.201000 ms 5.064514% improvement
> CSV with 1/3 quotes: 32557.377500 ms 5.542003% improvement
>
> x86 WIDE master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT : 28734.774750 ms
> CSV : 35700.485000 ms
> TXT with 1/3 escapes: 32376.878250 ms
> CSV with 1/3 quotes: 47024.985750 ms
>
> x86 WIDE master with inline with v11patch default_toast_compression = pglz
> TXT : 22753.755750 ms 20.814567% improvement
> CSV : 22977.195500 ms 35.638982% improvement
> TXT with 1/3 escapes: 29526.887000 ms 8.802551% improvement
> CSV with 1/3 quotes: 40298.196750 ms 14.304712% improvement

Thank you for the benchmark; the results look nice! So there is almost no
regression with either pglz or lz4 TOAST compression (the worst case above
is ~4%, arm NARROW TXT with 1/3 escapes). The best case is a ~60%
improvement with lz4 and a ~35% improvement with pglz.
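
The percentages in the quoted results match a change measured against the
"without inline" baseline: (baseline - patched) / baseline * 100. A positive
value is an improvement and a negative one is a regression. The formula is
inferred from the numbers and is not stated in the thread. Below is a minimal
Python sketch that assumes it. pct_change is a hypothetical helper name.

```python
def pct_change(baseline_ms: float, patched_ms: float) -> float:
    """Percent change relative to the baseline run (assumed formula).

    Positive means the patched run was faster (improvement),
    negative means it was slower (regression).
    """
    return (baseline_ms - patched_ms) / baseline_ms * 100.0


# Two rows copied from the quoted "arm NARROW" pglz results:
# (label, master without inline, master with inline + v11 patch)
rows = [
    ("TXT", 10055.141000, 10070.153750),
    ("CSV", 10549.174500, 10161.348750),
]

for label, base, patched in rows:
    delta = pct_change(base, patched)
    kind = "improvement" if delta > 0 else "regression"
    print(f"{label}: {delta:.6f}% {kind}")
```

Running it reproduces the quoted -0.149304% (TXT) and 3.676361% (CSV) figures
for that configuration.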

--
Regards,
Nazir Bilal Yavuz
Microsoft
