Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-03-13 13:34:49
Message-ID: CAN55FZ0ocS6cBHEWqHv2s-dK91U6OdVLBqj7VexTehtBtioDbA@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Fri, 13 Mar 2026 at 14:57, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
>
> Unfortunately, v15 causes a regression in the 'csv & wide & 1/3' case
> on my end: v14 took 8000ms but v15 took ~9100ms. If we add the
> tmp_hit_eof variable, the regression disappears. Also, if I use a
> struct like below, the regression disappears again.
> [struct definition snipped]

> When I removed the tmp_hit_eof variable from v14, I didn't encounter any
> regression. I really don't understand why this is happening on my end.
> Manni didn't encounter any regression in the benchmark [1].

The problem might be related to gcc. I am using Debian Trixie and my
current gcc version is 'gcc version 14.2.0 (Debian 14.2.0-19)'. If I
compile Postgres with 'Debian clang version 19.1.7 (3+b1)' instead,
there is no regression, which makes more sense IMO.

Here is a comparison for the 'csv & wide & 1/3' case. Postgres was
compiled with buildtype=debugoptimized, and default_toast_compression
was set to lz4.

+----------------------------------+
| CSV & WIDE & 1/3, LZ4, -O2 (ms)  |
+----------------+--------+--------+
|                | gcc    | clang  |
|                | 14.2.0 | 19.1.7 |
+----------------+--------+--------+
| old master     |   8250 |  10400 |
+----------------+--------+--------+
| v14            |   8100 |   9800 |
+----------------+--------+--------+
| v15            |   9200 |   9800 |
+----------------+--------+--------+
| v15 + struct   |   7750 |   9800 |
+----------------+--------+--------+
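To express the gcc and clang numbers in relative terms, here is a small Python sketch for reading the table. The TIMINGS dictionary and the pct_change helper are hypothetical names made up for this illustration. The input values are the millisecond timings from the table, and the script only computes each row's change against old master plus v15 against v14.

```python
# Timings (ms) for the 'csv & wide & 1/3' case, copied from the table.
# Hypothetical layout, used only to illustrate the arithmetic.
TIMINGS = {
    "gcc 14.2.0": {"old master": 8250, "v14": 8100, "v15": 9200, "v15 + struct": 7750},
    "clang 19.1.7": {"old master": 10400, "v14": 9800, "v15": 9800, "v15 + struct": 9800},
}


def pct_change(base_ms: float, new_ms: float) -> float:
    """Relative change in runtime; positive means slower (a regression)."""
    return (new_ms - base_ms) / base_ms * 100.0


for compiler, rows in TIMINGS.items():
    base = rows["old master"]
    for version, ms in rows.items():
        if version == "old master":
            continue
        print(f"{compiler:13} {version:13} {pct_change(base, ms):+6.1f}% vs old master")
    # The comparison that shows the gcc-only regression.
    print(f"{compiler:13} v15 vs v14    {pct_change(rows['v14'], rows['v15']):+6.1f}%")
```

Under gcc, this puts v15 about 13.6% slower than v14 and v15 + struct about 6.1% faster than old master. Under clang, v14, v15 and v15 + struct all land at the same 9800ms, so the change between v14 and v15 is 0%.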

--
Regards,
Nazir Bilal Yavuz
Microsoft
