Re: Speed up COPY FROM text/CSV parsing using SIMD

From: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, Mark Wong <markwkm(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2025-12-24 15:07:55
Message-ID: CA+K2RumOaH-daBGN6uTo6+_0XSg7HQ10Na8OzScCV5j6eKkFgA@mail.gmail.com
Lists: pgsql-hackers

Hello,
Following the same path of optimizing COPY FROM using SIMD, I found that
COPY TO can also benefit from it.

I attached a small patch that uses SIMD to skip over data until the first
special character is found, then falls back to scalar processing for that
character and re-enters the SIMD path afterwards.
There are two ways to do this:
1) Run SIMD until we find a special character, then continue on the scalar
path without re-entering SIMD.
- This gives speedups from 10% to 30%, depending on the proportion of
special characters in the attribute. We don't lose anything here, since it
advances with SIMD until it can't (measured with the previous scripts, using
1/3 and 2/3 special characters).

2) Take the SIMD path, switch to the scalar path when we hit a special
character, and keep re-entering the SIMD path each time.
- This is equivalent to the COPY FROM story: we'll need to find the same
heuristic for both COPY FROM and COPY TO to reduce the regressions (the same
regressions, around 20% to 30% with 1/3 and 2/3 special characters).
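
To make the difference between the two options concrete, here is a minimal
standalone sketch. It is not taken from the attached patches. The names
simd_skip, escape_attr and the reenter flag are hypothetical, the set of
special characters is simplified to backslash, tab, newline and carriage
return, and SSE2 intrinsics stand in for whatever SIMD abstraction the
patches actually use. Without SSE2 the sketch degrades to a purely scalar
loop.

```c
#include <stddef.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Simplified (assumed) set of bytes that text-format output must escape. */
static int is_special(unsigned char c)
{
    return c == '\\' || c == '\t' || c == '\n' || c == '\r';
}

/* Letter that follows the backslash in the escaped form. */
static char escape_letter(unsigned char c)
{
    switch (c)
    {
        case '\t': return 't';
        case '\n': return 'n';
        case '\r': return 'r';
        default:   return '\\';
    }
}

/*
 * Hypothetical helper: scan whole 16-byte chunks and return the offset of
 * the first special byte, or the offset at which fewer than 16 bytes remain.
 */
static size_t simd_skip(const char *p, size_t len)
{
    size_t i = 0;
#ifdef __SSE2__
    const __m128i bs = _mm_set1_epi8('\\');
    const __m128i tab = _mm_set1_epi8('\t');
    const __m128i nl = _mm_set1_epi8('\n');
    const __m128i cr = _mm_set1_epi8('\r');

    while (i + 16 <= len)
    {
        __m128i v = _mm_loadu_si128((const __m128i *) (p + i));
        __m128i m = _mm_or_si128(
            _mm_or_si128(_mm_cmpeq_epi8(v, bs), _mm_cmpeq_epi8(v, tab)),
            _mm_or_si128(_mm_cmpeq_epi8(v, nl), _mm_cmpeq_epi8(v, cr)));
        int mask = _mm_movemask_epi8(m);

        if (mask != 0)
            return i + (size_t) __builtin_ctz((unsigned) mask);
        i += 16;
    }
#else
    (void) p;
    (void) len;
#endif
    return i;
}

/*
 * Escape one attribute into out (needs room for 2 * len + 1 bytes).
 * reenter = 0 models option 1: one SIMD skip, then scalar to the end.
 * reenter = 1 models option 2: retry the SIMD skip after every scalar byte.
 */
static size_t escape_attr(const char *in, size_t len, char *out, int reenter)
{
    size_t i = 0, o = 0;
    int use_simd = 1;

    while (i < len)
    {
        if (use_simd)
        {
            size_t n = simd_skip(in + i, len - i);

            memcpy(out + o, in + i, n);   /* bulk-copy the plain run */
            i += n;
            o += n;
            use_simd = reenter;
            if (i >= len)
                break;
        }

        unsigned char c = (unsigned char) in[i++];

        if (is_special(c))
        {
            out[o++] = '\\';
            out[o++] = escape_letter(c);
        }
        else
            out[o++] = (char) c;
    }
    out[o] = '\0';
    return o;
}
```

Both modes produce identical output; they differ only in how often the
vector load is retried. With reenter = 1 and many special characters, most
vector loads find a special byte a few positions in, which is where the
regressions mentioned above would come from.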

Something else to note is that the scalar path for COPY TO isn't as heavy
as the state machine in COPY FROM.

So if we find the sweet spot for the heuristic, doing the same for COPY TO
will be trivial and always beneficial.
Attached are 0004, which is option 1 (SIMD without re-entering), and 0005,
which is option 2.

Regards,
Ayoub

Attachment Content-Type Size
0005-Speed-up-COPY-TO-text-CSV-using-SIMD.patch text/x-patch 9.2 KB
0004-Speed-up-COPY-TO-text-CSV-using-SIMD.patch text/x-patch 4.8 KB
