Re: Speed up COPY FROM text/CSV parsing using SIMD

From:	Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
To:	Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc:	Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Speed up COPY FROM text/CSV parsing using SIMD
Date:	2025-08-19 09:09:20
Message-ID:	CANwKhkMnay=xrVNcuw45G+8nMAGkWee9KtFSGussZX8-16+zNg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 7 Aug 2025 at 14:15, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
> I have a couple of ideas that I was working on:
> ---
>
> + * However, SIMD optimization cannot be applied in the following cases:
> + * - Inside quoted fields, where escape sequences and closing quotes
> + * require sequential processing to handle correctly.
>
> I think you can continue SIMD inside quoted fields. Only important
> thing is you need to set last_was_esc to false when SIMD skipped the
> chunk.

There is a trick that uses carryless multiplication by -1 (an all-ones
word) to process transitions between quoted and not-quoted state with
SIMD. [1] It converts a bitmask of unescaped quote character positions
into a quote mask in a single operation. I last looked at it 5 years
ago, but I remember concluding that it would work for implementing
PostgreSQL's interpretation of CSV.

[1] https://github.com/geofflangdale/simdcsv/blob/master/src/main.cpp#L76
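
To make the quote-mask idea concrete, here is a minimal sketch in portable C. It is not the code from [1] or from the patch. The names prefix_xor, match_mask and quote_mask are hypothetical, the 64-byte chunk size is an assumption, and the shift/XOR ladder stands in for the carryless multiply: the low 64 bits of a carryless product with an all-ones word are the prefix XOR of the input, which is what the ladder computes. The sketch also assumes that escaped quotes have already been cleared from the quote bitmask, or that the escape character is the quote character itself, in which case a doubled quote toggles the state off and straight back on.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Prefix XOR: bit i of the result is the XOR of bits 0..i of x.
 * This equals the low 64 bits of the carryless product of x with an
 * all-ones word, so a hardware carryless multiply could replace it.
 */
static uint64_t
prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/*
 * Hypothetical scalar stand-in for a SIMD byte compare: returns a
 * bitmask with bit i set where chunk[i] == c (at most 64 bytes).
 */
static uint64_t
match_mask(const char *chunk, size_t len, char c)
{
    uint64_t    mask = 0;

    for (size_t i = 0; i < len && i < 64; i++)
    {
        if (chunk[i] == c)
            mask |= UINT64_C(1) << i;
    }
    return mask;
}

/*
 * Turn a bitmask of unescaped quote positions into a mask of bytes
 * inside quotes. A bit is set from an opening quote up to, but not
 * including, the matching closing quote.
 *
 * *carry is all-ones if the previous chunk ended inside a quoted
 * field and zero otherwise; it is updated for the next chunk.
 */
static uint64_t
quote_mask(uint64_t quote_bits, uint64_t *carry)
{
    uint64_t    mask = prefix_xor(quote_bits) ^ *carry;

    *carry = (mask >> 63) ? ~UINT64_C(0) : 0;
    return mask;
}
```

With such a mask, the delimiters that actually separate fields in a chunk would be match_mask(chunk, len, ',') & ~quote_mask(...), so commas inside quoted fields drop out without a byte-by-byte state machine.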

--
Ants

In response to

Re: Speed up COPY FROM text/CSV parsing using SIMD at 2025-08-07 11:15:06 from Nazir Bilal Yavuz
