Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
Cc: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2025-12-10 11:59:41
Message-ID: CAN55FZ1p5UyUdTRO7iWR_ukjhJDOnpOR2rYNOq=+hcC45OuahQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Wed, 10 Dec 2025 at 01:13, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com> wrote:
>
> Bilal Yavuz (Nazir Bilal Yavuz?),

It is Nazir Bilal Yavuz. I changed some settings on my phone and it
seems that this affected my mail account. Hopefully it is fixed
now.

> I did not get a chance to do any work on this today, but wanted to thank you for finding my logic errors in counting special chars for CSV, and hacking on my naive solution to make it faster. By attempting Andrew Dunstan's suggestion, I got a better feel for the reality that the "housekeeping" code produces a significant amount of overhead.

You are welcome! v4.1 has some problems with the in_quote case in the
SIMD handling code and with counting the cstate->chars_processed
variable. I fixed them in v4.2.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment                                                        Content-Type   Size
v4.2-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch    text/x-patch   3.7 KB
v4.2-0002-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch    text/x-patch   5.0 KB
v4.2-0003-Feedback-Changes.patch                                  text/x-patch   8.5 KB
