Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2025-10-18 20:01:41
Message-ID: CAN55FZ3e31ddFyf7XHW5G3ytuQwcXpetsb3wkx6q9oSp_zekhQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Sat, 18 Oct 2025 at 21:46, KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
>
> Hello,
>
> I've rebenchmarked the new heuristic patch. We still have the previous improvements, ranging from 15% to 30%. For regressions, I see at most 3% or 4% in the worst case, so this is solid.

Thank you so much for doing this! The results look nice. Do you think
there are any other benchmarks that might be interesting to try?

> I'm also trying the idea of doing SIMD inside quotes, using a prefix XOR computed with carry-less multiplication. This avoids the slow path in all cases, even with weird-looking input. However, it has to take into account the availability of the PCLMULQDQ instruction (via <wmmintrin.h>), and that quickly becomes messy. Alternatively, we can wait for the decision to start requiring x86-64-v2 or v3, which include SSE4.2 and AVX2 respectively.

I cannot quite picture this. Would you mind sharing a few examples or patches?
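
For anyone following along, here is a minimal portable sketch of how a prefix XOR over a quote bitmask could mark the bytes that lie inside quotes. It is not taken from any patch in this thread. The names match_mask, prefix_xor and unquoted_delims are hypothetical, and the shift-and-XOR ladder only stands in for what a single carry-less multiplication of the mask by an all-ones value would compute (the low 64 bits of that product). Escape characters and carrying the in-quote state from one 64-byte chunk to the next are left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a bitmask with bit i set when chunk[i] == c.
 * Only the first 64 bytes are examined. A SIMD version would use a vector
 * compare plus a movemask instead of this scalar loop.
 */
static uint64_t
match_mask(const char *chunk, size_t len, char c)
{
	uint64_t	mask = 0;

	for (size_t i = 0; i < len && i < 64; i++)
	{
		if (chunk[i] == c)
			mask |= UINT64_C(1) << i;
	}
	return mask;
}

/*
 * Prefix XOR: bit i of the result is the XOR of bits 0..i of x.
 * This equals the low 64 bits of a carry-less multiplication of x by an
 * all-ones 64-bit value; here it is done with six shift-and-XOR steps.
 */
static uint64_t
prefix_xor(uint64_t x)
{
	x ^= x << 1;
	x ^= x << 2;
	x ^= x << 4;
	x ^= x << 8;
	x ^= x << 16;
	x ^= x << 32;
	return x;
}

/*
 * Hypothetical helper: return the delimiter positions that are not inside
 * quotes. After prefix_xor(), the bits from an opening quote (inclusive) up
 * to its closing quote (exclusive) are set, so masking them out leaves only
 * the delimiters that actually separate fields.
 */
static uint64_t
unquoted_delims(const char *chunk, size_t len, char quote, char delim)
{
	uint64_t	in_quotes = prefix_xor(match_mask(chunk, len, quote));

	return match_mask(chunk, len, delim) & ~in_quotes;
}
```

For the 9-byte input a,"b,c",d the quotes are at offsets 2 and 6, so the in-quote mask covers offsets 2 to 5, and of the commas at offsets 1, 4 and 7 only those at 1 and 7 remain. A doubled quote inside a quoted field toggles the state twice, so the bytes after it keep the correct state.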

--
Regards,
Nazir Bilal Yavuz
Microsoft
