From: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz> |
---|---|
To: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
Cc: | Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
Date: | 2025-08-14 14:59:55 |
Message-ID: | CA+K2Ru=jHuz_Wpgar4Sobtxeb33qxx=o59ToOhZ=vpmkMqErnA@mail.gmail.com |
Lists: | pgsql-hackers |
> Hi,
>
> On Thu, 14 Aug 2025 at 05:25, KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
> >
> > Following Nazir's findings about 4096 bytes being the performant line
> > length, I did more benchmarks from my side on both TEXT and CSV formats
> > with two different cases of normal data (no special characters) and data
> > with many special characters.
> >
> > Results are as good as expected and similar to previous benchmarks:
> > ~30.9% faster copy in TEXT format
> > ~32.4% faster copy in CSV format
> > 20%-30% reduction in cycles per instruction
> >
> > In the case of lines with a lot of special characters (e.g.,
> > tables with a large number of columns, maybe), we obviously expect
> > regressions because of the overhead of many fallbacks to scalar
> > processing.
> > Results when special characters make up 1/3 of the line length:
> > ~43.9% slower copy in TEXT format
> > ~16.7% slower copy in CSV format
> > So even with fewer special characters, or wider distances between
> > them, there might still be some regressions. This may be a
> > non-significant case, but it can be handled in other patches if we
> > consider skipping the SIMD path in some situations.
> >
> > I hope this helps more and confirms the patch.
>
> Thanks for running that benchmark! Would you mind sharing a reproducer
> for the regression you observed?
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
Of course, I attached the SQL that generates the text and CSV test files.
If special characters making up 1/3 of the line length seems like an
exaggeration, a lower ratio should still reproduce some regression, for
the same reason.
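
For readers without the attachment, the idea can be sketched roughly as follows. This is not the attached script, only a minimal illustration under assumptions: the helper name `make_line`, the filler character, and the choice of a backslash as the "special" character for TEXT format are all hypothetical. It builds one 4096-character line in which about 1/3 of the characters are special and evenly spaced, so a SIMD scan would rarely find a long clean run.

```python
def make_line(length: int, special: str, ratio: float, filler: str = "a") -> str:
    """Build a line of `length` chars where about `ratio` of them are `special`.

    Hypothetical helper for illustration only; special characters are spread
    evenly so that no long run of ordinary characters exists.
    """
    if len(special) != 1 or len(filler) != 1:
        raise ValueError("special and filler must be single characters")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")

    n_special = int(length * ratio)
    chars = [filler] * length
    if n_special:
        # step >= 1, so every computed index is distinct and in range
        step = length / n_special
        for i in range(n_special):
            chars[int(i * step)] = special
    return "".join(chars)


# Assumed parameters: 4096-byte lines, 1/3 special characters, TEXT format.
# A backslash followed by an ordinary character is a valid escape in
# COPY text format, so each line stays a single well-formed column.
normal_line = make_line(4096, "\\", 0.0)
special_line = make_line(4096, "\\", 1 / 3)
```

Writing many copies of `special_line` to a file and loading it with `COPY ... FROM` in text format should exercise the scalar fallback path often; lowering `ratio` widens the distance between special characters.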
Best regards,
Ayoub Kazar
Attachment | Content-Type | Size |
---|---|---|
simd-copy-from-bench.sql | application/sql | 812 bytes |