| From: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz> |
|---|---|
| To: | Manni Wood <manni(dot)wood(at)enterprisedb(dot)com> |
| Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date: | 2025-11-12 14:44:02 |
| Message-ID: | CA+K2RumMC+avYGSX-AWNeod3w+XOGHrVPz8HiqkvJj7AZ5tZXA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Nov 11, 2025 at 11:23 PM Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
wrote:
> Hello!
>
> I wanted to reproduce the results using files attached by Shinya Kato and
> Ayoub Kazar. I installed a postgres compiled from master, and then I
> installed a postgres built from master plus Nazir Bilal Yavuz's v3 patches
> applied.
>
> The master+v3patches postgres naturally performed better on copying into
> the database: anywhere from 11% better for the t.csv file produced by
> Shinya's test.sql, to 35% better copying in the t_4096_none.csv file
> created by Ayoub Kazar's simd-copy-from-bench.sql.
>
> But here's where it gets weird. The two files created by Ayoub Kazar's
> simd-copy-from-bench.sql that are supposed to be slower, t_4096_escape.txt,
> and t_4096_quote.csv, actually ran faster on my machine, by 11% and 5%
> respectively.
>
> This seems impossible.
>
> A few things I should note:
>
> I timed the commands using the Unix time command, like so:
>
> time psql -X -U mwood -h localhost -d postgres -c '\copy t from
> /tmp/t_4096_escape.txt'
>
> For each file, I timed the copy 6 times and took the average.
>
> This was done on my work Linux machine while also running Chrome and an
> Open Office spreadsheet; not a dedicated machine only running postgres.
>
Hello,
I think if you run a perf benchmark (if it still reproduces), it would
probably be possible to explain why it's performing like that by looking at
the CPI and other metrics and comparing them to my findings.
I also suggest making the data even closer to the worst case, i.e. more
special characters, placed where they force the most switching between SIMD
and scalar processing (in the simd-copy-from-bench.sql file). If the patch
still does a good job, then there's something to look at.
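
To illustrate what "closer to the worst case" could mean, here is a rough
Python sketch that writes a file with a special character every N characters.
It is not taken from simd-copy-from-bench.sql; the names make_field and
write_file, the filler character, and the choice of special characters (an
escaped backslash for text format, a doubled quote inside a quoted field for
CSV) are my own assumptions for illustration.

```python
def make_field(n_chars: int, special_every: int, fmt: str) -> str:
    """Return one field holding n_chars logical characters.

    Every special_every-th character is a special one (0 disables them):
      - fmt == "text": a literal backslash, written escaped as two backslashes
      - fmt == "csv":  a literal double quote, written doubled in a quoted field
    This layout is an assumption for illustration, not the benchmark's layout.
    """
    if fmt not in ("text", "csv"):
        raise ValueError("fmt must be 'text' or 'csv'")
    special = "\\\\" if fmt == "text" else '""'
    chars = [
        special if special_every > 0 and i % special_every == 0 else "a"
        for i in range(1, n_chars + 1)
    ]
    body = "".join(chars)
    # CSV fields containing quotes must themselves be quoted.
    return f'"{body}"' if fmt == "csv" else body


def write_file(path: str, rows: int, n_chars: int, special_every: int, fmt: str) -> None:
    """Write `rows` identical single-column lines to `path`."""
    line = make_field(n_chars, special_every, fmt) + "\n"
    with open(path, "w", newline="") as f:
        for _ in range(rows):
            f.write(line)
```

For example, write_file("/tmp/t_4096_escape_dense.txt", 100000, 4096, 8, "text")
would put an escape every 8 characters (the file name and row count are
made up). Lowering special_every makes the switching more frequent.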
>
>
> All of the copy results took between 4.5 seconds (Shinya's t.csv copied
> into postgres compiled from master) to 2 seconds (Ayoub
> Kazar's t_4096_none.csv copied into postgres compiled from master plus
> Nazir's v3 patches).
>
> Perhaps I need to fiddle with the provided SQL to produce larger files to
> get longer run times? Maybe sub-second differences won't tell as
> interesting a story as minutes-long copy commands?
>
I did try it on a few GB (around 2-5GB only), and the differences were not
that big. If you can run this on more data (at least 10GB), it would be good
to look at, although I don't expect anything interesting, since the shape
of the data is the same for the whole COPY.
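
Since the runs are only a few seconds long on a machine that is also doing
other work, the spread between runs may matter as much as the average. A
minimal sketch of a timing harness that reports it is below; time_command and
summarize are hypothetical helper names, and this only measures wall-clock
time of the client command, the same thing the Unix time command gives.

```python
import statistics
import subprocess
import time


def time_command(cmd: list[str], runs: int = 6) -> list[float]:
    """Run `cmd` `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: list[float]) -> dict[str, float]:
    """Summarize timings; a large stdev relative to the mean suggests noise."""
    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "stdev": statistics.stdev(timings) if len(timings) > 1 else 0.0,
        "min": min(timings),
    }


# Example with the command quoted earlier in this thread:
# cmd = ["psql", "-X", "-U", "mwood", "-h", "localhost", "-d", "postgres",
#        "-c", r"\copy t from /tmp/t_4096_escape.txt"]
# print(summarize(time_command(cmd)))
```

Note that repeated \copy runs keep appending to the table unless it is
truncated in between, which may itself affect later runs.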
>
> Thanks for reading this.
> --
> -- Manni Wood EDB: https://www.enterprisedb.com
>
Thanks for the info.
Regards,
Ayoub Kazar.