| From: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date: | 2026-03-12 10:59:53 |
| Message-ID: | CAN55FZ1sn-2tVX_n9C5UNBCfDPjSDOCT4zkMeDsB7MaZ9SUBTw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On Wed, 11 Mar 2026 at 23:42, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Wed, Mar 11, 2026 at 10:22:18PM +0300, Nazir Bilal Yavuz wrote:
> > Here is v14 which is v13-0001 + v13-0002.
>
> Thanks! It's getting close.
>
> > + /*
> > + * Temporary variables are used here instead of passing the actual
> > + * variables (especially input_buf_ptr) directly to the helper. Taking
> > + * the address of a local variable might force the compiler to
> > + * allocate it on the stack rather than in a register. Because
> > + * input_buf_ptr is used heavily in the hot scalar path below, keeping
> > + * it in a register is important for performance.
> > + */
> > + int temp_input_buf_ptr;
> > + bool temp_hit_eof = hit_eof;
>
> A few notes:
>
> * Does using a temporary variable for hit_eof actually make a difference?
> AFAICT that's only updated when loading more data.
>
> * Does inlining the function produce the same results?
>
> * Also, I'm curious what the usual benchmarks look like with and without
> this hack for the latest patch.
I tried to benchmark all of these questions, here are the results:
Old master means d841ca2d14 - inlining CopyReadLineText commit (dc592a4155).
v14 means d841ca2d14 + v14.
v14 + #1 means removing temporary variables.
v14 + #2 means removing temp_hit_eof variable only.
v14 + #3 means inlining CopyReadLineTextSIMDHelper().
v14 + #4 means inlining CopyReadLineTextSIMDHelper() + removing
temporary variables (#1).
------------------------------------------------------------
Results for default_toast_compression = 'lz4':
+-------------------------------------------+
| Optimization: -O2 |
+------------+--------------+---------------+
| | Text | CSV |
+------------+------+-------+-------+-------+
| WIDE | None | 1/3 | None | 1/3 |
+------------+------+-------+-------+-------+
| Old master | 4260 | 4789 | 5930 | 8276 |
+------------+------+-------+-------+-------+
| v14 | 2489 | 4439 | 2529 | 8098 |
+------------+------+-------+-------+-------+
| v14 + #1 | 2472 | 5177 | 2479 | 9285 |
+------------+------+-------+-------+-------+
| v14 + #2 | 2521 | 4252 | 2481 | 8050 |
+------------+------+-------+-------+-------+
| v14 + #3 | 2632 | 4569 | 2458 | 8657 |
+------------+------+-------+-------+-------+
| v14 + #4 | 2476 | 4239 | 2475 | 10544 |
+------------+------+-------+-------+-------+
| | | | | |
+------------+------+-------+-------+-------+
| | | | | |
+------------+------+-------+-------+-------+
| | Text | CSV |
+------------+------+-------+-------+-------+
| NARROW | None | 1/3 | None | 1/3 |
+------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+------------+------+-------+-------+-------+
| v14 | 9917 | 10080 | 10104 | 10510 |
+------------+------+-------+-------+-------+
| v14 + #1 | 9913 | 10090 | 10120 | 10532 |
+------------+------+-------+-------+-------+
| v14 + #2 | 9937 | 10130 | 10072 | 10520 |
+------------+------+-------+-------+-------+
| v14 + #3 | 9880 | 10258 | 10220 | 10604 |
+------------+------+-------+-------+-------+
| v14 + #4 | 9827 | 10306 | 10308 | 10734 |
+------------+------+-------+-------+-------+
------------------------------------------------------------
Results for default_toast_compression = 'pglz':
+-------------------------------------------+
| Optimization: -O2 |
+------------+--------------+---------------+
| | Text | CSV |
+------------+------+-------+-------+-------+
| WIDE | None | 1/3 | None | 1/3 |
+------------+------+-------+-------+-------+
| Old master | 4260 | 4789 | 5930 | 8276 |
+------------+------+-------+-------+-------+
| v14 | 2489 | 4439 | 2529 | 8098 |
+------------+------+-------+-------+-------+
| v14 + #1 | 2472 | 5177 | 2479 | 9285 |
+------------+------+-------+-------+-------+
| v14 + #2 | 2521 | 4252 | 2481 | 8050 |
+------------+------+-------+-------+-------+
| v14 + #3 | 2632 | 4569 | 2458 | 8657 |
+------------+------+-------+-------+-------+
| v14 + #4 | 2476 | 4239 | 2475 | 10544 |
+------------+------+-------+-------+-------+
| | | | | |
+------------+------+-------+-------+-------+
| | | | | |
+------------+------+-------+-------+-------+
| | Text | CSV |
+------------+------+-------+-------+-------+
| NARROW | None | 1/3 | None | 1/3 |
+------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+------------+------+-------+-------+-------+
| v14 | 9917 | 10080 | 10104 | 10510 |
+------------+------+-------+-------+-------+
| v14 + #1 | 9913 | 10090 | 10120 | 10532 |
+------------+------+-------+-------+-------+
| v14 + #2 | 9937 | 10130 | 10072 | 10520 |
+------------+------+-------+-------+-------+
| v14 + #3 | 9880 | 10258 | 10220 | 10604 |
+------------+------+-------+-------+-------+
| v14 + #4 | 9827 | 10306 | 10308 | 10734 |
+------------+------+-------+-------+-------+
------------------------------------------------------------
By looking these results:
v14 + #1 and v14 + #3 performs worse on wide & 1/3 cases.
v14 + #4 performs worse on CSV & wide & 1/3 cases.
v14 and v14 + #2 perform very similarly. They don't have regression. I
think we can move forward with one of these.
--
Regards,
Nazir Bilal Yavuz
Microsoft
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Etsuro Fujita | 2026-03-12 11:02:43 | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Previous Message | Peter Eisentraut | 2026-03-12 10:54:11 | Re: meson vs. llvm bitcode files |