| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, Mark Wong <markwkm(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
| Subject: | Re: Speed up COPY TO text/CSV parsing using SIMD |
| Date: | 2026-03-17 18:49:24 |
| Message-ID: | abmiNPQOqBrRlf_m@nathan |
| Lists: | pgsql-hackers |

On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about where some varlenas have a larger binary size
> than its text representation ex:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
> pg_column_size
> ----------------
> 32
>
> its text representation is less than sizeof(Vector8) so currently v3 would
> enter SIMD path and exit out just from the beginning (two extra branches)
> because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
>
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.

Can we measure the impact of this? How likely is this case?
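
To make the proposal concrete, here is a minimal sketch of one reading of it: scale the threshold that the binary payload size must exceed before the SIMD path is chosen. The helper name `should_use_simd`, the `factor` parameter, and the 16-byte `VECTOR_WIDTH` are assumptions for illustration and do not appear in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a 16-byte vector, standing in for sizeof(Vector8). */
#define VECTOR_WIDTH 16

/*
 * Hypothetical helper, not part of the patch: decide whether to take the
 * SIMD path based on the datum's binary payload size.  factor = 1 mirrors
 * the quoted v3 check; 2 or 4 models the stricter thresholds floated above.
 */
static bool
should_use_simd(bool is_varlena, size_t binary_len, size_t factor)
{
    return is_varlena && binary_len > factor * VECTOR_WIDTH;
}
```

Under these assumptions, the 32-byte tsvector from the example passes the check with `factor = 1` but not with `factor = 2`, so it would go straight to the scalar path. Whether that is a net win for other types is exactly what would need measuring.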
> +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> + bool use_simd, size_t len);
> +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> + bool use_quote, bool use_simd, size_t len);

Can you test this on its own, too? We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.
> if (is_csv)
> - CopyAttributeOutCSV(cstate, string,
> - cstate->opts.force_quote_flags[attnum - 1]);
> + {
> + if (use_simd)
> + CopyAttributeOutCSV(cstate, string,
> + cstate->opts.force_quote_flags[attnum - 1],
> + true, len);
> + else
> + CopyAttributeOutCSV(cstate, string,
> + cstate->opts.force_quote_flags[attnum - 1],
> + false, len);
> + }
> else
> - CopyAttributeOutText(cstate, string);
> + {
> + if (use_simd)
> + CopyAttributeOutText(cstate, string, true, len);
> + else
> + CopyAttributeOutText(cstate, string, false, len);
> + }

There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference. As above, it would
be good to measure it.
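
For reference, a minimal standalone sketch of the pattern in the quoted hunk: an always-inline function takes a boolean flag, and each call site passes a literal so the compiler can emit a copy with the branch on that flag folded away. Everything here is hypothetical (`ALWAYS_INLINE`, `is_special`, `count_special`, `count_special_dispatch`), and the "fast" path is a plain 8-byte chunked loop, not real SIMD.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: GCC/Clang attribute syntax; plain inline elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/* Bytes that would need escaping in this toy text format. */
static inline bool
is_special(char c)
{
    return c == '\\' || c == '\t' || c == '\n';
}

/*
 * Hypothetical stand-in for an output function with a use_simd-style flag.
 * It only counts special bytes.  When use_fast is a compile-time constant
 * at the call site, the inlined copy can drop the "if (use_fast)" test.
 */
static ALWAYS_INLINE size_t
count_special(const char *s, size_t len, bool use_fast)
{
    size_t      i = 0;
    size_t      n = 0;

    if (use_fast)
    {
        /* Chunked pass: whole 8-byte blocks only. */
        for (; i + 8 <= len; i += 8)
            for (size_t j = 0; j < 8; j++)
                n += is_special(s[i + j]);
    }

    /* Scalar pass: the tail, or everything when use_fast is false. */
    for (; i < len; i++)
        n += is_special(s[i]);

    return n;
}

/* Branch once on the runtime flag, then call with a literal in each arm. */
static size_t
count_special_dispatch(const char *s, size_t len, bool use_fast)
{
    if (use_fast)
        return count_special(s, len, true);
    return count_special(s, len, false);
}
```

Both arms return the same result for any input; the only difference is which specialized copy runs. With a single test of the flag inside the function, as in this sketch, the saving is one predictable branch per call, which is why it needs to be benchmarked separately.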
--
nathan