From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
Date: | 2025-10-20 17:04:03 |
Message-ID: | aPZrg6lxb5bgy_px@nathan |
Lists: | pgsql-hackers |
On Mon, Oct 20, 2025 at 10:02:23AM -0400, Andrew Dunstan wrote:
> On 2025-10-16 Th 10:29 AM, Nazir Bilal Yavuz wrote:
>> With this heuristic the regression is limited to 2% in the worst case.
>
> My worry is that the worst case is actually quite common. Sparse data sets
> dominated by a lot of null values (and hence lots of special characters) are
> very common. Are people prepared to accept a 2% regression on load times for
> such data sets?
Without knowing how common it is, I think it's difficult to judge whether
2% is a reasonable trade-off. If <5% of workloads might see a small
regression while the other >95% see double-digit percentage improvements,
then I might argue that it's fine. But I'm not sure we have any way to
know those sorts of details at the moment.
I'm also at least a little skeptical about the 2% number. IME that's
generally within the noise range and can vary greatly between machines and
test runs.
--
nathan