| From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
| Cc: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date: | 2025-10-29 22:22:46 |
| Message-ID: | 5d81fbbb-7609-4445-9bc4-8af211fb7674@dunslane.net |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 2025-10-22 We 3:24 PM, Nathan Bossart wrote:
> On Wed, Oct 22, 2025 at 03:33:37PM +0300, Nazir Bilal Yavuz wrote:
>> On Tue, 21 Oct 2025 at 21:40, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>>> I wonder if we could mitigate the regression further by spacing out the
>>> checks a bit more. It could be worth comparing a variety of values to
>>> identify what works best with the test data.
>> Do you mean that instead of doubling the SIMD sleep, we should
>> multiply it by 3 (or another factor)? Or are you referring to
>> increasing the maximum sleep from 1024? Or possibly both?
> I'm not sure of the precise details, but the main thrust of my suggestion
> is to assume that whatever sampling you do to determine whether to use SIMD
> is good for a larger chunk of data. That is, if you are sampling 1K lines
> and then using the result to choose whether to use SIMD for the next 100K
> lines, we could instead bump the latter number to 1M lines (or something).
> That way we minimize the regression for relatively uniform data sets while
> retaining some ability to adapt in case things change halfway through a
> large table.
>
I'd be ok with numbers like this, although I suspect the numbers of
cases where we see shape shifts like this in the middle of a data set
would be vanishingly small.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | David Rowley | 2025-10-29 22:51:42 | Re: Use BumpContext contexts for TupleHashTables' tablecxt |
| Previous Message | Joe Conway | 2025-10-29 22:11:26 | Re: contrib/sepgsql regression tests have been broken for months |