From: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
Date: | 2025-10-22 12:33:37 |
Message-ID: | CAN55FZ0AYP4ZEczBJ5ur-=9QuEhMysH9Yfrq5srr0ZakK1M0FA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Tue, 21 Oct 2025 at 21:40, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Tue, Oct 21, 2025 at 12:09:27AM +0300, Nazir Bilal Yavuz wrote:
> > I think the problem is deciding how many lines to process before
> > deciding for the rest. 1000 lines could work for the small sized data
> > but it might not work for the big sized data. Also, it might cause a
> > worse regressions for the small sized data.
>
> IMHO we have some leeway with smaller amounts of data. If COPY FROM for
> 1000 rows takes 19 milliseconds as opposed to 11 milliseconds, it seems
> unlikely users would be inconvenienced all that much. (Those numbers are
> completely made up in order to illustrate my point.)
>
> > Because of this reason, I
> > tried to implement a heuristic that will work regardless of the size
> > of the data. The last heuristic I suggested will run SIMD for
> > approximately (#number_of_lines / 1024 [1024 is the max number of
> > lines to sleep before running SIMD again]) lines if all characters in
> > the data are special characters.
>
> I wonder if we could mitigate the regression further by spacing out the
> checks a bit more. It could be worth comparing a variety of values to
> identify what works best with the test data.
Do you mean that instead of doubling the SIMD sleep, we should
multiply it by 3 (or another factor)? Or are you referring to
increasing the maximum sleep from 1024? Or possibly both?
--
Regards,
Nazir Bilal Yavuz
Microsoft
From | Date | Subject | |
---|---|---|---|
Next Message | Jelte Fennema-Nio | 2025-10-22 12:44:15 | Re: RFC: adding pytest as a supported test framework |
Previous Message | BharatDB | 2025-10-22 12:26:41 | Re: Fix ALTER TABLE DROP EXPRESSION with inheritance hierarchy |