| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
| Cc: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date: | 2026-02-06 22:47:18 |
| Message-ID: | aYZvdsXPElQvwWOA@nathan |
| Lists: | pgsql-hackers |

On Sat, Feb 07, 2026 at 01:19:16AM +0300, Nazir Bilal Yavuz wrote:
> I have three possible approaches in mind; they are actually similar
> to each other.
>
> 1- After encountering a special character, disable SIMD for the rest
> of the current line and also for the rest of the data.
>
> 2- This is a mix of the current heuristic and #1. After
> encountering a special character, skip SIMD for the current line (let's
> say line 1) and for the next line (line 2). Then try running SIMD for
> the next line (line 3); if there is no special character, continue to
> run SIMD, but if there is a special character, skip SIMD
> for two lines this time. And so on: every time a special
> character is encountered in a SIMD run, the number of skipped lines is
> doubled.
>
> 3- This version is a bit different from #2. Instead of calculating the
> number of lines to skip dynamically, skip a constant number of lines, N,
> and then try to run SIMD again after these lines. N could be
> something like 100, 1000, or 10000, etc. Actually, you and Andrew
> suggested this approach before [1].
>
> I think what you suggested is closer to #1 or #3. I just wanted to
> hear your opinions, and whether you think any of these approaches are
> good to implement / work on.

Yeah, I think either (1) or (3) would be a good starting point. (1) is
basically just (3) with N set to infinity, anyway. I imagine there's some
value less than infinity that is acceptable, but if I had to pick an
approach right now, I'd probably go with (1) to essentially remove the
heuristic from the discussion until we're ready to focus on it.
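
To make the comparison concrete, here is a minimal sketch in C of the three skip policies as a small per-COPY state machine. Every name in it (`SimdSkipPolicy`, `SimdSkipState`, `simd_skip_init`, `simd_skip_begin_line`, `simd_skip_on_special`) is hypothetical and not taken from the patch. It also assumes that the caller finishes the current line with the scalar path after a special character is found, and that the doubling penalty in (2) never resets, which the description above does not specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these come from the patch. */
typedef enum
{
	SKIP_FOREVER,				/* approach 1: never retry SIMD */
	SKIP_DOUBLING,				/* approach 2: skip 1, 2, 4, ... lines */
	SKIP_CONSTANT				/* approach 3: always skip N lines */
} SimdSkipPolicy;

typedef struct
{
	SimdSkipPolicy policy;
	uint64_t	lines_to_skip;	/* upcoming lines that must not use SIMD */
	uint64_t	next_penalty;	/* approach 2: skip count for the next hit */
	uint64_t	constant_n;		/* approach 3: the fixed N */
} SimdSkipState;

static void
simd_skip_init(SimdSkipState *st, SimdSkipPolicy policy, uint64_t n)
{
	st->policy = policy;
	st->lines_to_skip = 0;
	st->next_penalty = 1;
	st->constant_n = n;
}

/* Call once at the start of each line; returns true if SIMD may be tried. */
static bool
simd_skip_begin_line(SimdSkipState *st)
{
	if (st->lines_to_skip == 0)
		return true;

	/* UINT64_MAX stands in for "infinity" and is never decremented. */
	if (st->lines_to_skip != UINT64_MAX)
		st->lines_to_skip--;
	return false;
}

/*
 * Call when the SIMD scan of the current line hits a special character.
 * The caller is assumed to finish the current line with the scalar path.
 */
static void
simd_skip_on_special(SimdSkipState *st)
{
	switch (st->policy)
	{
		case SKIP_FOREVER:
			st->lines_to_skip = UINT64_MAX;
			break;
		case SKIP_DOUBLING:
			st->lines_to_skip = st->next_penalty;
			/* Assumption: the penalty only grows; guard against overflow. */
			if (st->next_penalty <= UINT64_MAX / 2)
				st->next_penalty *= 2;
			break;
		case SKIP_CONSTANT:
			st->lines_to_skip = st->constant_n;
			break;
	}
}
```

In this sketch, `SKIP_CONSTANT` with `n` set to `UINT64_MAX` behaves exactly like `SKIP_FOREVER`, which is the sense in which (1) is (3) with N set to infinity. Starting with (1) would amount to hard-coding that case and revisiting the choice of N later.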

--
nathan