Re: Parallel copy

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: David Fetter <david(at)fetter(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-02-22 00:28:02
Message-ID: 20200222002802.yew5buvrd2yrjkm6@development
Lists: pgsql-hackers

On Fri, Feb 21, 2020 at 02:54:31PM +0200, Ants Aasma wrote:
>On Thu, 20 Feb 2020 at 18:43, David Fetter <david(at)fetter(dot)org> wrote:
>> On Thu, Feb 20, 2020 at 02:36:02PM +0100, Tomas Vondra wrote:
>> > I think the wc2 is showing that maybe instead of parallelizing the
>> > parsing, we might instead try using a different tokenizer/parser and
>> > make the implementation more efficient instead of just throwing more
>> > CPUs at it.
>>
>> That was what I had in mind.
>>
>> > I don't know if our code is similar to what wc does; maybe parsing
>> > CSV is more complicated than what wc does.
>>
>> CSV parsing differs from wc in that there are more states in the state
>> machine, but I don't see anything fundamentally different.
>
>The trouble with a state machine based approach is that the state
>transitions form a dependency chain, which means that at best the
>processing rate will be 4-5 cycles per byte (L1 latency to fetch the
>next state).
>
>I whipped together a quick prototype that uses SIMD and bitmap
>manipulations to do the equivalent of CopyReadLineText() in csv mode,
>including quote and escape handling. It runs at 0.25-0.5 cycles per
>byte.
>
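
To make the contrast above concrete, here is a minimal sketch in C of
both approaches. It is not the prototype mentioned above (that code is
not shown in this message); it only illustrates the general bitmap
idea. Assumptions: input is handled in blocks of at most 64 bytes, the
quote character is '"' and doubles as its own escape (""), there is no
separate escape character, and lines end with '\n'. All function names
are hypothetical, and match_mask() is a plain loop standing in for
what would be a SIMD compare plus movemask in a real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-at-a-time reference: each step depends on the state left by the
 * previous byte, which is the dependency chain described above. */
static uint64_t line_end_mask_scalar(const unsigned char *block, size_t len,
                                     int *in_quote)
{
    uint64_t ends = 0;
    int q = *in_quote;

    assert(len <= 64);
    for (size_t i = 0; i < len; i++) {
        if (block[i] == '"')
            q = !q;                       /* state transition */
        else if (block[i] == '\n' && !q)
            ends |= (uint64_t) 1 << i;    /* newline outside quotes */
    }
    *in_quote = q;
    return ends;
}

/* Bit i of the result is set when block[i] == c. Every byte is tested
 * independently, so this step has no cross-byte dependency. */
static uint64_t match_mask(const unsigned char *block, size_t len,
                           unsigned char c)
{
    uint64_t m = 0;

    assert(len <= 64);
    for (size_t i = 0; i < len; i++)
        if (block[i] == c)
            m |= (uint64_t) 1 << i;
    return m;
}

/* Prefix XOR: bit i of the result is the XOR of bits 0..i of x. */
static uint64_t prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/* Bitmap version: computes the quoted region for the whole block with a
 * handful of word operations instead of one transition per byte. */
static uint64_t line_end_mask(const unsigned char *block, size_t len,
                              int *in_quote)
{
    uint64_t quotes = match_mask(block, len, '"');
    uint64_t newlines = match_mask(block, len, '\n');
    uint64_t inside = prefix_xor(quotes);   /* set from an opening quote
                                             * up to its closing quote */

    if (*in_quote)
        inside = ~inside;                   /* block started inside quotes */
    *in_quote = (int) (inside >> 63) & 1;   /* state carried to next block */
    return newlines & ~inside;
}
```

A caller would feed consecutive blocks to line_end_mask(), passing the
same in_quote variable each time, and then walk the set bits of the
returned mask to find line boundaries. A doubled quote toggles the
state twice, so it leaves the quoted region unchanged, which is enough
for finding line ends under the assumptions listed above.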

Interesting. How does that compare to what we currently have?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
