Re: Parallel copy

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-02-19 04:22:11
Message-ID: CAA4eK1KAUnH2dUztj_ugS4LqihSM0hQpMiPRDcUyCZohqAzGOw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 18, 2020 at 8:08 PM Ants Aasma <ants(at)cybertec(dot)at> wrote:
>
> On Tue, 18 Feb 2020 at 15:21, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <ants(at)cybertec(dot)at> wrote:
> > >
> > > On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > This is similar to what I also had in mind for this idea. I
> > > > had thought of handing over a complete chunk (64K or whatever we
> > > > decide). The one thing that slightly bothers me is that we will add
> > > > some overhead by copying to and from shared memory, where earlier
> > > > the copy used process-local memory. Also, the tokenization (finding
> > > > line boundaries) would be serial. I think that tokenization should be
> > > > a small part of the overall work we do during the copy operation, but
> > > > I will do some measurements to confirm that.
> > >
> > > I don't think any extra copying is needed.
> > >
> >
> > I am talking about access to shared memory instead of the process
> > local memory. I understand that an extra copy won't be required.
> >
> > > The reader can directly
> > > fread()/pq_copymsgbytes() into shared memory, and the workers can run
> > > CopyReadLineText() inner loop directly off of the buffer in shared memory.
> > >
> >
> > I am slightly confused here. AFAIU, the for(;;) loop in
> > CopyReadLineText is about finding the line endings, which we thought
> > the reader process would do.
>
> Indeed, I somehow misread the code while scanning over it. So CopyReadLineText
> currently copies data from cstate->raw_buf to the StringInfo in
> cstate->line_buf. In parallel mode it would copy it from the shared data buffer
> to local line_buf until it hits the line end found by the data reader. The
> amount of copying done is still exactly the same as it is now.
>

Yeah, at a broad level it will be something like that, but the actual
details might vary during implementation. BTW, have you given any
thought to the other approach I shared above [1]? We might not
go with that idea, but it is better to discuss different ideas and
evaluate their pros and cons.

[1] - https://www.postgresql.org/message-id/CAA4eK1LyAyPCtBk4rkwomeT6%3DyTse5qWws-7i9EFwnUFZhvu5w%40mail.gmail.com
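
To make the scheme discussed above a bit more concrete, here is a minimal
sketch in plain C. It is not PostgreSQL code and not a proposed patch: the
names SharedChunk, find_line_ends() and copy_line() are hypothetical, the
"shared" buffer is an ordinary pointer, and quoting, escapes and \r\n
handling are ignored. It only shows the division of work: one reader
scans serially for line ends, and a worker then copies a single line from
the shared buffer into its local line buffer, so the amount of copying per
line stays the same as in the serial case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 64            /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a chunk the reader placed in shared memory. */
typedef struct SharedChunk
{
    const char *data;                   /* raw input bytes */
    size_t      len;                    /* number of valid bytes in data */
    size_t      line_end[MAX_LINES];    /* offset of each '\n' found */
    size_t      nlines;                 /* number of complete lines */
} SharedChunk;

/*
 * Reader side: serial scan that records where each line ends.
 * Bytes after the last '\n' form an incomplete line and are not counted.
 */
static size_t
find_line_ends(SharedChunk *chunk)
{
    chunk->nlines = 0;
    for (size_t i = 0; i < chunk->len && chunk->nlines < MAX_LINES; i++)
    {
        if (chunk->data[i] == '\n')
            chunk->line_end[chunk->nlines++] = i;
    }
    return chunk->nlines;
}

/*
 * Worker side: copy line n (without its '\n') from the shared chunk into
 * a local, NUL-terminated buffer and return the line length.
 */
static size_t
copy_line(const SharedChunk *chunk, size_t n, char *line_buf, size_t cap)
{
    size_t      start;
    size_t      len;

    assert(n < chunk->nlines);
    start = (n == 0) ? 0 : chunk->line_end[n - 1] + 1;
    len = chunk->line_end[n] - start;
    assert(len < cap);          /* leave room for the terminator */

    memcpy(line_buf, chunk->data + start, len);
    line_buf[len] = '\0';
    return len;
}
```

In this sketch a worker would call copy_line() only for line numbers below
the nlines value published by the reader; how that value and the chunk are
actually shared and synchronized between processes is left out here.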

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
