Re: Parallel copy

From: Ants Aasma <ants(at)cybertec(dot)at>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-02-18 14:38:13
Message-ID: CANwKhkNM1OBCMAoo5wkgGS6ZtFx+uMW-GZZC2FjzVBO5gFHJKQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, 18 Feb 2020 at 15:21, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <ants(at)cybertec(dot)at> wrote:
> >
> > On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > This is something similar to what I had also in mind for this idea. I
> > > had thought of handing over complete chunk (64K or whatever we
> > > decide). The one thing that slightly bothers me is that we will add
> > > some additional overhead of copying to and from shared memory which
> > > was earlier from local process memory. And, the tokenization (finding
> > > line boundaries) would be serial. I think that tokenization should be
> > > a small part of the overall work we do during the copy operation, but
> > > will do some measurements to ascertain the same.
> >
> > I don't think any extra copying is needed.
> >
>
> I am talking about access to shared memory instead of the process
> local memory. I understand that an extra copy won't be required.
>
> > The reader can directly
> > fread()/pq_copymsgbytes() into shared memory, and the workers can run
> > CopyReadLineText() inner loop directly off of the buffer in shared memory.
> >
>
> I am slightly confused here. AFAIU, the for(;;) loop in
> CopyReadLineText is about finding the line endings which we thought
> that the reader process will do.

Indeed, I misread the code while scanning over it. CopyReadLineText
currently copies data from cstate->raw_buf to the StringInfo in
cstate->line_buf. In parallel mode a worker would instead copy from the shared
data buffer to its local line_buf until it hits the line end found by the data
reader. The amount of copying done is still exactly the same as it is now.
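
A minimal sketch of the copy path described above, under stated assumptions.
None of these names exist in PostgreSQL: SharedChunk, LineSpan, find_lines()
and copy_line() are hypothetical stand-ins for the shared data buffer, the
line boundaries published by the reader, and the worker-side copy into a
local line buffer. It splits on '\n' only and ignores the quoting, escaping
and encoding handling that the real CopyReadLineText performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the shared buffer the reader fills directly. */
typedef struct SharedChunk
{
    const char *data;           /* raw input bytes */
    size_t      len;            /* number of valid bytes in data */
} SharedChunk;

/* Hypothetical line boundary published by the reader: [start, end). */
typedef struct LineSpan
{
    size_t      start;
    size_t      end;            /* offset of the terminating '\n' */
} LineSpan;

/*
 * Reader side: serial scan for line ends. A trailing partial line (no
 * newline yet) is not reported; it would wait for more input.
 */
static size_t
find_lines(const SharedChunk *chunk, LineSpan *spans, size_t max_spans)
{
    size_t      nspans = 0;
    size_t      start = 0;

    for (size_t i = 0; i < chunk->len && nspans < max_spans; i++)
    {
        if (chunk->data[i] == '\n')
        {
            spans[nspans].start = start;
            spans[nspans].end = i;
            nspans++;
            start = i + 1;
        }
    }
    return nspans;
}

/*
 * Worker side: a single copy from the shared buffer into the local line
 * buffer, stopping at the line end the reader already found. Lines longer
 * than the buffer are truncated in this sketch.
 */
static size_t
copy_line(const SharedChunk *chunk, LineSpan span, char *line_buf, size_t cap)
{
    size_t      len = span.end - span.start;

    assert(cap > 0);
    if (len >= cap)
        len = cap - 1;
    memcpy(line_buf, chunk->data + span.start, len);
    line_buf[len] = '\0';
    return len;
}
```

A caller would run find_lines() once over each chunk in the reader and hand
each LineSpan to a worker, which calls copy_line() into its own buffer. Each
input byte is still copied once into a line buffer; only the source of that
copy changes from process-local memory to shared memory.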

Regards,
Ants Aasma
