From: | Ants Aasma <ants(at)cybertec(dot)at> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel copy |
Date: | 2020-02-18 14:38:13 |
Message-ID: | CANwKhkNM1OBCMAoo5wkgGS6ZtFx+uMW-GZZC2FjzVBO5gFHJKQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, 18 Feb 2020 at 15:21, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <ants(at)cybertec(dot)at> wrote:
> >
> > On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > This is something similar to what I had also in mind for this idea. I
> > > had thought of handing over complete chunk (64K or whatever we
> > > decide). The one thing that slightly bothers me is that we will add
> > > some additional overhead of copying to and from shared memory which
> > > was earlier from local process memory. And, the tokenization (finding
> > > line boundaries) would be serial. I think that tokenization should be
> > > a small part of the overall work we do during the copy operation, but
> > > will do some measurements to ascertain the same.
> >
> > I don't think any extra copying is needed.
> >
>
> I am talking about access to shared memory instead of the process
> local memory. I understand that an extra copy won't be required.
>
> > The reader can directly
> > fread()/pq_copymsgbytes() into shared memory, and the workers can run
> > CopyReadLineText() inner loop directly off of the buffer in shared memory.
> >
>
> I am slightly confused here. AFAIU, the for(;;) loop in
> CopyReadLineText is about finding the line endings which we thought
> that the reader process will do.
Indeed, I somehow misread the code while scanning over it. So CopyReadLineText
currently copies data from cstate->raw_buf to the StringInfo in
cstate->line_buf. In parallel mode it would instead copy from the shared data
buffer to the local line_buf until it hits the line end found by the data
reader. The amount of copying done is still exactly the same as it is now.
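
To make the copy count concrete, here is a rough sketch of the parallel-mode flow described above. It is not PostgreSQL code: LineBuf is a simplified stand-in for StringInfo, and LineRange, find_line_ranges() and copy_line_to_local() are hypothetical names invented for illustration. The reader only records line boundaries in the shared buffer (plain '\n' scanning, with no quoting or escape handling), and each worker does a single memcpy from the shared buffer into its local line buffer, mirroring the one copy from raw_buf to line_buf that happens today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringInfo; not the real PostgreSQL type. */
typedef struct LineBuf
{
    char   *data;   /* NUL-terminated line contents */
    size_t  len;    /* length excluding the terminator */
    size_t  cap;    /* allocated size of data */
} LineBuf;

/* Hypothetical: one line located by the reader inside the shared buffer. */
typedef struct LineRange
{
    size_t  start;  /* offset of the first byte of the line */
    size_t  end;    /* offset of the terminating '\n' (exclusive end) */
} LineRange;

/*
 * Reader side: find line endings only, without copying any data.
 * Assumption: lines end in a bare '\n'; CSV quoting is ignored here.
 * Returns the number of complete lines recorded in out[].
 */
static size_t
find_line_ranges(const char *shared, size_t nbytes,
                 LineRange *out, size_t max_out)
{
    size_t  n = 0;
    size_t  start = 0;

    for (size_t i = 0; i < nbytes && n < max_out; i++)
    {
        if (shared[i] == '\n')
        {
            out[n].start = start;
            out[n].end = i;
            n++;
            start = i + 1;
        }
    }
    return n;
}

/*
 * Worker side: the only copy, from the shared buffer to the local line
 * buffer, up to the line end the reader found. Returns 0 on success,
 * -1 if memory could not be allocated.
 */
static int
copy_line_to_local(const char *shared, LineRange r, LineBuf *line)
{
    size_t  len;

    assert(r.end >= r.start);
    len = r.end - r.start;

    if (len + 1 > line->cap)
    {
        char   *p = realloc(line->data, len + 1);

        if (p == NULL)
            return -1;
        line->data = p;
        line->cap = len + 1;
    }
    memcpy(line->data, shared + r.start, len);
    line->data[len] = '\0';
    line->len = len;
    return 0;
}
```

A worker would call copy_line_to_local() once per LineRange handed to it, so each input byte is still copied exactly once after it lands in the (here, simulated) shared buffer.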
Regards,
Ants Aasma