Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-06-03 10:23:24
Message-ID: CALDaNm13zK=JXfZWqZJsm3+2yagYDJc=eJBgE4i77-4PPNj7vw@mail.gmail.com
Lists: pgsql-hackers

Hi.

We have made a patch along the lines discussed in the previous mails. With
it we could achieve up to a 9.87X performance improvement; the improvement
varies from case to case.

Execution time (in seconds) by number of workers; the figure in
parentheses is the speedup over the serial run (0 workers).

Test cases:
(1) copy from file, 2 indexes on integer columns, 1 index on text column
(2) copy from stdin, 2 indexes on integer columns, 1 index on text column
(3) copy from file, 1 gist index on text column
(4) copy from file, 3 indexes on integer columns
(5) copy from stdin, 3 indexes on integer columns

Workers   (1)                (2)               (3)               (4)              (5)
0         1162.772 (1X)      1176.035 (1X)     827.669 (1X)      216.171 (1X)     217.376 (1X)
1         1110.288 (1.05X)   1120.556 (1.05X)  747.384 (1.11X)   174.242 (1.24X)  163.492 (1.33X)
2         635.249 (1.83X)    668.18 (1.76X)    435.673 (1.9X)    133.829 (1.61X)  126.516 (1.72X)
4         336.835 (3.45X)    346.768 (3.39X)   236.406 (3.5X)    105.767 (2.04X)  107.382 (2.02X)
8         188.577 (6.17X)    194.491 (6.04X)   148.962 (5.56X)   100.708 (2.15X)  107.72 (2.01X)
16        126.819 (9.17X)    146.402 (8.03X)   119.923 (6.9X)    97.996 (2.2X)    106.531 (2.04X)
20        *117.845 (9.87X)*  149.203 (7.88X)   138.741 (5.96X)   97.94 (2.21X)    107.5 (2.02X)
30        127.554 (9.11X)    161.218 (7.29X)   172.443 (4.8X)    98.232 (2.2X)    108.778 (1.99X)

Posting the initial patch to get feedback.

Design of the parallel copy: The backend to which the "COPY FROM" query is
submitted acts as the leader, with the responsibility of reading data from
the file/stdin and launching at most n workers, as specified with the
PARALLEL 'n' option of the "COPY FROM" query. The leader populates the
common data required for the workers' execution in the DSM and shares it
with the workers, and then executes before statement triggers, if any
exist.

The leader populates the DSM chunk entries, each of which holds a start
offset and a chunk size; while populating the chunks it reads as many
blocks as required from the file into the DSM data blocks. Each block is
64K in size. The leader parses the data to identify a chunk; the existing
logic from CopyReadLineText, with some changes, is used for this. The
leader checks whether a free chunk entry is available to copy the
information into; if there is none, it waits until the required entry is
freed up by a worker, and then copies the identified chunk's information
(offset & chunk size) into the DSM chunk entry. This process is repeated
until the complete file is processed.

Simultaneously, the workers cache chunks (50 at a time) into local memory
and release the chunk entries to the leader for further populating. Each
worker processes the chunks it has cached and inserts them into the table.
The leader waits until all the populated chunks have been processed by the
workers and then exits.
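
To make the leader/worker flow above easier to follow, here is a minimal
sketch of how the shared chunk ring and data blocks could be laid out in
the DSM segment. This is not the patch's code; apart from the 64K block
size and the 50-chunk worker cache mentioned above, all names, states and
sizes below are illustrative assumptions.

#include <stdint.h>

#define DATA_BLOCK_SIZE     (64 * 1024) /* each DSM data block is 64K */
#define MAX_CHUNK_ENTRIES   1024        /* assumed size of the chunk ring */
#define WORKER_CHUNK_CACHE  50          /* chunks a worker caches locally */

typedef enum ChunkState
{
    CHUNK_UNUSED,       /* free; the leader may fill this entry */
    CHUNK_POPULATED,    /* filled by the leader, waiting for a worker */
    CHUNK_PROCESSED     /* worker has cached it; entry can be reused */
} ChunkState;

/* One identified chunk: where it starts and how long it is. */
typedef struct ParallelCopyChunkEntry
{
    ChunkState  state;
    uint32_t    block_id;       /* DSM data block the chunk starts in */
    uint32_t    start_offset;   /* offset of the chunk within that block */
    uint32_t    size;           /* chunk length in bytes */
} ParallelCopyChunkEntry;

/*
 * Shared state in the DSM segment: a ring of chunk entries produced by
 * the leader and consumed by the workers, followed by the raw 64K data
 * blocks holding the file contents. The leader waits when every entry is
 * CHUNK_POPULATED; a worker marks entries CHUNK_PROCESSED (freeing them
 * for reuse) once it has cached the corresponding chunks locally.
 */
typedef struct ParallelCopySharedState
{
    uint32_t                next_entry;     /* next slot the leader fills */
    ParallelCopyChunkEntry  entries[MAX_CHUNK_ENTRIES];
    char                    data_blocks[];  /* n blocks of DATA_BLOCK_SIZE */
} ParallelCopySharedState;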

In the future we would like to add support for parallel copy into tables
with referential integrity constraints, and for parallelizing copy from
binary format files. The above tests were run with CSV format, a file size
of 5.1GB and 10 million records in the table. The postgres configuration
and system configuration used are attached in config.txt.

This patch was developed by me and my colleague Bharath. We would like to
thank Amit, Dilip, Robert, Andres, Ants, Kuntal, Alastair, Tomas, David,
Thomas, Andrew & Kyotaro for their thoughts/discussions/suggestions.

Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

On Mon, May 18, 2020 at 10:18 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Fri, May 15, 2020 at 6:49 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Fri, May 15, 2020 at 12:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > > > My sense is that it would be a lot more sensible to do it at the
> > > > *beginning* of the parallel operation. Once we do it once, we
> > > > shouldn't ever do it again; that's how it works now. Deferring it
> > > > until later seems much more likely to break things.
> > >
> > > AFAIU, we always increment the command counter after executing the
> > > command. Why do we want to do it differently here?
> >
> > Hmm, now I'm starting to think that I'm confused about what is under
> > discussion here. Which CommandCounterIncrement() are we talking about
> > here?
> >
>
> The one we do after executing a non-readonly command. Let me try to
> explain by example:
>
> CREATE TABLE tab_fk_referenced_chk(refindex INTEGER PRIMARY KEY,
> height real, weight real);
> insert into tab_fk_referenced_chk values( 1, 1.1, 100);
> CREATE TABLE tab_fk_referencing_chk(index INTEGER REFERENCES
> tab_fk_referenced_chk(refindex), height real, weight real);
>
> COPY tab_fk_referencing_chk(index,height,weight) FROM stdin WITH(
> DELIMITER ',');
> 1,1.1,100
> 1,2.1,200
> 1,3.1,300
> \.
>
> In the above case, even though we are executing a single command from
> the user perspective, the currentCommandId will be four after the
> command. One increment will be for the copy command and the other
> three increments are for locking tuple in PK table
> (tab_fk_referenced_chk) for three tuples in FK table
> (tab_fk_referencing_chk). Now, for parallel workers, it is
> (theoretically) possible that the three tuples are processed by three
> different workers which don't get synced as of now. The question was
> do we see any kind of problem with this and if so can we just sync it
> up at the end of parallelism.
>
> > > First, let me clarify the CTID I have used in my email are for the
> > > table in which insertion is happening which means FK table. So, in
> > > such a case, we can't have the same CTIDs queued for different
> > > workers. Basically, we use CTID to fetch the row from FK table later
> > > and form a query to lock (in KEY SHARE mode) the corresponding tuple
> > > in PK table. Now, it is possible that two different workers try to
> > > lock the same row of PK table. I am not clear what problem group
> > > locking can have in this case because these are non-conflicting locks.
> > > Can you please elaborate a bit more?
> >
> > I'm concerned about two workers trying to take the same lock at the
> > same time. If that's prevented by the buffer locking then I think it's
> > OK, but if it's prevented by a heavyweight lock then it's not going to
> > work in this case.
> >
>
> We do take buffer lock in exclusive mode before trying to acquire KEY
> SHARE lock on the tuple, so the two workers shouldn't try to acquire
> at the same time. I think you are trying to see if in any case, two
> workers try to acquire heavyweight lock like tuple lock or something
> like that to perform the operation then it will create a problem
> because due to group locking it will allow such an operation where it
> should not have been. But I don't think anything of that sort is
> feasible in COPY operation and if it is then we probably need to
> carefully block it or find some solution for it.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>

Attachment Content-Type Size
config.txt text/plain 1.2 KB
0001-Support-parallel-copy.patch application/x-patch 107.1 KB
