Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-10-19 12:35:53
Message-ID: CALDaNm2dYgE0g9n3rGyw_v=-0zucUdkR7c_9rr9=Dj=SfPx9PA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 15, 2020 at 2:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Oct 14, 2020 at 6:51 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Fri, Oct 9, 2020 at 11:01 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > I am not able to properly parse the data, but if I understand correctly,
> > > the WAL data for the non-parallel (1116 | 0 | 3587203) and parallel
> > > (1119 | 6 | 3624405) cases doesn't seem to be the same. Is that
> > > right? If so, why? Please ensure that no checkpoint happens in either
> > > case.
> > >
> >
> > I have disabled checkpoints; the results with checkpoints disabled
> > are given below:
> >                           | wal_records | wal_fpi | wal_bytes
> > Sequential Copy           |        1116 |       0 |   3587669
> > Parallel Copy (1 worker)  |        1116 |       0 |   3587669
> > Parallel Copy (4 workers) |        1121 |       0 |   3587668
> > I noticed that with 1 worker, wal_records and wal_bytes are the same as
> > for sequential copy, but with a different worker count there is a
> > difference in wal_records and wal_bytes. I think the difference
> > should be OK: with more than 1 worker, the order in which records are
> > processed differs depending on which worker picks which records from
> > the input file. With sequential copy or 1 worker, the records are
> > always processed in the same order, hence wal_bytes is the same.
> >
>
> Are all records of the same size in your test? If so, why should the
> order matter? Also, even though the number of wal_records has
> increased, wal_bytes has not; rather, it is one byte less.
> Can we identify what is going on here? I don't intend to say that it
> is a problem, but we should know the reason clearly.

The earlier run used records of varying size. The results below were
obtained after modifying the records so that they all have the same size:
                           | wal_records | wal_fpi | wal_bytes
Sequential Copy            |        1307 |       0 |   4198526
Parallel Copy (1 worker)   |        1307 |       0 |   4198526
Parallel Copy (2 workers)  |        1308 |       0 |   4198836
Parallel Copy (4 workers)  |        1307 |       0 |   4199147
Parallel Copy (8 workers)  |        1312 |       0 |   4199735
Parallel Copy (16 workers) |        1313 |       0 |   4200311

I still see some difference in wal_records and wal_bytes. I think the
difference is due to the following:
Each worker buffers 1000 tuples and then calls heap_multi_insert for
those 1000 tuples. In this test approximately 185 tuples fit in one
page, so 925 tuples are stored in 5 WAL records (one per page) and the
remaining 75 tuples are stored in the next WAL record. The WAL dump
looks like this:
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn: 0/0160EC80, prev 0/0160DDB0, desc: MULTI_INSERT+INIT 185 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 0
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn: 0/0160FB28, prev 0/0160EC80, desc: MULTI_INSERT+INIT 185 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 1
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn: 0/016109E8, prev 0/0160FB28, desc: MULTI_INSERT+INIT 185 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 2
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn: 0/01611890, prev 0/016109E8, desc: MULTI_INSERT+INIT 185 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 3
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn: 0/01612750, prev 0/01611890, desc: MULTI_INSERT+INIT 185 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 4
rmgr: Heap2 len (rec/tot): 1550/ 1550, tx: 510, lsn: 0/016135F8, prev 0/01612750, desc: MULTI_INSERT+INIT 75 tuples flags 0x02, blkref #0: rel 1663/13751/16384 blk 5

After the first 1000 tuples are inserted, when the worker inserts the
next 1000 tuples it first uses the last page, which still has free
space for 110 more tuples:
rmgr: Heap2 len (rec/tot): 2470/ 2470, tx: 510, lsn: 0/01613C08, prev 0/016135F8, desc: MULTI_INSERT 110 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 5
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn: 0/016145C8, prev 0/01613C08, desc: MULTI_INSERT+INIT 185 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 6
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn: 0/01615470, prev 0/016145C8, desc: MULTI_INSERT+INIT 185 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 7
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn: 0/01616330, prev 0/01615470, desc: MULTI_INSERT+INIT 185 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 8
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn: 0/016171D8, prev 0/01616330, desc: MULTI_INSERT+INIT 185 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 9
rmgr: Heap2 len (rec/tot): 3050/ 3050, tx: 510, lsn: 0/01618098, prev 0/016171D8, desc: MULTI_INSERT+INIT 150 tuples flags 0x02, blkref #0: rel 1663/13751/16384 blk 10
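To make the page arithmetic explicit, here is a minimal Python sketch of the behavior seen in the two dumps above. It assumes (as the dumps show) that a single inserter keeps filling its last, partially full page before extending the relation, and that each heap_multi_insert call emits one WAL record per page it touches. The names model_batches, TUPLES_PER_PAGE and BATCH_SIZE are hypothetical; 185 and 1000 are the values from this test. It is a model of the record pattern, not PostgreSQL code.

```python
# Hypothetical model parameters, taken from the test described in this mail.
TUPLES_PER_PAGE = 185  # observed capacity of one heap page for these rows
BATCH_SIZE = 1000      # tuples buffered before each heap_multi_insert call


def model_batches(batches, tuples_per_page=TUPLES_PER_PAGE):
    """Return (block, ntuples, init) for each WAL record a single inserter
    would emit, assuming it fills its last page before extending."""
    records = []
    block = -1
    used = tuples_per_page  # no current page yet: the first insert starts one
    for ntuples in batches:
        while ntuples > 0:
            init = False
            if used == tuples_per_page:
                block += 1   # page is full: extend the relation
                used = 0
                init = True  # freshly initialized page -> MULTI_INSERT+INIT
            n = min(ntuples, tuples_per_page - used)
            records.append((block, n, init))
            used += n
            ntuples -= n
    return records


for block, n, init in model_batches([BATCH_SIZE, BATCH_SIZE]):
    desc = "MULTI_INSERT+INIT" if init else "MULTI_INSERT"
    print(f"{desc} {n} tuples, blk {block}")
```

For two 1000-tuple batches this yields 12 records: 185 tuples on each of blocks 0-4, 75 on block 5 (INIT), 110 more on block 5 (no INIT), 185 on each of blocks 6-9 and 150 on block 10, matching the tuple counts and block numbers in the dumps.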

This behavior is the same for sequential copy and for copy with 1
worker, because the inserts happen in the same sequence and the pages
are filled in the same order. Two reasons together result in the
varying wal_bytes and wal_records with multiple workers:
1) When more than 1 worker is involved, the order in which pages are
selected is not guaranteed, so the tuple count per MULTI_INSERT record
varies, as does whether a record is MULTI_INSERT or MULTI_INSERT+INIT.
2) wal_records increases with more workers because, when the tuples are
split across the workers, the last heap_multi_insert is also split
across them, so one of the workers emits a few more WAL records, like:
rmgr: Heap2 len (rec/tot): 600/ 600, tx: 510, lsn: 0/019F8B08, prev 0/019F7C48, desc: MULTI_INSERT 25 tuples flags 0x00, blkref #0: rel 1663/13751/16384 blk 1065
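The record lengths quoted in this mail also show why page selection affects wal_bytes even when every row has the same size. The sketch below assumes a linear model fitted to those lengths (an assumption, not derived from the PostgreSQL source): 20 bytes per tuple, 50 bytes of fixed overhead per record, and 2 extra bytes per tuple when the page is not being re-initialized. That 2-byte term is consistent with heap_multi_insert logging an OffsetNumber per tuple only when it does not initialize the page. The function name multi_insert_len is hypothetical.

```python
# Fitted from the pg_waldump lines quoted in this mail (assumption).
BYTES_PER_TUPLE = 20     # per-tuple cost in a MULTI_INSERT+INIT record
RECORD_OVERHEAD = 50     # fixed cost of one MULTI_INSERT record
OFFSET_NUMBER_SIZE = 2   # extra per-tuple cost when the page is not INIT


def multi_insert_len(ntuples, init):
    """Modelled record length for one MULTI_INSERT record."""
    per_tuple = BYTES_PER_TUPLE if init else BYTES_PER_TUPLE + OFFSET_NUMBER_SIZE
    return RECORD_OVERHEAD + ntuples * per_tuple


# (tuples, INIT?, observed len) taken from the dumps in this mail.
OBSERVED = [(185, True, 3750), (75, True, 1550), (110, False, 2470),
            (150, True, 3050), (25, False, 600)]
for n, init, length in OBSERVED:
    assert multi_insert_len(n, init) == length

# 185 tuples on one fresh page: a single MULTI_INSERT+INIT record.
print(multi_insert_len(185, True))                                # 3750
# The same 185 tuples split: 110 onto an existing page, 75 onto a new one.
print(multi_insert_len(110, False) + multi_insert_len(75, True))  # 4020
```

Under this model the same 185 tuples cost 270 bytes more when split across an existing page and a new one (one extra record header plus 110 offsets), which is the kind of variation point 1) describes.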

Attached is a tar of the WAL dump files used for this analysis.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachment: wal_dump.tar (application/x-tar, 2.8 MB)
