Re: WAL usage calculation patch

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Kirill Bychik <kirill(dot)bychik(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: WAL usage calculation patch
Date: 2020-04-03 03:24:50
Message-ID: CAFiTN-sONFPUHxMpn9m4=Z+m_MubCdvcVD75+QRK5jxVmUce8w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 2, 2020 at 9:28 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> > >
> > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
> > > query | calls | wal_bytes | wal_records | wal_num_fpw
> > > ----------------------------------------------+-------+-----------+-------------+-------------
> > > create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758
> > > create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758
> > > create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758
> > > create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758
> > > create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758
> > > create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758
> > > create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758
> > > create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758
> > > create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758
> > > create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758
> > > create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758
> > > (11 rows)
> > >
> > > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
> > > relname | pg_relation_size
> > > -----------------------+------------------
> > > t1_idx_parallel_0 | 22487040
> > > t1_idx_parallel_0_bis | 22487040
> > > t1_idx_parallel_0_ter | 22487040
> > > t1_idx_parallel_2 | 22487040
> > > t1_idx_parallel_1 | 22487040
> > > t1_idx_parallel_4 | 22487040
> > > t1_idx_parallel_3 | 22487040
> > > t1_idx_parallel_5 | 22487040
> > > t1_idx_parallel_6 | 22487040
> > > t1_idx_parallel_7 | 22487040
> > > t1_idx_parallel_8 | 22487040
> > > (11 rows)
> > >
> > >
> > > So while the number of WAL records and full page images stay constant, we can
> > > see some small fluctuations in the total amount of generated WAL data, even for
> > > multiple execution of the sequential create index. I'm wondering if the
> > > fluctuations are due to some other internal details or if the WalUsage support
> > > is just completely broken (although I don't see any obvious issue ATM).
> > >
> >
> > I think we need to know the reason for this. Can you try with small
> > size indexes and see if the problem is reproducible? If it is, then it
> > will be easier to debug the same.
>
> I have done some testing to see where this extra WAL size is coming
> from. First I tried creating a new db before every run, and then the
> size is consistent. But then, on the same server, I tried what Julien
> showed in his experiment, and I am getting a few extra WAL bytes from
> the next create index onwards. The waldump output (attached in the
> mail) shows that it is the pg_class insert WAL record. I still have to
> check why we need to write the extra WAL bytes.
>
> create extension pg_stat_statements;
> drop table t1;
> create table t1(id integer);
> insert into t1 select * from generate_series(1, 10);
> alter table t1 set (parallel_workers = 0);
> vacuum;checkpoint;
> select * from pg_stat_statements_reset() ;
> create index t1_idx_parallel_0 ON t1(id);
> select query, calls, wal_bytes, wal_records, wal_num_fpw from
> pg_stat_statements where query ilike '%create index%';
> query | calls | wal_bytes | wal_records | wal_num_fpw
> ------------------------------------------+-------+-----------+-------------+-------------
> create index t1_idx_parallel_0 ON t1(id) | 1 | 49320 | 23 | 15
>
>
> drop table t1;
> create table t1(id integer);
> insert into t1 select * from generate_series(1, 10);
> --select * from pg_stat_statements_reset() ;
> alter table t1 set (parallel_workers = 0);
> vacuum;checkpoint;
> create index t1_idx_parallel_1 ON t1(id);
>
> select query, calls, wal_bytes, wal_records, wal_num_fpw from
> pg_stat_statements where query ilike '%create index%';
> query | calls | wal_bytes | wal_records | wal_num_fpw
> ------------------------------------------+-------+-----------+-------------+-------------
> create index t1_idx_parallel_1 ON t1(id) | 1 | 50040 | 23 | 15
>
> wal_bytes diff = 50040-49320 = 720
>
> Below, this WAL record is causing the 720-byte difference; all other
> WAL records are of the same size.
>
> t1_idx_parallel_0:
> rmgr: Heap len (rec/tot): 54/7498, tx: 489, lsn: 0/0167B9B0, prev 0/0167B970, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249
>
> t1_idx_parallel_1:
> rmgr: Heap len (rec/tot): 54/8218, tx: 494, lsn: 0/016B84F8, prev 0/016B84B8, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249
>
> wal diff: 8218 - 7498 = 720

I think I have now found the reason. Both of these records store a
full-page write (FPW), and the FPW size varies with the size of the hole
on the page: image_len = BLCKSZ - hole_length, so a smaller hole means a
larger image. In the subsequent record the hole is smaller, so the image
is bigger. You can refer to the code below from XLogRecordAssemble:
{
    ....
    bimg.length = BLCKSZ - cbimg.hole_length;

    if (cbimg.hole_length == 0)
    {
        ....
    }
    else
    {
        /* must skip the hole */
        rdt_datas_last->data = page;
        rdt_datas_last->len = bimg.hole_offset;

        rdt_datas_last->next = &regbuf->bkp_rdatas[1];
        rdt_datas_last = rdt_datas_last->next;

        rdt_datas_last->data = page + (bimg.hole_offset + cbimg.hole_length);
        rdt_datas_last->len = BLCKSZ - (bimg.hole_offset + cbimg.hole_length);
    }

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
