Re: User Interface for WAL usage data

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Kirill Bychik <kirill(dot)bychik(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Subject: Re: User Interface for WAL usage data
Date: 2020-04-02 06:29:31
Message-ID: CAOBaU_YdCWxG3fC5cvL-MeYykRkE_U+BdsnwStniBfqHGq5CuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks all for the feedback.

On Thu, Apr 2, 2020 at 8:02 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Apr 2, 2020 at 11:28 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Thu, 2 Apr 2020 00:41:20 -0500, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote in
> > > On Thu, Apr 02, 2020 at 10:13:18AM +0530, Amit Kapila wrote:
> > > > In thread [1], we are discussing to expose WAL usage data for each
> > > > statement in a way quite similar to how we expose BufferUsage data.
> > > > The way it exposes seems reasonable to me and no one else raises any
> > > > objection. It could be that it appears fine to others who have
> > > > reviewed the patch but I thought it would be a good idea to write a
> > > > separate email just for its UI and see if anybody has objection.
> > >
> > > +1
> > >
> > > Regarding v10-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au.patch:
> > > I think there should be additional spaces before "full" and before "bytes":
> > >
> > > > WAL: records=2359 full page records=42 bytes=447788
> > >
> > > Compare with these:
> > >
> > > "Sort Method: %s %s: %ldkB\n",
> > > "Buckets: %d (originally %d) Batches: %d (originally %d) Memory Usage: %ldkB\n",
> > > "Buckets: %d Batches: %d Memory Usage: %ldkB\n",
> > >
> > > Otherwise "records=2359 full page records=42" is hard to parse.
> >
> > I got the same feeling seeing the line.
> >
>
> But isn't this same as we have BufferUsage data? We can probably
> display it as full_page_writes or something like that.
>
> > "full page records" seems to be showing the number of full page
> > images, not the record having full page images.
> >
>
> I am not sure what exactly is a difference but it is the records
> having full page images. Julien correct me if I am wrong.

This counter should be showing the number of full page image included
in the WAL record(s). The goal is to try to estimate how much FPI are
amplifying WAL records for a given cumulated size of WAL data. I had
seen some pretty high amplification recently due to an autovacuum
freeze. Using pg_waldump to analyze a sample of ~ 100GB of WALs
showed that a 1.5% increase in the number of freeze records lead to
more than 15% increase on the total amount of WAL, due to a high
amount of those records being FPW.

Also note that the patchset Amit is referencing adds the same
instrumentation for vacuum (verbose) and autovacuum, although this
part is showing the intended "full page write" label rather than the
bogus "full page records". Example output:

LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0
pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 250000 removed, 250000 remain, 0 are dead but not yet
removable, oldest xmin: 502
buffer usage: 4448 hits, 4 misses, 4 dirtied
avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s
system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s
WAL usage: 6643 records, 4 full page writes, 1402679 bytes

VACUUM log sample:

# vacuum VERBOSE t1;
INFO: vacuuming "public.t1"
INFO: "t1": removed 50000 row versions in 443 pages
INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443
out of 443 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512
There were 50000 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
1332 WAL records, 4 WAL full page writes, 306901 WAL bytes
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
INFO: "t1": truncated 443 to 0 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: vacuuming "pg_toast.pg_toast_16385"
INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row
versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
0 WAL records, 0 WAL full page writes, 0 WAL bytes
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

Obviously previous complaints about the meaning and parsability of
"full page writes" should be addressed here for consistency.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-04-02 06:34:17 Re: Some problems of recovery conflict wait events
Previous Message movead.li@highgo.ca 2020-04-02 06:26:08 Re: A bug when use get_bit() function for a long bytea string