Re: WAL usage calculation patch

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Kirill Bychik <kirill(dot)bychik(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: WAL usage calculation patch
Date: 2020-04-02 06:34:32
Message-ID: CAFiTN-tgoCtsACk_4DcqBq0KM1B7X38d7VRTL1r0_DhzQZui1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> Hi,
>
> I'm replying here to all reviews that have been sent, thanks a lot!
>
> On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
> > On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> > >
> > > So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
> > > (Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
> > > previously mentionned changes.
> > >
> >
> > Few other comments:
> > v9-0003-Add-infrastructure-to-track-WAL-usage
> > 1.
> > static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
> > -
> > +static void WalUsageAdd(WalUsage *dst, WalUsage *add);
> >
> > Looks like a spurious line removal
>
>
> Fixed.
>
>
> > 2.
> > + /* Report a full page imsage constructed for the WAL record */
> > + *num_fpw += 1;
> >
> > Typo. /imsage/image
>
>
> Ah sorry I though I fixed it previously, fixed.
>
>
> > 3. Doing some testing with and without parallelism to ensure WAL usage
> > data is correct would be great and if possible, share the results?
>
>
> I just saw that Dilip did some testing, but just in case here is some
> additional one
>
> - vacuum, after a truncate, loading 1M row and a "UPDATE t1 SET id = id"
>
> =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
> query | calls | wal_bytes | wal_records | wal_num_fpw
> ------------------------+-------+-----------+-------------+-------------
> vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2
> vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2
> (2 rows)
>
> - create index, overload t1's parallel_workers, using the 1M line just
> vacuumed:
>
> =# alter table t1 set (parallel_workers = 2);
> ALTER TABLE
>
> =# create index t1_parallel_2 on t1(id);
> CREATE INDEX
>
> =# alter table t1 set (parallel_workers = 0);
> ALTER TABLE
>
> =# create index t1_parallel_0 on t1(id);
> CREATE INDEX
>
> =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
> query | calls | wal_bytes | wal_records | wal_num_fpw
> --------------------------------------+-------+-----------+-------------+-------------
> create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745
> create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758
> (2 rows)
>
> It all looks good to me.
>
>
> > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
> > 4.
> > +-- SELECT usage data, check WAL usage is reported, wal_records equal
> > rows count for INSERT/UPDATE/DELETE
> > +SELECT query, calls, rows,
> > +wal_bytes > 0 as wal_bytes_generated,
> > +wal_records > 0 as wal_records_generated,
> > +wal_records = rows as wal_records_as_rows
> > +FROM pg_stat_statements ORDER BY query COLLATE "C";
> > + query |
> > calls | rows | wal_bytes_generated | wal_records_generated |
> > wal_records_as_rows
> > +------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
> > + DELETE FROM pgss_test WHERE a > $1 |
> > 1 | 1 | t | t | t
> > + DROP TABLE pgss_test |
> > 1 | 0 | t | t | f
> > + INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) |
> > 1 | 3 | t | t | t
> > + INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) |
> > 1 | 10 | t | t | t
> > + SELECT * FROM pgss_test ORDER BY a |
> > 1 | 12 | f | f | f
> > + SELECT * FROM pgss_test WHERE a > $1 ORDER BY a |
> > 2 | 4 | f | f | f
> > + SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) |
> > 1 | 8 | f | f | f
> > + SELECT pg_stat_statements_reset() |
> > 1 | 1 | f | f | f
> > + SET pg_stat_statements.track_utility = FALSE |
> > 1 | 0 | f | f | t
> > + UPDATE pgss_test SET b = $1 WHERE a = $2 |
> > 6 | 6 | t | t | t
> > + UPDATE pgss_test SET b = $1 WHERE a > $2 |
> > 1 | 3 | t | t | t
> > +(11 rows)
> > +
> >
> > I am not sure if the above tests make much sense as they are just
> > testing that if WAL is generated for these commands. I understand it
> > is not easy to make these tests reliable but in that case, we can
> > think of some simple tests. It seems to me that the difficulty is due
> > to full_page_writes as that depends on the checkpoint. Can we make
> > full_page_writes = off for these tests and check some simple
> > Insert/Update/Delete cases? Alternatively, if you can present the
> > reason why that is unstable or are tricky to write, then we can simply
> > get rid of these tests because I don't see tests for BufferUsage. Let
> > not write tests for the sake of writing it unless they can detect bugs
> > in the future or are meaningfully covering the new code added.
>
>
> I don't think that we can have any hope in a stable amount of WAL bytes
> generated, so testing a positive number looks sensible to me. Then testing
> that each 1-line-write query generates a WAL record also looks sensible, so I
> kept this. I realized that Kirill used an existing set of queries that were
> previously added to validate the multi queries commands behavior, so there's no
> need to have all of them again. I just kept one of each (insert, update,
> delete, select) to make sure that we do record WAL activity there, but I don't
> think that more can really be done. I still think that this is better than
> nothing, but if you disagree feel free to drop those tests.
>
>
> > 5.
> > -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
> > - query | calls | rows
> > ------------------------------------+-------+------
> > - SELECT $1::TEXT | 1 | 1
> > - SELECT PLUS_ONE($1) | 2 | 2
> > - SELECT PLUS_TWO($1) | 2 | 2
> > - SELECT pg_stat_statements_reset() | 1 | 1
> > +SELECT query, calls, rows, wal_bytes, wal_records FROM
> > pg_stat_statements ORDER BY query COLLATE "C";
> > + query | calls | rows | wal_bytes | wal_records
> > +-----------------------------------+-------+------+-----------+-------------
> > + SELECT $1::TEXT | 1 | 1 | 0 | 0
> > + SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
> > + SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
> > + SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
> > (4 rows)
> >
> > Again, I am not sure if these modifications make much sense?
>
>
> Those are queries that were previously executed. As those are read-only query,
> that are pretty much guaranteed to not cause any WAL activity, I don't see how
> it hurts to test at the same time that that's we indeed record with
> pg_stat_statements, just to be safe. Once again, feel free to drop the extra
> wal_* columns from the output if you disagree.
>
>
> > 6.
> > static void pgss_shmem_startup(void);
> > @@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
> > int query_location, int query_len,
> > double total_time, uint64 rows,
> > const BufferUsage *bufusage,
> > + const WalUsage* walusage,
> > pgssJumbleState *jstate);
> >
> > The alignment for walusage doesn't seem to be correct. Running
> > pgindent will fix this.
>
>
> Indeed, fixed.
>
> > 7.
> > + values[i++] = Int64GetDatumFast(tmp.wal_records);
> > + values[i++] = UInt64GetDatum(tmp.wal_num_fpw);
> >
> > Why are they different? I think we should use the same *GetDatum API
> > (probably Int64GetDatumFast) for these.
>
>
> Oops, that's a mistake from when I was working on the wal_bytes output, fixed.
>
> > > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
> > >
> >
> > One more comment related to this patch.
> > +
> > + snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
> > +
> > + /* Convert to numeric. */
> > + wal_bytes = DirectFunctionCall3(numeric_in,
> > + CStringGetDatum(buf),
> > + ObjectIdGetDatum(0),
> > + Int32GetDatum(-1));
> > +
> > + values[i++] = wal_bytes;
> >
> > I see that other places that display uint64 values use BIGINT datatype
> > in SQL, so why can't we do the same here? See the usage of queryid in
> > pg_stat_statements or internal_pages, *_pages exposed via
> > pgstatindex.c.
>
>
> That's because it's harmless to report a signed number for a hash (at least
> comapred to the overhead of having it unsigned), while that's certainly not
> wanted to report a negative amount of WAL bytes generated if it goes beyond
> bigint limit. See the usage of pg_lsn_mi in pg_lsn.c for instance.
>
> On Wed, Apr 01, 2020 at 07:20:31PM +0530, Dilip Kumar wrote:
> >
> > I have reviewed 0003 and 0004, I have a few comments.
> > v9-0003-Add-infrastructure-to-track-WAL-usage
> >
> > 1.
> > /* Points to buffer usage area in DSM */
> > BufferUsage *buffer_usage;
> > + /* Points to WAL usage area in DSM */
> > + WalUsage *wal_usage;
> >
> > Better to give one blank line between the previous statement/variable
> > declaration and the next comment line.
>
>
> Fixed.
>
>
> > 2.
> > @@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
> > IndexBulkDeleteResult **stats,
> > WaitForParallelWorkersToFinish(lps->pcxt);
> >
> > for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> > - InstrAccumParallelQuery(&lps->buffer_usage[i]);
> > + InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
> > }
> >
> > The existing comment above this loop, which just mentions the buffer
> > usage, not the wal usage so I guess we need to change that.
>
>
> Ah indeed, I thought I caught all the comments but missed this one. Fixed.
>
>
> > v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
> >
> > 3.
> > + if (usage->wal_num_fpw > 0)
> > + appendStringInfo(es->str, " full page records=%ld",
> > + usage->wal_num_fpw);
> > + if (usage->wal_bytes > 0)
> > + appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
> > + usage->wal_bytes);
> >
> > Shall we change to 'full page writes' or 'full page image' instead of
> > full page records?
>
>
> Indeed, I changed it in the (auto)vacuum output but missed this one. Fixed.
>
>
> > Apart from this, I have some testing to see the wal_usage with the
> > parallel vacuum and the results look fine.
> >
> > postgres[104248]=# CREATE TABLE test (a int, b int);
> > CREATE TABLE
> > postgres[104248]=# INSERT INTO test SELECT i, i FROM
> > GENERATE_SERIES(1,2000000) as i;
> > INSERT 0 2000000
> > postgres[104248]=# CREATE INDEX idx1 on test(a);
> > CREATE INDEX
> > postgres[104248]=# VACUUM (PARALLEL 1) test;
> > VACUUM
> > postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw
> > from pg_stat_statements where query like 'VACUUM%';
> > query | wal_bytes | wal_records | wal_num_fpw
> > --------------------------+-----------+-------------+-------------
> > VACUUM (PARALLEL 1) test | 72814331 | 8857 | 8855
> >
> >
> >
> > postgres[106479]=# CREATE TABLE test (a int, b int);
> > CREATE TABLE
> > postgres[106479]=# INSERT INTO test SELECT i, i FROM
> > GENERATE_SERIES(1,2000000) as i;
> > INSERT 0 2000000
> > postgres[106479]=# CREATE INDEX idx1 on test(a);
> > CREATE INDEX
> > postgres[106479]=# VACUUM (PARALLEL 0) test;
> > VACUUM
> > postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw
> > from pg_stat_statements where query like 'VACUUM%';
> > query | wal_bytes | wal_records | wal_num_fpw
> > --------------------------+-----------+-------------+-------------
> > VACUUM (PARALLEL 0) test | 72814331 | 8857 | 8855
>
>
> Thanks! I did some similar testing, with also seq/parallel index creation and
> got similar results.
>
>
> > By tomorrow, I will try to finish reviewing 0005 and 0006.

I have reviewed these patches and I have a few cosmetic comments.
v10-0005-Keep-track-of-WAL-usage-in-pg_stat_statements

1.
+ uint64 wal_bytes; /* total amount of wal bytes written */
+ int64 wal_records; /* # of wal records written */
+ int64 wal_num_fpw; /* # of full page wal records written */

/s/# of full page wal records written / /* # of WAL full page image produced */

2.
static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
ProcessUtilityContext context, ParamListInfo params,
QueryEnvironment *queryEnv,
- DestReceiver *dest, QueryCompletion *qc);
+ DestReceiver *dest, QueryCompletion * qc);

Useless hunk.

3.

v10-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum

@@ -3105,7 +3105,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
{
ExplainPropertyInteger("WAL records", NULL,
usage->wal_records, es);
- ExplainPropertyInteger("WAL full page records", NULL,
+ ExplainPropertyInteger("WAL full page writes", NULL,
usage->wal_num_fpw, es);
Just noticed that in 0004 you have first added "WAL full page
records", which is later corrected to "WAL full page writes" in 0006.
I think we better keep this proper in 0004 itself and avoid this hunk
in 0006, otherwise, it creates confusion while reviewing.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-04-02 06:35:40 Re: User Interface for WAL usage data
Previous Message Fujii Masao 2020-04-02 06:34:17 Re: Some problems of recovery conflict wait events