Re: WAL usage calculation patch

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Kirill Bychik <kirill(dot)bychik(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: WAL usage calculation patch
Date: 2020-04-01 10:59:16
Message-ID: CAA4eK1KxT5ckuaiQqhUsNn=S3yQsxgC5aeiUoXxnS-i2W4K+_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
> (Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
> previously mentionned changes.
>

Few other comments:
v9-0003-Add-infrastructure-to-track-WAL-usage
1.
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
-
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);

Looks like a spurious line removal

2.
+ /* Report a full page imsage constructed for the WAL record */
+ *num_fpw += 1;

Typo. /imsage/image

3. Doing some testing with and without parallelism to ensure WAL usage
data is correct would be great and if possible, share the results?

v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
4.
+-- SELECT usage data, check WAL usage is reported, wal_records equal
rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query |
calls | rows | wal_bytes_generated | wal_records_generated |
wal_records_as_rows
+------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 |
1 | 1 | t | t | t
+ DROP TABLE pgss_test |
1 | 0 | t | t | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) |
1 | 3 | t | t | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) |
1 | 10 | t | t | t
+ SELECT * FROM pgss_test ORDER BY a |
1 | 12 | f | f | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a |
2 | 4 | f | f | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) |
1 | 8 | f | f | f
+ SELECT pg_stat_statements_reset() |
1 | 1 | f | f | f
+ SET pg_stat_statements.track_utility = FALSE |
1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2 |
6 | 6 | t | t | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 |
1 | 3 | t | t | t
+(11 rows)
+

I am not sure if the above tests make much sense as they are just
testing that if WAL is generated for these commands. I understand it
is not easy to make these tests reliable but in that case, we can
think of some simple tests. It seems to me that the difficulty is due
to full_page_writes as that depends on the checkpoint. Can we make
full_page_writes = off for these tests and check some simple
Insert/Update/Delete cases? Alternatively, if you can present the
reason why that is unstable or are tricky to write, then we can simply
get rid of these tests because I don't see tests for BufferUsage. Let
not write tests for the sake of writing it unless they can detect bugs
in the future or are meaningfully covering the new code added.

5.
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_bytes, wal_records FROM
pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records
+-----------------------------------+-------+------+-----------+-------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)

Again, I am not sure if these modifications make much sense?

6.
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate);

The alignment for walusage doesn't seem to be correct. Running
pgindent will fix this.

7.
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = UInt64GetDatum(tmp.wal_num_fpw);

Why are they different? I think we should use the same *GetDatum API
(probably Int64GetDatumFast) for these.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-04-01 11:30:59 Re: WAL usage calculation patch
Previous Message Fujii Masao 2020-04-01 10:37:15 Re: recovery_target_action=pause with confusing hint