Re: Replication slot stats misgivings

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Replication slot stats misgivings
Date: 2021-04-29 12:44:06
Message-ID: CALDaNm2DYF2o_YNbCftWF-iaFe3ry+Ba6HyatNzfm-QfjiOLXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 29, 2021 at 3:06 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Apr 28, 2021 at 5:01 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> >
> > On Wed, Apr 28, 2021 at 4:51 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:
> > >
> > > On Wed, Apr 28, 2021 at 6:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> >
> > @@ -1369,7 +1369,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb,
> > ReorderBufferIterTXNState *state)
> > * Update the total bytes processed before releasing the current set
> > * of changes and restoring the new set of changes.
> > */
> > - rb->totalBytes += rb->size;
> > + rb->totalBytes += entry->txn->total_size;
> > if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
> > &state->entries[off].segno))
> >
> > I have not tested this but won't in the above change you need to check
> > txn->toptxn for subtxns?
> >
>
> Now, I am able to reproduce this issue:
> Create table t1(c1 int);
> select pg_create_logical_replication_slot('s', 'test_decoding');
> Begin;
> insert into t1 values(1);
> savepoint s1;
> insert into t1 select generate_series(1, 100000);
> commit;
>
> postgres=# select count(*) from pg_logical_slot_peek_changes('s1', NULL,
NULL);
> count
> --------
> 100005
> (1 row)
>
> postgres=# select * from pg_stat_replication_slots;
> slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
> stream_count | stream_bytes | total_txns | total_bytes |
> stats_reset
>
-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
> s1 | 0 | 0 | 0 | 0 |
> 0 | 0 | 2 | 13200672 | 2021-04-29
> 14:33:55.156566+05:30
> (1 row)
>
> select * from pg_stat_reset_replication_slot('s1');
>
> Now reduce the logical decoding work mem to allow spilling.
> postgres=# set logical_decoding_work_mem='64kB';
> SET
> postgres=# select count(*) from pg_logical_slot_peek_changes('s1', NULL,
NULL);
> count
> --------
> 100005
> (1 row)
>
> postgres=# select * from pg_stat_replication_slots;
> slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
> stream_count | stream_bytes | total_txns | total_bytes |
> stats_reset
>
-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
> s1 | 1 | 202 | 13200000 | 0 |
> 0 | 0 | 2 | 672 | 2021-04-29
> 14:35:21.836613+05:30
> (1 row)
>
> You can notice that after we have allowed spilling the 'total_bytes'
> stats is showing a different value. The attached patch fixes the issue
> for me. Let me know what do you think about this?

I found one issue with the following scenario when testing with
logical_decoding_work_mem as 64kB:

BEGIN;
INSERT INTO t1 values(generate_series(1,10000));
SAVEPOINT s1;
INSERT INTO t1 values(generate_series(1,10000));
COMMIT;
SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot1', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset
------------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
regression_slot1 | 6 | 154 | 9130176 | 0 |
0 | 0 | 1 | *4262016* | 2021-04-29
17:50:00.080663+05:30
(1 row)

Same thing works fine with logical_decoding_work_mem as 64MB:
select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset
------------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
regression_slot1 | 6 | 154 | 9130176 | 0 |
0 | 0 | 1 | *2640000* | 2021-04-29
17:50:00.080663+05:30
(1 row)

The patch required one change:
- rb->totalBytes += rb->size;
+ if (entry->txn->toptxn)
+ rb->totalBytes += entry->txn->toptxn->total_size;
+ else
+ rb->totalBytes += entry->txn->*total_size*;

The above should be changed to:
- rb->totalBytes += rb->size;
+ if (entry->txn->toptxn)
+ rb->totalBytes += entry->txn->toptxn->total_size;
+ else
+ rb->totalBytes += entry->txn->*size*;

Attached patch fixes the issue.
Thoughts?

Regards,
Vignesh

Attachment Content-Type Size
use_total_size_v4.patch text/x-patch 2.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vaibhav Dalvi 2021-04-29 13:05:28 Remove post-increment in function quote_identifier of pg_upgrade
Previous Message Amit Kapila 2021-04-29 11:53:57 Re: [BUG] "FailedAssertion" reported when streaming in logical replication