Re: Resetting spilled txn statistics in pg_stat_replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Resetting spilled txn statistics in pg_stat_replication
Date: 2020-10-13 04:24:17
Message-ID: CAA4eK1KkUKpr7wAO7GT0OVW=iR5zOuEf_Yd+WoaZz6azj78ekA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 13, 2020 at 9:25 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > I have pushed this but it failed in one of the BF. See
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2020-10-13%2003%3A07%3A25
> > The failure is shown below and I am analyzing it. See, if you can
> > provide any insights.
>
> It's not very clear what spill_count actually counts (and the
> documentation sure does nothing to clarify that), but if it has anything
> to do with WAL volume, the explanation might be that florican is 32-bit.
> All the animals that have passed that test so far are 64-bit.
>

It is based on the size of the change. In this case, it is the size of
the tuples inserted. See ReorderBufferChangeSize() to know how we compute
the size of each change. Once the total_size for changes crosses
logical_decoding_work_mem (64kB in this case), we will spill. So
'spill_count' is the number of times the size of changes in that
transaction crossed the threshold and led to a spill of the
corresponding changes.
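
To illustrate the accounting described above, here is a rough sketch in
Python. It is a simplified single-transaction model, not the actual
reorderbuffer.c logic: the names SpillStats and decode_transaction and
the per-change sizes (100 and 120 bytes) are made up for illustration.
It only shows why 'spill_count' can vary with the size of each change
(which may differ between platforms) while 'spill_txns' stays the same.

```python
from dataclasses import dataclass


@dataclass
class SpillStats:
    spill_txns: int = 0   # transactions that spilled at least once
    spill_count: int = 0  # number of spill events


def decode_transaction(change_sizes, work_mem_bytes, stats):
    """Model one transaction: add up change sizes and spill whenever the
    running total reaches the limit (hypothetical simplification)."""
    total_size = 0
    spilled_before = False
    for size in change_sizes:
        total_size += size
        if total_size >= work_mem_bytes:
            stats.spill_count += 1
            if not spilled_before:
                # Counted only once per transaction, however often it spills.
                stats.spill_txns += 1
                spilled_before = True
            total_size = 0  # the spilled changes no longer count against memory
    return stats


WORK_MEM = 64 * 1024  # 64kB, as in the test

# Same number of changes, different assumed per-change size.
small = decode_transaction([100] * 5000, WORK_MEM, SpillStats())
large = decode_transaction([120] * 5000, WORK_MEM, SpillStats())

print(small)
print(large)
```

In this model both runs report the same 'spill_txns', but the run with
the larger changes reports a higher 'spill_count'.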

> > The reason for this problem could be that there is some transaction
> > (say by autovacuum) which happened interleaved with this transaction
> > and committed before this one.
>
> I can believe that idea too, but would it not have resulted in a
> diff in spill_txns as well?
>

We count 'spill_txns' once for each transaction that is ever
spilled. I think 'spill_txns' wouldn't vary for this particular
test even if the autovacuum transaction happens before the main
transaction of the test, because in that case wait_for_decode_stats
won't finish until it sees the main transaction ('spill_txns' won't be
positive until then).

> In short, I'm not real convinced that a stable result is possible in this
> test. Maybe you should just test for spill_txns and spill_count being
> positive.
>

Yeah, that seems like the best we can do here.

--
With Regards,
Amit Kapila.
