Re: Resetting spilled txn statistics in pg_stat_replication

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Resetting spilled txn statistics in pg_stat_replication
Date: 2020-10-13 07:20:42
Message-ID: CA+fd4k59yKGXVTahoez_G_aARPSNks=uQFK-XtDD_LS9tU22qA@mail.gmail.com
Lists: pgsql-hackers

On Tue, 13 Oct 2020 at 16:12, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Oct 13, 2020 at 12:17 PM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > On Tue, 13 Oct 2020 at 15:27, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada
> > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > >
> > > > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > The original theory I gave above [1] is an interleaved
> > > > > autovacuum transaction. Let me try to explain in a bit more detail.
> > > > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test
> > > > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000)
> > > > > g(i);') a parallel autovacuum transaction occurs. The problem as seen
> > > > > in buildfarm will happen when autovacuum transaction happens after 80%
> > > > > or more of the Insert is done.
> > > > >
> > > > > In such a situation we will start decoding 'Insert' first and need to
> > > > > spill multiple times due to the amount of changes (more than threshold
> > > > > logical_decoding_work_mem) and then before we encounter Commit of
> > > > > transaction that performed Insert (and probably some more changes from
> > > > > that transaction) we will encounter a small transaction (autovacuum
> > > > > transaction). The decode of that small transaction will send the
> > > > > stats collected till now which will lead to the problem shown in
> > > > > buildfarm.
> > > >
> > > > That seems a possible scenario.
> > > >
> > > > I think probably this also explains the reason why spill_count
> > > > slightly varied and spill_txns was still 1. The spill_count value
> > > > depends on how much the process spilled out transactions before
> > > > encountering the commit of an autovacuum transaction. Since we have
> > > > the spill statistics per reorder buffer, not per transaction, it's
> > > > possible.
> > > >
> > >
> > > Okay, here is an updated version (changed some comments) of the patch
> > > I posted some time back. What do you think? I have tested this on both
> > > Windows and Linux environments. I think it is a bit tricky to
> > > reproduce the exact scenario so if you are fine we can push this and
> > > check, or let me know if you have any better idea?
> >
> > I agree with checking that spill_count and spill_txns are positive.
> >
>
> I am able to reproduce this problem via a debugger. Basically, execute
> the Insert mentioned above from one of the psql sessions and, in
> ExecInsert(), stop the execution once 'estate->es_processed > 4000'.
> Then, from another psql terminal, execute some DDL which will be ignored
> but will anyway try to decode the commit. Then perform 'continue' in the
> first session. This will lead to inconsistent stats values depending
> upon the time at which the DDL is performed.

Thanks!
I'm also able to reproduce this in a similar way and have confirmed
the patch fixes it.
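
To make the interleaving easier to follow, here is a rough toy model in
Python. It is only a sketch of the reasoning in this thread, not
PostgreSQL code: ToyReorderBuffer, run(), the work_mem value of 1000
changes, and the rule "report the counters at every commit" are all
assumptions made for illustration. It shows that when statistics are
kept per reorder buffer, a small transaction committing in the middle
of the big Insert reports whatever has been spilled so far.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReorderBuffer:
    """Toy model only; none of these names exist in PostgreSQL."""
    work_mem: int                 # changes held in memory before a spill
    spill_txns: int = 0           # transactions that spilled at least once
    spill_count: int = 0          # total number of spills
    in_memory: dict = field(default_factory=dict)    # xid -> buffered changes
    spilled_xids: set = field(default_factory=set)   # xids already counted
    reported: list = field(default_factory=list)     # stats sent at each commit

    def add_change(self, xid: int) -> None:
        """Buffer one change and spill the transaction when it gets too big."""
        self.in_memory[xid] = self.in_memory.get(xid, 0) + 1
        if self.in_memory[xid] >= self.work_mem:
            if xid not in self.spilled_xids:
                self.spilled_xids.add(xid)
                self.spill_txns += 1
            self.spill_count += 1
            self.in_memory[xid] = 0

    def commit(self, xid: int) -> None:
        """Counters are per buffer, so any commit reports the totals so far."""
        self.reported.append((self.spill_txns, self.spill_count))
        self.in_memory.pop(xid, None)


def run(total_changes, small_commit_after=None, work_mem=1000):
    """Decode one big transaction (xid 1), optionally interleaving the
    commit of a one-change transaction (xid 2) after N changes."""
    rb = ToyReorderBuffer(work_mem=work_mem)
    for i in range(1, total_changes + 1):
        rb.add_change(xid=1)
        if i == small_commit_after:
            rb.add_change(xid=2)
            rb.commit(xid=2)
    rb.commit(xid=1)
    return rb.reported


if __name__ == "__main__":
    print(run(5000))                           # no interleaved commit
    print(run(5000, small_commit_after=4000))  # small commit at 80% of the Insert
    print(run(5000, small_commit_after=3500))  # small commit a bit earlier
```

In this model the first reported pair always has spill_txns = 1, while
spill_count depends on when the small transaction commits. That matches
the variation seen in the test output and is why checking only that both
values are positive is stable.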

> I'll push the patch as I am more
> confident now.

+1. Let's see how the tests go.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
