Re: Replication slot stats misgivings

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Replication slot stats misgivings
Date: 2021-04-29 04:25:30
Message-ID: CAA4eK1Li_m6WVkHpcf4437+b1kAg4zbWc90q5ynjWD93Xen5Xw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 29, 2021 at 8:50 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > This is the first test and inserts just one small record, so how it
> > can lead to spill of data. Do you mean to say that may be some
> > background process has written some transaction which leads to a spill
> > of data?
>
> autovacuum, say?
>
> > Yeah, something like this could happen. Another possibility here could
> > be that before the stats collector has processed drop and create
> > messages, we have enquired about the stats which lead to it giving us
> > the old stats. Note, that we don't wait for 'drop' or 'create' message
> > to be delivered. So, there is a possibility of the same. What do you
> > think?
>
> You should take a close look at the stats test in the main regression
> tests. We had to jump through *high* hoops to get that to be stable,
> and yet it still fails semi-regularly. This looks like pretty much the
> same thing, and so I'm pessimistically inclined to guess that it will
> never be entirely stable.
>

True, we may not be able to make it entirely stable, but I would
like to try some more before giving up on this. Otherwise, I guess the
other possibility is to remove some of the recently added tests or
change them to be more forgiving. For example, we can change the
currently failing test to not check the 'spill*' counts and rely on
just the 'total*' counts. That would work even in the scenarios we
discussed for this failure, but it would reduce the
effectiveness/completeness of the test case.
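
To make the trade-off concrete, here is a rough sketch in Python (not the
actual regression test, which is written in SQL) of what the two kinds of
check might look like. The column names are assumed to be the ones matching
the 'spill*' and 'total*' patterns in pg_stat_replication_slots, and the
helper names and the "no spill expected" condition are hypothetical, used
only for illustration.

```python
from typing import Mapping

# Assumed column names matching the 'total*' and 'spill*' patterns.
TOTAL_COLUMNS = ("total_txns", "total_bytes")
SPILL_COLUMNS = ("spill_txns", "spill_count", "spill_bytes")


def forgiving_check(row: Mapping[str, int]) -> bool:
    """Pass if every total* counter is positive; spill* counters are ignored."""
    return all(row.get(col, 0) > 0 for col in TOTAL_COLUMNS)


def strict_check(row: Mapping[str, int]) -> bool:
    """Hypothetical stricter check: also require that nothing was spilled."""
    no_spill = all(row.get(col, 0) == 0 for col in SPILL_COLUMNS)
    return forgiving_check(row) and no_spill


# A row where some unrelated activity (or stale stats) shows up as a spill.
row_with_spill = {
    "total_txns": 1,
    "total_bytes": 120,
    "spill_txns": 1,
    "spill_count": 1,
    "spill_bytes": 500,
}
```

With a row like row_with_spill, the forgiving check still passes while the
strict one fails, which is the stability gain; the cost is that a genuine
bug in the spill counters would no longer be caught by this test.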

> (At least not before the fabled stats collector rewrite, which may well
> introduce some entirely new set of failure modes.)
>
> Do we really need this test in this form? Perhaps it could be converted
> to a TAP test that's a bit more forgiving.
>

We have a TAP test for slot stats, but there we are checking some
scenarios across a restart. We can surely move these tests there as
well, but it is not apparent to me how that would make a difference.

--
With Regards,
Amit Kapila.
