Re: 15beta1 test failure on mips in isolation/expected/stats

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, myon(at)debian(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: 15beta1 test failure on mips in isolation/expected/stats
Date: 2022-05-22 22:29:30
Message-ID: 20220522222930.ysmxrukpilyeseel@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-05-20 01:25:10 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2022-05-20 00:22:14 -0400, Tom Lane wrote:
> >> There's some fallout in the expected-file, of course, but this
> >> does seem to fix it (20 consecutive successful runs now at
> >> 100/2). Don't see why though ...
>
> > I think what might be happening is that the transactional stats updates get
> > reported by s2 *before* the non-transactional stats updates come in from
> > s1. I.e. the pgstat_report_stat() at the end of s2_commit_prepared_a does a
> > report, because the machine is slow enough for it to be "time to reports stats
> > again". Then s1 reports its non-transactional stats.
>
> Sounds plausible. And I left the test loop running, and it's now past
> 100 consecutive successes, so I think this change definitely "fixes" it.

FWIW, the problem can be reliably reproduced by sticking a
pgstat_force_next_flush() into pgstat_twophase_postcommit(). This is the only
failure when doing so.

> > It looks like our stats maintenance around truncation isn't quite "concurrency
> > safe". That code hasn't meaningfully changed, but it'd not be surprising if
> > it's not 100% precise...
>
> Yeah. Probably not something to try to improve post-beta, especially
> since it's not completely clear how transactional and non-transactional
> cases *should* interact.

Yea. It's also not normally particularly crucial to be accurate down to that
degree.

> Maybe non-transactional updates should be
> pushed immediately? But I'm not sure if that's fully correct, and
> it definitely sounds expensive.

I think that'd be far too expensive - the majority of stats are
non-transactional...

I think what we could do is to model truncates as subtracting the number of
live/dead rows the truncating backend knows about, rather than setting them to
0. But that of course could incur other inaccuracies.

> I'd be good with tweaking this test case as you suggest, and maybe
> revisiting the topic later.

Pushed the change of the test. Christoph, just to make sure, can you confirm
that this fixes the test instability for you?

> Kyotaro-san worried about whether any other places in stats.spec
> have the same issue. I've not seen any evidence of that in my
> tests, but perhaps some other machine with different timing
> could find it.

I tried to find some by putting in forced flushes in a bunch of places before,
and now some more, without finding further cases.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-05-22 23:26:06 ccache, MSVC, and meson
Previous Message Nathan Bossart 2022-05-22 20:26:08 Re: docs: mention "pg_read_all_stats" in "track_activities" description