Re: Parallel Append subplan order instability on aye-aye

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append subplan order instability on aye-aye
Date: 2019-07-17 15:53:48
Message-ID: 22315.1563378828@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> Surely it can't be that since that just sets what *pages gets set to.
> Tom mentioned that following was returning 0 pages and tuples:

> -- Temporary hack to investigate whether extra vacuum/analyze is happening
> select relname, relpages, reltuples
> from pg_class
> where relname like '__star' order by relname;
> relname | relpages | reltuples
> ---------+----------+-----------
> a_star | 1 | 3

I poked around a little and came up with a much simpler theory:
VACUUM will not change relpages/reltuples if it does not scan any pages
(cf. special case for tupcount_pages == 0 in heap_vacuum_rel, at line 343
in HEAD's vacuumlazy.c). And, because sanity_check.sql's VACUUM is a
plain unaggressive vacuum, all that it takes to make it skip over a_star's
one page is for somebody else to have a pin on that page. So a chance
collision with the bgwriter or checkpointer could cause the observed
symptom, not just for a_star but for the other single-page relations that
are at stake here. Those pages are certainly dirty after create_misc.sql,
so it's hardly implausible for one of these processes to be holding pin
while trying to write out the buffer at the time sanity_check.sql runs.

A brute-force way to fix this (or at least reduce the odds quite a bit)
would be to have sanity_check.sql issue a CHECKPOINT before its VACUUM,
thereby guaranteeing that none of these pages are still in need of being
written. Not sure how much that'd penalize the regression tests' runtime,
or whether we'd have a loss of test coverage of VACUUM behaviors.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-07-17 16:32:39 sepgsql seems rather thoroughly broken on Fedora 30
Previous Message r.zharkov 2019-07-17 14:54:16 Re: Intermittent pg_ctl failures on Windows