Re: New vacuum option to do only freezing

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New vacuum option to do only freezing
Date: 2019-04-15 17:13:01
Message-ID: 16814.1555348381@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> writes:
>> Ugh, I think the assertion is right but the above condition is
>> completely wrong. We should increment nleft_dead_tuples when index
>> cleanup is *not* enabled.

> Here is a draft patch to fix this issue.

So the real issue here, I fear, is that we've got no consistent testing
of the whole case block for HEAPTUPLE_DEAD, as you can easily confirm
by checking the code coverage report at
https://coverage.postgresql.org/src/backend/access/heap/vacuumlazy.c.gcov.html

This is perhaps unsurprising, given the code comment that points out that
we can only reach that block if the tuple's state changed since
heap_page_prune() a few lines above. Still, it means that this patch
hasn't been tested in that scenario, until we were lucky enough for
a slow buildfarm machine like topminnow to hit it.

What's more, because that block is the only way for "tupgone" to be
set, we also don't reach the "if (tupgone)" block at lines 1183ff.
And I think this patch has probably broken that, too. Surely, if we
are not going to remove the tuple, we should not increment tups_vacuumed?
And maybe we have to freeze it instead? How is it that we can, or should,
treat this situation as different from a dead-but-not-removable tuple?

I have a very strong feeling that this patch was not fully baked.

BTW, I can reproduce the crash fairly easily (within a few cycles
of the regression tests) by (a) inserting pg_usleep(10000) after
the heap_page_prune call, and (b) making autovacuum much more
aggressive.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-04-15 17:22:30 Autovacuum-induced regression test instability
Previous Message Pavel Stehule 2019-04-15 17:00:52 Re: plpgsql - execute - cannot use a reference to record field