| From: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> | 
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: Wrong check in pg_visibility? | 
| Date: | 2020-12-08 09:59:25 | 
| Message-ID: | ffa00671-029c-eb55-e099-a9031e25c9e4@postgrespro.ru | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On 06.12.2020 23:50, Konstantin Knizhnik wrote:
> Hi hackers!
>
> Due to the error in PG-ProEE we have added the following test to 
> pg_visibility:
>
> create table vacuum_test as select 42 i;
> vacuum vacuum_test;
> select count(*) > 0 from pg_check_visible('vacuum_test');
> drop table vacuum_test;
>
> Sometime (very rarely) this test failed: pg_visibility reports 
> "corrupted" tuples.
> The same error can be reproduced not only with PG-Pro but also with 
> vanilla REL_11_STABLE, REL_12_STABLE and REL_13_STABLE.
> It is not reproduced with master after Andres snapshot optimization - 
> commit dc7420c2.
>
> It is not so easy to reproduce this error: it is necessary to repeat 
> this tests many times.
> Probability increased with more aggressive autovacuum settings.
> But even with such settings and thousands of iterations I was not able 
> to reproduce this error at my notebook - only at virtual machine.
>
> The error is reported here:
>
>             /*
>              * If we're checking whether the page is all-visible, we 
> expect
>              * the tuple to be all-visible.
>              */
>             if (check_visible &&
>                 !tuple_all_visible(&tuple, OldestXmin, buffer))
>             {
>                 TransactionId RecomputedOldestXmin;
>
>                 /*
>                  * Time has passed since we computed OldestXmin, so it's
>                  * possible that this tuple is all-visible in reality 
> even
>                  * though it doesn't appear so based on our
>                  * previously-computed value.  Let's compute a new 
> value so we
>                  * can be certain whether there is a problem.
>                  *
>                  * From a concurrency point of view, it sort of sucks to
>                  * retake ProcArrayLock here while we're holding the 
> buffer
>                  * exclusively locked, but it should be safe against
>                  * deadlocks, because surely GetOldestXmin() should 
> never take
>                  * a buffer lock. And this shouldn't happen often, so 
> it's
>                  * worth being careful so as to avoid false positives.
>                  */
>                 RecomputedOldestXmin = GetOldestXmin(NULL, 
> PROCARRAY_FLAGS_VACUUM);
>
>                 if (!TransactionIdPrecedes(OldestXmin, 
> RecomputedOldestXmin))
>                     record_corrupt_item(items, &tuple.t_self);
>
>
> I debugger I have checked that OldestXmin = RecomputedOldestXmin = 
> tuple.t_data->xmin
> I wonder if this check in pg_visibility is really correct and it can 
> not happen that OldestXmin=tuple.t_data->xmin?
> Please notice that tuple_all_visible returns false if 
> !TransactionIdPrecedes(xmin, OldestXmin)
>
I did more investigations and have to say that this check in 
pg_visibility.c is really not correct.
The process which is keeping oldest xmin is autovacuum.
It should be ignored by GetOldestXmin because of PROCARRAY_FLAGS_VACUUM 
flags, but it is not actually skipped
because PROC_IN_VACUUM flag is not set yet. There is yet another flag - 
PROC_IS_AUTOVACUUM
which is always set in autovacuum, but it can not be passed to 
GetOldestXmin? because is cleared by PROCARRAY_PROC_FLAGS_MASK.
If we just repeat RecomputedOldestXmin = GetOldestXmin(NULL, 
PROCARRAY_FLAGS_VACUUM);
several times, then finally we will get right xmin.
I wonder if such check should be excluded from pg_visibility or made in 
more correct way?
Because nothing in documentation of pg_check_visible says that it may 
return false positives:
|pg_check_visible(relation regclass, t_ctid OUT tid) returns setof tid|
         Returns the TIDs of non-all-visible tuples stored in pages
    marked all-visible in the visibility map. If this function returns a
    non-empty set of TIDs, the visibility map is corrupt.
||
And comment to this function is even morefrightening:
/*
  * Return the TIDs of not-all-visible tuples in pages marked all-visible
  * in the visibility map.  We hope no one will ever find any, but there 
could
  * be bugs, database corruption, etc.
  */
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Gilles Darold | 2020-12-08 10:15:12 | [PATCH] Hooks at XactCommand level | 
| Previous Message | Andrey Borodin | 2020-12-08 09:54:55 | Re: Logical archiving |