Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM
Date: 2020-03-17 19:30:14
Message-ID: CAH2-WznQ_-CkUa9z-Ccq4FUw6aDUZOuy1i7B9rQZVwAL-oT7Vw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 16, 2020 at 7:08 AM Michail Nikolaev
<michail(dot)nikolaev(at)gmail(dot)com> wrote:
> ------ ABSTRACT ------
> There is a race condition between btree_xlog_unlink_page and _bt_walk_left.
> A lot of versions are affected including 12 and new-coming 13.
> Happens only on standby. Seems like could not cause invalid query results.

(CC'ing Heikki, just in case.)

Good catch! I haven't tried to reproduce the problem here just yet,
but your explanation is very easy for me to believe.

As you pointed out, the best solution is likely to involve having the
standby imitate the buffer lock acquisitions that take place on the
primary. We don't do that for page splits and page deletions. I think
that it's okay in the case of page splits, since we're only failing to
perform the same bottom-up lock coupling (I added something about that
specific thing to the README recently). Even btree_xlog_unlink_page()
would probably be safe if we didn't have to worry about backwards
scans, which are really a special case. But we do.

FWIW, while I agree that this issue is more likely to occur due to the
effects of commit 558a9165, especially when running your test case, my
own work on B-Tree indexes for Postgres 12 might also be a factor. I
won't get into the reasons now, since they're very subtle, but I have
observed that the Postgres 12 work tends to make page deletion occur
far more frequently with certain workloads. This was really obvious
when I examined the structure of B-Tree indexes over many hours while
BenchmarkSQL/TPC-C [1] ran, for example.

[1] https://github.com/petergeoghegan/benchmarksql
--
Peter Geoghegan
