Re: Thoughts on "killed tuples" index hint bits support on standby

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Thoughts on "killed tuples" index hint bits support on standby
Date: 2020-01-24 14:15:47
Message-ID: CANtu0og+1g+9bvEr1LA5WPme5HAMhHmz-zA=kxtd=dURfaKH_g@mail.gmail.com
Lists: pgsql-hackers

Hello again.

Andres, Peter, thanks for your comments.

Some of the issues you mentioned (reporting feedback to another
cascading standby, processing queries after restart, and a newer xid
already being reported) could be fixed in the provided design, but your
intention to have an "independent correctness backstop" is the right
thing to do.

So, I was thinking about another approach which is:
* still not too tricky to implement
* easy to understand
* does not rely on hot_standby_feedback for correctness, but only for efficiency
* could be used with any kind of index
* does not generate a lot of WAL

Let's add a new type of WAL record like "some index killed tuple hint
bits are set according to RecentGlobalXmin=x" (without specifying the
page or even the relation). Let's call 'x' 'LastKilledIndexTuplesXmin'
and track it in standby memory. The record is sent only when
wal_log_hints=true. If a hint causes an FPW, the record is sent before
the FPW record. Also, it is not required to write such a WAL record
every time the primary marks an index tuple as dead. It should be done
only when 'LastKilledIndexTuplesXmin' changes (moves forward).
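
To make the "log only when the value moves forward" rule concrete,
here is a toy model in Python. It is not PostgreSQL code: the class
and method names are made up for illustration, xids are plain integers,
and xid wraparound is ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PrimaryHintLogger:
    """Toy model of the primary side (hypothetical names, not PostgreSQL code)."""
    wal_log_hints: bool = True
    last_killed_index_tuples_xmin: Optional[int] = None
    wal: List[int] = field(default_factory=list)  # the 'x' values that were logged

    def on_kill_hint(self, recent_global_xmin: int) -> bool:
        """Called when an index tuple is marked dead.

        Returns True if a new WAL record was written.
        """
        if not self.wal_log_hints:
            return False  # records are only sent with wal_log_hints=true
        last = self.last_killed_index_tuples_xmin
        if last is not None and recent_global_xmin <= last:
            return False  # value did not move forward: nothing to log
        self.last_killed_index_tuples_xmin = recent_global_xmin
        self.wal.append(recent_global_xmin)
        return True
```

In this model, marking many tuples dead under the same RecentGlobalXmin
produces a single record, which is where the "does not generate a lot
of WAL" property comes from.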

On the standby such a record is used to cancel queries. If a
transaction is executed with "ignore_killed_tuples==true" (set on
snapshot creation) and its xid is less than the received
LastKilledIndexTuplesXmin, just cancel the query (because it could
rely on an invalid hint bit). So, technically it should be correct to
use hints received from the master to skip tuples according to MVCC,
but with cancellation alone "the conflict rate goes through the roof".
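
The cancellation rule could be sketched like this. Again a toy model
in Python with hypothetical names (StandbyQuery,
apply_killed_xmin_record), plain integer xids and no wraparound
handling; the real thing would live in the standby's recovery conflict
code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StandbyQuery:
    name: str
    snapshot_xid: int            # the xid compared against the received value
    ignore_killed_tuples: bool   # fixed at snapshot creation
    cancelled: bool = False


def apply_killed_xmin_record(queries: List[StandbyQuery], new_xmin: int) -> List[str]:
    """Replay one 'LastKilledIndexTuplesXmin = new_xmin' record on the standby.

    Cancels every running query that trusts killed-tuple hint bits but is
    older than new_xmin, and returns the names of the cancelled queries.
    """
    victims = []
    for q in queries:
        if q.cancelled:
            continue
        if q.ignore_killed_tuples and q.snapshot_xid < new_xmin:
            q.cancelled = True
            victims.append(q.name)
    return victims
```

Queries that do not trust the hint bits are never cancelled by this
record, whatever their age.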

To avoid any real conflicts standby sets

    ignore_killed_tuples =
            (hot_standby_feedback is on)
        AND (wal_log_hints is on on primary)
        AND (standby new snapshot xid >= last LastKilledIndexTuplesXmin received)
        AND (hot_standby_feedback is reported directly to master).
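
Written as a function, the condition above might look like the
following Python sketch. The function and parameter names are
hypothetical, xids are plain integers, and the case where no record
has been received yet is deliberately left out because it is not
covered above.

```python
def decide_ignore_killed_tuples(
    hot_standby_feedback: bool,
    primary_wal_log_hints: bool,
    feedback_reported_directly: bool,
    snapshot_xid: int,
    last_killed_xmin: int,
) -> bool:
    """Decide, at snapshot creation, whether the query may trust killed hint bits.

    All four conditions must hold; otherwise the query ignores the hint
    bits and therefore can never be cancelled because of them.
    """
    return (
        hot_standby_feedback
        and primary_wal_log_hints
        and snapshot_xid >= last_killed_xmin
        and feedback_reported_directly
    )
```

For example, a cascading standby (feedback not reported directly to
the master) gets False here regardless of the other settings.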

So, the hot_standby_feedback loop effectively eliminates any conflicts
(because LastKilledIndexTuplesXmin is technically RecentGlobalXmin in
such a case). But if feedback is broken for some reason, the query
cancellation logic will keep everything safe.

For correctness, LastKilledIndexTuplesXmin (and as a consequence
RecentGlobalXmin) should only move forward.

To set killed bits on the standby we should check tuple visibility
according to the last LastKilledIndexTuplesXmin received. It is just
like the master setting these bits according to its own state, so it
is even safe to transfer them to another standby.
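
A possible shape for the standby-side check, as a toy Python model.
The names are hypothetical and two things here are my assumptions
rather than part of the proposal: "dead" is taken to mean "deleter
committed and deleter xid is below the tracked value", and no bits are
set before the first record arrives. Xids are plain integers with no
wraparound handling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandbyHorizon:
    """Tracks the last LastKilledIndexTuplesXmin received (toy model)."""
    last_killed_xmin: Optional[int] = None

    def receive(self, x: int) -> None:
        # The tracked value must only move forward; older values are ignored.
        if self.last_killed_xmin is None or x > self.last_killed_xmin:
            self.last_killed_xmin = x

    def may_set_killed_bit(self, deleter_xid: int, deleter_committed: bool) -> bool:
        """True if the standby may mark the index tuple as killed."""
        if self.last_killed_xmin is None:
            return False  # assumption: be conservative until a record arrives
        return deleter_committed and deleter_xid < self.last_killed_xmin
```

Because the decision depends only on the received value and not on
local snapshots, it matches what the master would have decided under
the same RecentGlobalXmin.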

Does it look better now?

Thanks, Michail.
