Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Ivan Kartyshov <i(dot)kartyshov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Date: 2018-08-17 18:07:13
Message-ID: CAPpHfduhw07p+bBp=hvVhdL1EGG6hjyHqJfKUhPXYoxAVcu5Jw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 17, 2018 at 8:38 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> > On Fri, Aug 17, 2018 at 6:41 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> There's another patch, which I thought Alexander was referring to, that
> >> does something a bit smarter. On a super short skim it seems to
> >> introduce a separate type of AEL lock that's not replicated, by my
> >> reading?
>
> > Yes, that's correct. On standby read-only queries can tolerate
> > concurrent heap truncation.
>
> Uh, what???

VACUUM truncates a heap relation only after all the tuples in its tail
pages have been deleted. So, on a standby, the heap truncation record
is replayed only after the corresponding heap tuple deletion records
have been replayed. Therefore, if a query on the standby should still
see some of those tuples, our recovery-conflict-with-snapshot logic
must already have stopped WAL replay (or cancelled the query) before
the corresponding tuples were deleted. By the time we replay the heap
truncation record, no query should be able to see any records in the
tail pages.

And in md.c we already have logic to return zeroed pages when trying
to read past the end of a relation in recovery.

    /*
     * Short read: we are at or past EOF, or we read a partial block at
     * EOF. Normally this is an error; upper levels should never try to
     * read a nonexistent block. However, if zero_damaged_pages is ON or
     * we are InRecovery, we should instead return zeroes without
     * complaining. This allows, for example, the case of trying to
     * update a block that was later truncated away.
     */
    if (zero_damaged_pages || InRecovery)
        MemSet(buffer, 0, BLCKSZ);
    else
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("could not read block %u in file \"%s\": read only %d of %d bytes",
                        blocknum, FilePathName(v->mdfd_vfd),
                        nbytes, BLCKSZ)));
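
To make the behaviour of that branch easier to see in isolation, here
is a minimal standalone sketch. It is not PostgreSQL code: SketchFile,
sketch_read_block() and SKETCH_BLCKSZ are made-up names, the "file" is
just an in-memory array, and the -1 return merely stands in for the
ereport(ERROR) path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192      /* hypothetical stand-in for BLCKSZ */

/* Hypothetical in-memory "relation file" holding nblocks full blocks. */
typedef struct SketchFile
{
    const unsigned char *data;
    size_t      nblocks;
} SketchFile;

/*
 * Read block 'blocknum' into 'buffer'.
 *
 * Returns 0 on success, including the case where the block lies past
 * EOF and a zero-filled page is returned instead.  Returns -1 where the
 * quoted md.c code would raise an error.
 */
static int
sketch_read_block(const SketchFile *f, size_t blocknum,
                  bool in_recovery, bool zero_damaged,
                  unsigned char *buffer)
{
    assert(f != NULL && buffer != NULL);

    if (blocknum < f->nblocks)
    {
        /* Normal case: the block exists, copy it out. */
        memcpy(buffer, f->data + blocknum * SKETCH_BLCKSZ, SKETCH_BLCKSZ);
        return 0;
    }

    /* Short read: at or past EOF. */
    if (zero_damaged || in_recovery)
    {
        memset(buffer, 0, SKETCH_BLCKSZ);
        return 0;
    }
    return -1;
}
```

With a two-block file, reading block 2 with in_recovery set yields an
all-zero page, while the same read outside recovery (and with
zero_damaged off) is reported as a failure.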

But I'm concerned about whether FileSeek() could return an error, and
also about what _mdfd_getseg() would do on a truncated segment. It
seems that in recovery it will automatically extend the relation.
That is unacceptable for this purpose. So, sorry for the bother; this
patch definitely needs to be revised.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
