Re: Synchronous replay take III

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronous replay take III
Date: 2018-11-15 05:30:06
Message-ID: CAD21AoDN3oy06rMfr2e7tqJHNWH4cithHou7EKx07i21BVHbFg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 1, 2018 at 10:40 AM Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>
> Hi hackers,
>
> I was pinged off-list by a fellow -hackers denizen interested in the
> synchronous replay feature and wanting a rebased patch to test. Here
> it goes, just in time for a Commitfest. Please skip to the bottom of
> this message for testing notes.

Thank you for working on this. The overview and your summary were
helpful for understanding this feature. I've started to review this
patch for PostgreSQL 12. I've tested the patch and found some issues,
but let me first ask some questions about the high-level design.
Sorry if these have already been discussed.

>
> In previous threads[1][2][3] I called this feature proposal "causal
> reads". That was a terrible name, borrowed from MySQL. While it is
> probably a useful term of art, for one thing people kept reading it as
> "casual", which it ain't, and more importantly this patch is only one
> way to achieve read-follows-write causal consistency. Several others
> are proposed or exist in forks (user managed wait-for-LSN, global
> transaction manager, ...).
>
> OVERVIEW
>
> For writers, it works a bit like RAID mirroring: when you commit a
> write transaction, it waits until the data has become visible on all
> elements of the array, and if an array element is not responding fast
> enough it is kicked out of the array. For readers, it's a little
> different because you're connected directly to the array elements
> (rather than going through a central controller), so it uses a system
> of leases allowing read transactions to know instantly and whether
> they are running on an element that is currently in the array and are
> therefore able to service synchronous_replay transactions, or should
> raise an error telling you to go and ask some other element.
>
> This is a design choice favouring read-mostly workloads at the expense
> of write transactions. Hot standbys' whole raison for existing is to
> move *some* read-only workloads off the primary server. This proposal
> is for users who are prepared to trade increased primary commit
> latency for a guarantee about visibility on the standbys, so that
> *all* read-only work could be moved to hot standbys.

To be clear, what did you mean by read-mostly workloads?

I think there are two kinds of reads on standbys: a read that happens
after a write, and a direct read (e.g. reporting). The former usually
requires causal reads, as you mentioned, in order to read its own
writes, but the latter might be different: it often wants to read the
latest data on the master at the time. IIUC, even if we send a
read-only query directly to a synchronous replay server, we could get a
stale result if the standby is delayed by less than
synchronous_replay_max_lag. So this synchronous replay feature would
be helpful for the former case (i.e. a few writes and many reads that
want to see them), whereas for the latter case perhaps keeping the
reads waiting on the standby seems a reasonable solution.

Also, I think it's worth considering the cost of both causal reads
*and* non-causal reads.

I've considered a mixed workload (transactions requiring causal reads
and transactions not requiring them) under the current design. IIUC,
the current design creates something like a consistent-reads group by
specifying servers. For example, if a transaction doesn't need causal
reads, it can send its query to any server with synchronous_replay =
off, but if it does, it should select a synchronous replay server. It
also means that client applications or routing middleware such as
pgpool are required to be aware of the available synchronous replay
standbys. That is, this design imposes a cost on the read-only
transactions requiring causal reads. On the other hand, with
token-based causal reads we can send a read-only query to any standby
if we can wait for the change to be replayed. Of course, if we don't
want to wait forever, we can time out and switch to either another
standby or the master to execute the query, but we don't need to
choose a particular server among the standby servers.
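
The token-based routing described above can be sketched roughly as
follows. This is only a toy model in Python, not code from the patch or
from any existing middleware: the Server class, its replay_lsn field,
and the wait_for_lsn and route_read functions are hypothetical names
made up for illustration, and LSNs are plain integers. It assumes the
client remembers the commit LSN of its last write as its "token".

```python
import time
from dataclasses import dataclass


@dataclass
class Server:
    """Toy stand-in for a PostgreSQL server (hypothetical, not a real API)."""
    name: str
    replay_lsn: int  # last WAL position applied, modelled as a plain int

    def has_replayed(self, token_lsn: int) -> bool:
        return self.replay_lsn >= token_lsn


def wait_for_lsn(server: Server, token_lsn: int,
                 timeout_s: float, poll_s: float = 0.01) -> bool:
    """Poll until the server has replayed token_lsn or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if server.has_replayed(token_lsn):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def route_read(token_lsn: int, standbys: list, primary: Server,
               timeout_s: float = 0.05) -> Server:
    """Pick a server for a read that must see changes up to token_lsn.

    Any standby may be tried; the client does not need to know in advance
    which ones are caught up. If a standby does not catch up within
    timeout_s, try the next one, and finally fall back to the primary.
    """
    for standby in standbys:
        if wait_for_lsn(standby, token_lsn, timeout_s):
            return standby
    return primary


# Example: the client's last commit was at LSN 1000.
primary = Server("primary", replay_lsn=1000)
lagging = Server("standby1", replay_lsn=900)
caught_up = Server("standby2", replay_lsn=1000)

causal_choice = route_read(1000, [lagging, caught_up], primary)
fallback_choice = route_read(1000, [lagging], primary)
non_causal_choice = route_read(0, [lagging], primary)  # token 0: no wait needed
print(causal_choice.name, fallback_choice.name, non_causal_choice.name)
```

In this model a read that carries no token (token 0 here) is served by
the first standby without waiting, so only reads that need causality
pay the waiting cost, and only on a standby that is actually behind.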

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
