Re: Why we lost Uber as a user

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Alfred Perlstein <alfred(at)freebsd(dot)org>, Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why we lost Uber as a user
Date: 2016-08-17 05:27:18
Message-ID: CAMsr+YFXG_Y8gnhXd2_FLvpqRBLV0LTHYFHcKvfWg8rt_Yv-iA@mail.gmail.com
Lists: pgsql-hackers

On 17 August 2016 at 08:36, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:

> Something I didn't see mentioned that I think is a critical point: last I
> looked, HOT standby (and presumably SR) replays full page writes.

Yes, that's right, all WAL-based physical replication replays FPWs.

We could, at the cost of increased WAL size, retain both the original WAL
buffer that triggered the FPW and the FPW page image. That's what wal_level
= logical does in some cases. I'm not sure it's that compelling though; it
just introduces another redo path that can go wrong.

> Ultimately, people really need to understand the trade-offs to the
> different solutions so they can make an informed decision on which ones
> (yes, plural) they want to use. The same can be said about pg_upgrade vs
> something else, and the different ways of doing backups.
>

Right.

It's really bugging me that people are talking about "statement based"
replication in MySQL as if it's just sending SQL text around. MySQL's
statement based replication is a lot smarter than that, and in the
actually-works-properly form it's a hybrid of row and statement based
replication ("MIXED" mode). As I understand it, it lobs around something
closer to parsetrees with some values pre-computed rather than SQL text
where possible. It stores some computed values of volatile functions in the
binlog and reads them from there rather than computing them again when
running the statement on replicas, which is why AUTO_INCREMENT etc works.
It also falls back to row based replication where necessary for
correctness. Even then it has a significant list of caveats, but it's
pretty damn impressive. I didn't realise how clever the hybrid system was
until recently.
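
To make the hybrid idea concrete, here is a toy sketch in Python of the
decision described above. It is not MySQL's binlog format or code: the event
classes, the choose_event function, and the two function lists are all
hypothetical names picked for illustration. It ships the statement together
with the values the primary actually used for volatile calls when that looks
safe, and falls back to the changed rows otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class StatementEvent:
    """Statement text plus values already computed on the primary."""
    sql: str
    precomputed: dict = field(default_factory=dict)


@dataclass
class RowEvent:
    """Fallback: the concrete rows that the statement changed."""
    table: str
    rows: list


# Assumption for this sketch: calls whose result can be captured once on the
# primary and reused on the replica instead of being computed again.
CAPTURABLE = ("NOW()", "LAST_INSERT_ID()")

# Assumption for this sketch: calls we cannot make safe, forcing row events.
UNSAFE = ("UUID()",)


def choose_event(sql, table, changed_rows, volatile_values):
    """Pick statement or row logging for one statement (toy text matching)."""
    text = sql.upper()
    if any(call in text for call in UNSAFE):
        return RowEvent(table, list(changed_rows))
    # Keep only the captured values this statement actually refers to.
    used = {call: volatile_values[call]
            for call in CAPTURABLE
            if call in text and call in volatile_values}
    return StatementEvent(sql, used)
```

A real implementation would work on a parsed statement rather than matching
substrings; the point is only the shape of the decision.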

I can see it being desirable to do something like that eventually as an
optimisation to logical decoding based replication. Where we can show that
the statement is safe, or make it safe by doing things like evaluating and
substituting volatile function calls, we could xlog a modified parsetree with
OIDs changed to qualified object names etc., send that when decoding, and
execute it on the downstream(s). If there's something we can't show to be
safe, replay the logical rows instead. That's way down the track though; I
think it's more important to focus on completing logical row-based
replication to the point where we handle table rewrites seamlessly and it
"just works" first.

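A rough sketch of what "make it safe" could look like, again in Python and
purely illustrative. The nested-dict "parsetree", the CATALOG mapping, the
SUBSTITUTABLE set, and make_portable are assumptions invented for this
example and bear no relation to PostgreSQL's real node structures. Relation
OIDs are swapped for qualified names, known volatile calls are replaced by
the value evaluated upstream, and anything else returns None to signal
"decode the rows instead".

```python
# Hypothetical catalog lookup: relation OID -> (schema, relation name).
CATALOG = {16384: ("public", "accounts")}

# Assumption: functions that can be evaluated once upstream and inlined.
SUBSTITUTABLE = {"now", "random"}


class Unsafe(Exception):
    """Raised when a node cannot be shown to be safe to replay."""


def _convert(node, evaluated):
    if isinstance(node, list):
        return [_convert(child, evaluated) for child in node]
    if not isinstance(node, dict):
        return node  # plain literal: ship unchanged
    kind = node.get("kind")
    if kind == "relation":
        if node["oid"] not in CATALOG:
            raise Unsafe(f"unknown relation oid {node['oid']}")
        schema, name = CATALOG[node["oid"]]
        return {"kind": "relation", "name": f"{schema}.{name}"}
    if kind == "funccall":
        name = node["name"]
        if name in SUBSTITUTABLE and name in evaluated:
            # Inline the value the upstream actually computed.
            return {"kind": "const", "value": evaluated[name]}
        raise Unsafe(f"cannot prove {name}() safe")
    return {key: _convert(value, evaluated) for key, value in node.items()}


def make_portable(tree, evaluated):
    """Return a shippable copy of the toy tree, or None to fall back to rows."""
    try:
        return _convert(tree, evaluated)
    except Unsafe:
        return None
```

The caller would log the returned tree when it is not None, and otherwise
leave the change to ordinary row-based decoding.
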
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
