Re: Why we lost Uber as a user

From: Alfred Perlstein <alfred(at)freebsd(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why we lost Uber as a user
Date: 2016-08-03 02:30:22
Message-ID: 39886b9a-6ff2-e48e-975a-4c7a7a2418c7@freebsd.org
Lists: pgsql-hackers

On 8/2/16 2:14 PM, Tom Lane wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>> With physical replication, there is the concern that a bug in *just* the
>> physical (WAL) side of things could cause corruption.
> Right. But with logical replication, there's the same risk that the
> master's state could be fine but a replication bug creates corruption on
> the slave.
>
> Assuming that the logical replication works by issuing valid SQL commands
> to the slave, one could hope that this sort of "corruption" only extends
> to having valid data on the slave that fails to match the master.
> But that's still not a good state to be in. And to the extent that
> performance concerns lead the implementation to bypass some levels of the
> SQL engine, you can easily lose that guarantee too.
>
> In short, I think Uber's position that logical replication is somehow more
> reliable than physical is just wishful thinking. If anything, my money
> would be on the other way around: there's a lot less mechanism that can go
> wrong in physical replication. Which is not to say there aren't good
> reasons to use logical replication; I just do not believe that one.
>
> regards, tom lane
>
>
The reason it can be less catastrophic is that with logical replication
you may futz up your data, but you are safe from corrupting your entire
db. Meaning, if an update is missed or doubled, that may be addressed by
a fixup SQL statement. However, if the replication causes a write to the
entirely wrong place in the db file, then you need to "fsck" your db and
hope that nothing super critical was blown away.
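
To make the "fixup SQL statement" idea concrete, here is a minimal sketch
of diffing a master and a slave copy of one table and emitting the
statements that would bring the slave back in line. Everything in it is
an assumption for illustration: the table name "accounts", the columns
"id" and "balance", and the idea that both copies fit in a dict keyed by
primary key. It is not how any real replication tool repairs divergence.

```python
def fixup_statements(master, slave, table="accounts"):
    """Return SQL text that would make `slave` match `master`.

    `master` and `slave` are hypothetical {id: balance} snapshots of the
    same table; the table and column names are illustrative only.
    """
    stmts = []
    for key in sorted(master.keys() | slave.keys()):
        if key not in slave:
            # Row never arrived on the slave (missed insert).
            stmts.append(
                f"INSERT INTO {table} (id, balance) VALUES ({key}, {master[key]});"
            )
        elif key not in master:
            # Row exists only on the slave (stray or re-applied insert).
            stmts.append(f"DELETE FROM {table} WHERE id = {key};")
        elif master[key] != slave[key]:
            # Row diverged (missed or doubled update).
            stmts.append(
                f"UPDATE {table} SET balance = {master[key]} WHERE id = {key};"
            )
    return stmts


master = {1: 100, 2: 250, 3: 75}
slave = {1: 100, 2: 300, 4: 10}  # row 2 diverged, row 3 missing, row 4 stray
fixups = fixup_statements(master, slave)
for stmt in fixups:
    print(stmt)
```

The point is only that logical divergence stays at the level of rows, so
it can be described and repaired in SQL. A write to the wrong place in
the db file has no such row-level description.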

The impact across a cluster is potentially magnified by physical
replication.

So for instance, let's say there is a bug in the master's write to
disk. Logical replication acts as a barrier that keeps that bad write
from going to the slaves. With physical replication the bad writes go
to the slaves, so any corruption experienced on the master will quickly
reach the slaves and they too will be corrupted.

With logical replication a bug may be stopped at the replication layer.
At that point you can resync the slave from the master.

Now in the case of physical replication all your base are belong to zuul
and you are in a very bad state.
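
A toy model of the difference, as a rough sketch only. It assumes a
"page" is just a list of slots, that physical replication ships the
resulting page contents, and that logical replication ships the change
and lets the slave apply it through its own write path. The functions
buggy_write and correct_write are hypothetical stand-ins for a storage
bug, not anything in PostgreSQL.

```python
def correct_write(page, slot, value):
    """Write the value where it was supposed to go."""
    page[slot] = value


def buggy_write(page, slot, value):
    """Hypothetical storage bug: the write lands one slot too far."""
    page[slot + 1] = value


def replicate(master_write, slave_write):
    """Apply one change on the master and replicate it both ways."""
    master = [0] * 4
    physical_slave = [0] * 4
    logical_slave = [0] * 4
    change = (1, 9)  # logical change: set slot 1 to 9

    master_write(master, *change)

    # Physical: ship the resulting page contents as-is.
    physical_slave[:] = master

    # Logical: ship the change; the slave applies it with its own write path.
    slave_write(logical_slave, *change)

    return master, physical_slave, logical_slave


# Bug on the master only: the physical slave inherits the bad write,
# the logical slave does not.
master, physical_slave, logical_slave = replicate(buggy_write, correct_write)
print(master, physical_slave, logical_slave)

# Same bug on the slave: the logical slave corrupts itself the same way.
_, _, logical_slave_same_bug = replicate(buggy_write, buggy_write)
print(logical_slave_same_bug)
```

The second call shows where the barrier stops helping: if the slave has
the same bug, re-applying the change logically corrupts it too.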

That said, with logical replication, who's to say that if the statement
is replicated to a slave, the slave won't experience the same bug
and also corrupt itself?

We may be saying the same thing, but still there is something to be said
for logical replication... also, didn't they show that logical
replication was faster for some use cases at Uber?

-Alfred
