Quick Links

Re: Why we lost Uber as a user

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Alfred Perlstein <alfred(at)freebsd(dot)org>, Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why we lost Uber as a user
Date:	2016-08-19 14:08:25
Message-ID:	CAHyXU0zkmRt2hi6Y9q1MWYHCYmx4XxO5k9oS7yN6Y-Qq5vZzjg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Aug 17, 2016 at 5:18 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> On 8/17/16 2:51 PM, Simon Riggs wrote:
>> On 17 August 2016 at 12:19, Greg Stark <stark(at)mit(dot)edu> wrote:
>>> Yes, this is exactly what it should be doing and exactly why it's
>>> useful. Physical replication accurately replicates the data from the
>>> master including "corruption" whereas a logical replication system
>>> will not, causing divergence and possible issues during a failover.
>>
>>
>> Yay! Completely agree.
>>
>> Physical replication, as used by DRBD and all other block-level HA
>> solutions, and also used by other databases, such as Oracle.
>>
>> Corruption on the master would often cause errors that would prevent
>> writes and therefore those changes wouldn't even be made, let alone be
>> replicated.
>
>
> My experience has been that you discover corruption after it's already
> safely on disk, and more than once I've been able to recover by using data
> on a londiste replica.
>
> As I said originally, it's critical to understand the different solutions
> and the pros and cons of each. There is no magic bullet.

Data point: in the half or so cases I've experienced corruption on
replicated systems, in all cases but one the standby was clean. The
'unclean' case actually 8.2 warm standby; the source of the corruption
was a very significant bug where prepared statements would write back
corrupted data if the table definitions changed under the statement
(fixed in 8.3). In that particular case the corruption was very
unfortunately quite widespread and passed directly along to the
standby server. This bug nearly costed us a user as well although not
nearly so famous as uber :-).

In the few modern cases I've seen I've not been able to trace it back
to any bug in postgres (in particular multixact was ruled out) and
I've chalked it up to media or (more likely I think) filesystem
problems in the face of a -9 reset.

merlin

In response to

Re: Why we lost Uber as a user at 2016-08-17 22:18:05 from Jim Nasby

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Joshua Bay	2016-08-19 14:16:35	Re: Most efficient way for libPQ .. PGresult serialization
Previous Message	Kevin Grittner	2016-08-19 14:00:15	Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location