Re: Why we lost Uber as a user

From: Torsten Zuehlsdorff <mailinglists(at)toco-domains(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Alfred Perlstein <alfred(at)freebsd(dot)org>, Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why we lost Uber as a user
Date: 2016-08-04 09:00:25
Message-ID: cee2b6a3-66c1-bf62-ca59-41638bb4e266@toco-domains.de
Lists: pgsql-hackers

On 03.08.2016 21:05, Robert Haas wrote:
> On Wed, Aug 3, 2016 at 2:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> I don't think they are saying that logical replication is more
>>> reliable than physical replication, nor do I believe that to be true.
>>> I think they are saying that if logical corruption happens, you can
>>> fix it by typing SQL statements to UPDATE, INSERT, or DELETE the
>>> affected rows, whereas if physical corruption happens, there's no
>>> equally clear path to recovery.
>>
>> Well, that's not an entirely unreasonable point, but I dispute the
>> implication that it makes recovery from corruption an easy thing to do.
>> How are you going to know what SQL statements to issue? If the master
>> database is changing 24x7, how are you going to keep up with that?
>
> I think in many cases people fix their data using business logic. For
> example, suppose your database goes down and you have to run
> pg_resetxlog to get it back up. You dump-and-restore, as one does,
> and find that you can't rebuild one of your unique indexes because
> there are now two records with that same PK. Well, what you do is you
> look at them and judge which one has the correct data, often the one
> that looks more complete or the one with the newer timestamp. Or,
> maybe you need to merge them somehow. In my experience helping users
> through problems of this type, once you explain the problem to the
> user and tell them they have to square it on their end, the support
> call ends. The user may not always be entirely thrilled about having
> to, say, validate a problematic record against external sources of
> truth, but they usually know how to do it. Database bugs aren't the
> only way that databases become inaccurate. If the database that they
> use to keep track of land ownership in the jurisdiction where I live
> says that two different people own the same piece of property,
> somewhere there is a paper deed in a filing cabinet. Fishing that out
> to understand what happened may not be fun, but a DBA can explain that
> problem to other people in the organization and those people can get
> it fixed. It's a problem, but it's fixable.
>
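
For illustration (my own sketch, not from Robert's mail): a duplicate
hunt of this kind could look like the following, with a hypothetical
table "accounts" whose intended primary key is "id":

  -- Find keys that appear more than once after the dump-and-restore,
  -- so a human can judge which copy to keep.
  SELECT id, count(*) AS copies
    FROM accounts
   GROUP BY id
  HAVING count(*) > 1;

  -- Inspect the conflicting rows side by side before merging or
  -- deleting; the "updated_at" column is likewise hypothetical.
  SELECT *
    FROM accounts
   WHERE id IN (SELECT id FROM accounts
                 GROUP BY id
                HAVING count(*) > 1)
   ORDER BY id, updated_at DESC;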
> On the other hand, if a heap tuple contains invalid infomask bits that
> cause an error every time you read the page (this actually happened to
> an EnterpriseDB customer!), the DBA can't tell other people how to fix
> it and can't fix it personally either. Instead, the DBA calls me.
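
As a side note, the pageinspect contrib module can display those
infomask bits; a minimal look at block 0 of a hypothetical table "t"
(superuser only):

  -- Requires the pageinspect extension (contrib).
  CREATE EXTENSION IF NOT EXISTS pageinspect;

  -- Dump the line pointers and raw infomask bits of block 0; bogus
  -- values here are what make such a page unreadable through the
  -- normal access paths.
  SELECT lp, t_xmin, t_xmax,
         to_hex(t_infomask)  AS infomask,
         to_hex(t_infomask2) AS infomask2
    FROM heap_page_items(get_raw_page('t', 0));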

After reading this statement, the ZFS filesystem comes to mind. It
has built-in protection against various problems (data degradation,
current spikes, phantom writes, etc.).

For me this raises two questions:

1) Would using ZFS prevent such errors?

My feeling says yes, but I have no idea how an invalid infomask bit
could occur in the first place.

2) Would it be possible to add such prevention to PostgreSQL?

I know this could add massive overhead, but if it were optional, it
could be a fine thing.

Greetings,
Torsten
