From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Ants Aasma <ants(at)cybertec(dot)at> |
Cc: | sthomas(at)optionshouse(dot)com, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com> |
Subject: | Re: Inconsistent DB data in Streaming Replication |
Date: | 2013-04-10 17:42:23 |
Message-ID: | 6231.1365615743@sss.pgh.pa.us |
Lists: | pgsql-hackers |

Ants Aasma <ants(at)cybertec(dot)at> writes:
> We already rely on WAL-before-data to ensure correct recovery. What is
> proposed here is to slightly redefine it to require WAL to be
> replicated before it is considered to be flushed. This ensures that no
> data page on disk differs from the WAL that the slave has. The
> machinery to do this is already mostly there, we already wait for WAL
> flushes and we know the write location on the slave. The second
> requirement is that we never start up as master and we don't trust any
> local WAL. This is actually how pacemaker clusters work, you would
> only need to amend the RA to wipe the WAL and configure postgresql
> with restart_after_crash = false.
>
> It would be very helpful in restoring HA capability after failover if
> we wouldn't have to read through the whole database after a VM goes
> down and is migrated with the shared disk onto a new host.

The problem with this is it's making an idealistic assumption that a
crashed master didn't do anything wrong or lose/corrupt any data during
its crash. As soon as you realize that's an unsafe assumption, the
whole thing becomes worthless to you.

If the idea had zero implementation cost, I would say "sure, let people
play with it until they find out (probably the hard way) that it's a bad
idea". But it's going to introduce, at the very least, additional
complexity into a portion of the system that is critical and plenty
complicated enough already. That being the case, I don't want it there
at all, not even as an option.

regards, tom lane