Re: Standalone synchronous master

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 21:18:29
Message-ID: 20140112211829.GM2686@tamriel.snowman.net
Lists: pgsql-hackers

* Josh Berkus (josh(at)agliodbs(dot)com) wrote:
> Well, then that becomes a reason to want better/more configurability.

I agree with this- the challenge is figuring out what those options
should be and how we should document them.

> In the couple of sync rep sites I admin, I *would* want to wait.

That's certainly an interesting data point. One of the specific
use-cases that I'm thinking of is to auto-degrade on a graceful shutdown
of the slave for upgrades and/or maintenance. Perhaps we don't need
*auto* degrade in that case, but then an actual failure of the slave
will also bring down the master.

> > I don't follow this logic at all- why is there no safe way to resume?
> > You wait til the slave is caught up fully and then go back to sync mode.
> > If that turns out to be an extended problem then an alarm needs to be
> > raised, of course.
>
> So, if you have auto-resume, how do you handle the "flaky network" case?
> And how would an alarm be raised?

Ideally, every time there is an auto-degrade, messages are logged to log
files which are monitored, and notices are sent to admins about it
happening, who, upon getting repeated such emails, would realize there's
a problem and work to fix it.
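
As a rough sketch of that monitoring idea (not an existing PostgreSQL
feature): the log message text, the timestamp prefix, and the
threshold/window values below are all assumptions made up for
illustration. The helper names degrade_times and should_alert are
likewise hypothetical.

```python
import re
from datetime import datetime, timedelta

# Hypothetical log line format; the auto-degrade message is an assumption,
# since the feature under discussion does not exist in PostgreSQL.
DEGRADE_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*synchronous replication degraded"
)


def degrade_times(lines):
    """Return the timestamps of all log lines that report an auto-degrade."""
    times = []
    for line in lines:
        match = DEGRADE_PATTERN.match(line)
        if match:
            times.append(datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S"))
    return times


def should_alert(times, threshold=3, window=timedelta(hours=1)):
    """True if at least `threshold` degrade events fall within any one `window`."""
    times = sorted(times)
    for i in range(len(times) - threshold + 1):
        if times[i + threshold - 1] - times[i] <= window:
            return True
    return False
```

A single degrade (say, a planned slave restart) stays below the
threshold, while repeated degrades inside the window, as in the flaky
network case, would trigger the notice to the admins.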

> On 01/12/2014 12:51 PM, Kevin Grittner wrote:
> > Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> >> I know others have dismissed this idea as too "talky", but from my
> >> perspective, the agreement with the client for each synchronous
> >> commit is being violated, so each and every synchronous commit
> >> should report failure to sync. Also, having a warning on every
> >> commit would make it easier to troubleshoot degraded mode for users
> >> who have ignored the other warnings we give them.
> >
> > I agree that every synchronous commit on a master which is configured
> > for synchronous replication which returns without persisting the work
> > of the transaction on both the (local) primary and a synchronous
> > replica should issue a WARNING. That said, the API for some
> > connectors (like JDBC) puts the burden on the application or its
> > framework to check for warnings each time and do something reasonable
> > if found; I fear that a Venn diagram of those shops which would use
> > this new feature and those shops that don't rigorously look for and
> > reasonably deal with warnings would have significant overlap.
>
> Oh, no question. However, having such a WARNING would help with
> interactive troubleshooting once a problem has been identified, and
> that's my main reason for wanting it.

I'm in the camp of this being too 'talky'.

> Imagine the case where you have auto-degrade and a flaky network. The
> user would experience problems as performance problems; that is, some
> commits take minutes on-again, off-again. They wouldn't necessarily
> even LOOK at the sync rep settings. So next step is to try walking
> through a sample transaction on the command line, and then the
> DBA/consultant gets WARNING messages, which gives an idea where the real
> problem lies.

Or they look in the logs which hopefully say that their slave keeps
getting disconnected...

Thanks,

Stephen
