Re: PostgreSQL + Replicator developer meeting 10/28

From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Joshua Drake <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PostgreSQL + Replicator developer meeting 10/28
Date: 2008-10-29 07:54:12
Message-ID: 1225266852.13402.83.camel@huvostro
Lists: pgsql-hackers

On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:
> Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
>
> > > The two obvious problems with the existing MCP architecture is:
> > >
> > > 1. Single point of failure
> >
> > For async replication there is always SPoF, at least the master until
> > first slave has acquired log is a SPoF, or do you plan that both Master
> > and "MCP|Slave" to keep the log and be able to step in for each other if
> > the other fails?
>
> Yeah, with the new architecture there is still going to be a bit of a
> SPoF in the master->MCP but it's a lot smaller than the current setup,
> in which if you lose the MCP you basically lose everything.
>
> > > 2. Portability
> >
> > Portability to where ? Other DBMS's ? Other PG versions ?
>
> Other operating systems mainly. The trouble is we never got around to
> porting the MCP to any OS beyond Linux; I think it should work on
> Solaris and BSDs, but surely not Windows. We want to just get rid of
> what I consider a (crappy) reimplementation of postmaster; instead we
> should just let postmaster do the job.
>
> Additionally we would get rid of the ugly way we "import" backend code
> into the MCP server.
>
>
> > for me there was also two more problems:
> >
> > 3. separate "replication log", which at least seems to be able to get
> > out of sync with main DB.
> >
> > Why don't you just use a DB table, WAL-logged and all
>
> The whole replication log thing is a topic of dissent in the team ;-)

I see. To work reliably, the replication log should work very similarly to
WAL, so why not just use a table + WAL? Or, if you want extra performance
from storing it on a separate disk, then work on having multiple WALs
in the backend ;)
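
A minimal sketch of the "table + WAL" idea, assuming the replication log is
just another table written in the same transaction as the data change. SQLite
stands in for PostgreSQL here, and the names `accounts`, `replication_log` and
`change_balance` are made up for illustration; this is not how Replicator
works today.

```python
import sqlite3


def open_db():
    # SQLite is only a stand-in: the point is that both tables are covered
    # by the same transaction log, so they commit or roll back together.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE replication_log ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " tbl TEXT NOT NULL,"
        " row_id INTEGER NOT NULL,"
        " new_balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.commit()
    return conn


def change_balance(conn, account_id, delta, fail_before_commit=False):
    """Apply a change and record it in the log inside one transaction."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (delta, account_id),
            )
            (new_balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (account_id,)
            ).fetchone()
            conn.execute(
                "INSERT INTO replication_log (tbl, row_id, new_balance)"
                " VALUES ('accounts', ?, ?)",
                (account_id, new_balance),
            )
            if fail_before_commit:
                # Hypothetical failure injected to imitate a crash mid-transaction.
                raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        pass  # the data change and its log entry were rolled back together


conn = open_db()
change_balance(conn, 1, 25)                            # committed
change_balance(conn, 1, 50, fail_before_commit=True)   # rolled back
```

After the simulated crash the data table and the log still agree, which is
exactly the property a separate, non-WAL-logged log file cannot give you.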

> > 4. Also, again from reading Replicator FAQ, it seems that there is a
> > window of corruption/data loss when rotating the Replicators transaction
> > log. I think that doing it with copy/truncate either needs locking the
> > logfile (== bad performance, during copy/truncate) or is just a
> > data-eating failure waiting to happen.
>
> Hmm, what Replicator FAQ? We used to have this copy/truncate problem,
> and we rearchitected the log to avoid this (we use a rotating setup
> now)

It was in the subsection "mcp_server mysteriously dies" at
http://www.commandprompt.com/products/mammothreplicator/tips

> > > Master->MCP|Slave ->Slave1
> > >                   ->Slave2
> > >                   ->Slave3
> > >
> > > The process being, Master sends data to MCP|Slave, MCP|Slave writes it
> > > to disk (optionally restores it)
> >
> > Will this first send be sync or async ? Or have you planned to have it
> > be configurable among several robustness vs. performance levels, similar
> > to the planned integrated WAL-shipping.
>
> It is async, and we haven't talked about sync.
>
> > if async, will it also use MVCC for keeping log on Master (like Slony
> > and pgQ do), just to be at least as reliable as PostgreSQL core itself
> > and not require a full resync at server crash.
>
> You mean WAL? We don't currently.

So how do you cope with a possible loss of sync on master crash?

> > > Alvaro or Alexey can speak more technically about implementation than I
> > > can.
> >
> > Alvaro - I guess you already have discussed most of it, but basically
> > you need to solve all the same problems that WAL-shipping based Hot
> > Standby is solving and Slony/pgQ/Londiste have solved.
>
> If you mean that we're duplicating the effort that's already going
> elsewhere, my opinion is yes, we are.

Duplicating the effort is not always a bad thing. I was mostly
suggesting watching the discussions, digging around in the materials and/or
asking people who have been working on these same issues.

And of course to _think_ deeply about the design before writing lots of
duplicate code which ends up being an often inferior implementation of
something that already exists (see:
http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx )
;-)

> > Hopefully you get it more robust than Slony when making changes under
> > high load :)
>
> Hmm, I don't know about lack of robustness in Slony, so I don't know.

Slony is brittle once you start using it under high load and tends to
display all kinds of frustrating qualities:

1) it does not have enough controls in place for conf changes to guarantee
either success or a clean rollback, so if something goes wrong (e.g. some
conf change has not propagated to all nodes in the right order), you end up
with no working replication.

2) you usually can't test for 1) on your test setup, as it happens only
under really high loads, which most test setups don't provide.

There are/were other warts (like forcing an index scan covering the
whole table, or being unable to continue replication after some slonik
downtime because PostgreSQL would give "query too complex" errors on the
generated 700 kB long query), some of which are fixed in 1.x, some are
maybe fixed in 2.0.

I was a heavy user (at Skype) at some point and have helped in fixing
some. But in the end we could not figure out how to make it robust and
extracted the good stuff for pgQ and wrote our own replication based on
that, which we could make perform and be robust when changing conf.

> > Will there be an helper application for setting up and configuring
> > changes in replication. or will it all be done using added SQL
> > commands ?
>
> Well, the interface I work on is all SQL commands :-)
>
> > How will DDL be handled ( i understood that you don't yet have DDL
> > replication) ?
>
> We don't have it yet. However, since we can just add any code in any
> place we like, and that we have a protocol to transmit changes, it is
> relatively easy to add calls to collect the needed information and
> replay it on the slave.

Do you transmit changes to and apply changes on slave as binary or as
SQL statements ?

Do slaves also have to be modified just to receive changes ?

I think the hairy part will be getting the order of commands _exactly_
right (like Hot Standby again), but if you are similar to
Slony/pgQ/Londiste in that you just transfer logical changes, not
physical page-level changes, then the DDL locking on master may be
enough to guarantee the right order. That is assuming that you already
can guarantee the right (commit "time") order on slaves. This is not the
same as transaction start order, which may give wrong/inconsistent data
states.
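
To make the commit-order point concrete, here is a toy sketch. The `Txn`
record and the `replay` helper are hypothetical, invented only for this
illustration. T1 starts first, but T2 writes and commits before T1 writes the
same row and commits, so on the master the final value is the one from T1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    start_seq: int    # order in which the transaction began on the master
    commit_seq: int   # order in which it committed on the master
    writes: tuple     # ((key, value), ...) logical row changes


def replay(txns, order_key):
    """Apply logical changes on a slave in the order given by order_key."""
    state = {}
    for txn in sorted(txns, key=order_key):
        for key, value in txn.writes:
            state[key] = value
    return state


# T1 began first but committed last; T2 began later but committed first.
t1 = Txn("T1", start_seq=1, commit_seq=2, writes=(("x", "from T1"),))
t2 = Txn("T2", start_seq=2, commit_seq=1, writes=(("x", "from T2"),))

master_state = {"x": "from T1"}  # T1's write landed after T2 had committed

by_commit = replay([t1, t2], lambda t: t.commit_seq)
by_start = replay([t1, t2], lambda t: t.start_seq)

print("commit order matches master:", by_commit == master_state)
print("start order matches master: ", by_start == master_state)
```

Replaying in commit order reproduces the master; replaying in start order
leaves the slave holding T2's value, which the master never ended up with.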

> > Will Slave tables be kind-of-read-only like Slony slaves ? Or even
> > _really_ read only like Simon's Hot Standby ?
>
> Heh -- they are read only, and they turn into read-write when the slave
> promotes. I'm not sure what kind does that make it :-)

This seems similar to Hot Standby. Slony enforces read-only using
triggers, and it can be circumvented by telling these triggers that you
are a Slony replication process yourself.
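
A rough sketch of what "kind-of-read-only via triggers" means, again with
SQLite standing in for PostgreSQL. The `session_flags` table, the trigger name
and the helper functions are assumptions made for illustration and are not
Slony's actual implementation; the point is only that anyone who can flip the
flag can write to the "read-only" table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- One-row flag standing in for "this session is the replication process".
    CREATE TABLE session_flags (is_replication INTEGER NOT NULL);
    INSERT INTO session_flags VALUES (0);

    -- Reject inserts unless the session claims to be the replication process.
    CREATE TRIGGER items_deny_insert BEFORE INSERT ON items
    BEGIN
        SELECT RAISE(ABORT, 'items is read-only on this slave')
        WHERE (SELECT is_replication FROM session_flags) = 0;
    END;
    """
)


def try_insert(conn, name):
    """Return True if the insert was accepted, False if the trigger refused it."""
    try:
        with conn:
            conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
        return True
    except sqlite3.DatabaseError:
        return False


def set_replication_flag(conn, enabled):
    with conn:
        conn.execute(
            "UPDATE session_flags SET is_replication = ?", (1 if enabled else 0,)
        )


blocked = try_insert(conn, "ordinary write")   # refused by the trigger
set_replication_flag(conn, True)               # "I am the replication process"
allowed = try_insert(conn, "replicated row")   # now accepted
```

A slave that refuses all writes at the storage level has no such back door,
which is the difference between kind-of and _really_ read only.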

--
------------------------------------------
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training
