
Re: Core team statement on replication in PostgreSQL

From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-01 18:31:13
Lists: pgsql-advocacy, pgsql-hackers
On Thu, May 29, 2008 at 4:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The Postgres core team met at PGCon to discuss a few issues, the largest
> of which is the need for simple, built-in replication for PostgreSQL.
> We believe that the most appropriate base technology for this is
> probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
> We hope that such a feature can be completed for 8.4.  Ideally this
> would be coupled with the ability to execute read-only queries on the
> slave servers, but we see technical difficulties that might prevent that
> from being completed before 8.5 or even further out.  (The big problem
> is that long-running slave-side queries might still need tuples that are
> vacuumable on the master, and so replication of vacuuming actions would
> cause the slave's queries to deliver wrong answers.)
> Again, this will not replace Slony, pgPool, Continuent, Londiste, or
> other systems for many users, as it will not be highly scalable nor
> support long-distance replication nor replicating less than an entire
> installation.  But it is time to include a simple, reliable basic
> replication feature in the core system.


I thought I would share a few thoughts of my own about the issue.
I have hands-on experience with Oracle and MySQL in addition to
PostgreSQL, so I hope this will be of some interest.

The former has a feature called "physical standby", which looks
quite like our WAL-shipping based replication.  Put simply, archived
logs are replayed on the standby database.  The primary database
and the standby database are connected, and can stream the logs
directly.  They either copy each log when it's finished (as we do now)
or do it in a continuous manner (as I hope we will be able to).

It is possible to have synchronous replication (where "COMMIT"
on the primary database succeeds only once the data is safely stored
on the standby database).  I think such a feature would be a great
advantage for PostgreSQL deployments where you cannot afford to lose
any transactions.
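The semantics are simple to state: COMMIT on the primary returns only
after the standby acknowledges that the WAL record is safely on its
disk.  A toy illustration (these classes are hypothetical sketches,
not PostgreSQL internals):

```python
# Toy model of synchronous WAL-shipping replication: the primary's
# commit() does not report success until the standby has "flushed"
# the commit record.  Names and structure are illustrative only.

class Standby:
    def __init__(self):
        self.flushed_lsn = 0

    def receive_and_flush(self, lsn):
        self.flushed_lsn = lsn      # stands in for fsync of the WAL record
        return self.flushed_lsn     # acknowledgement sent back to the primary

class Primary:
    def __init__(self, standby):
        self.standby = standby
        self.current_lsn = 0

    def commit(self):
        self.current_lsn += 1       # commit record appended to local WAL
        ack = self.standby.receive_and_flush(self.current_lsn)
        # Synchronous: succeed only once the standby has the record too.
        return ack >= self.current_lsn

primary = Primary(Standby())
assert primary.commit()  # True only after the standby acknowledged
```

If the standby were unreachable, commit() would block or fail instead of
returning success -- that is exactly the "cannot afford to lose any
transactions" guarantee.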

Oracle's standby database is not normally accessible.  It can be
opened read-only, but replication stops during that time.  So a
PostgreSQL standby that stays read-only while still replicating
would be a great improvement.

The other method is "logical standby", which works by dissecting
the WAL and recreating DDL/DML statements from it.  I have never
seen anyone use it. ;-)

Then there is MySQL replication -- done by replaying the actual
DDL/DML statements on the slaves.  This approach has issues, most
notably when slaves are heavily loaded and lag behind the master --
so you end up building infrastructure to monitor lag and pull out
slaves that lag too much.  It is also painful to set up -- you have
to stop, copy, configure and run.
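The lag-monitoring infrastructure mentioned above usually boils down
to a check of this shape (a sketch only; the lag value would come from
e.g. Seconds_Behind_Master in MySQL's SHOW SLAVE STATUS output, and the
threshold and function names here are made up):

```python
# Sketch of the "pull a slave out of rotation when it lags" logic.
# get the lag from the replication system; None means replication
# is broken or stopped entirely.

MAX_LAG_SECONDS = 30  # hypothetical freshness threshold

def should_serve_reads(seconds_behind_master):
    """Return True if the slave is fresh enough to answer queries."""
    if seconds_behind_master is None:   # replication broken/stopped
        return False
    return seconds_behind_master <= MAX_LAG_SECONDS

# A slave 5s behind keeps serving; one 300s behind gets pulled out.
assert should_serve_reads(5) is True
assert should_serve_reads(300) is False
```

The pain is that every deployment ends up reinventing this loop, plus
the load-balancer integration that acts on its answer.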

* Back to the PostgreSQL world

As for PostgreSQL solutions, we have Slony-I, which is great as long
as you don't have too many people managing the database and/or your
schema doesn't change too frequently.  Perhaps it would be easier to
maintain if there were a way to capture DDLs (via DDL triggers or
similar).  Its main advantages for me are the ability to prepare
complex setups and to add new slaves easily.  The pgpool solution is
quite nice, but then again adding a new slave is not so easy.  And
being a filtering layer between client and server, it feels a bit
"fragile" (I know it is not, but then again it is harder to convince
someone that "yes, it will work 100% right all the time").

* How I would like PostgreSQL WAL-replication to evolve:

First of all, it would be great if a slave/standby contacted the
master and maintained its state with it (telling it its xmin,
requesting a log to stream, going into online streaming).  In
particular, I hope it will be possible to perform a switchover
(where the two databases exchange roles), and the direct connection
between the two should help with that.

In detail, I think it should go like this:
* A slave database starts up and checks that it should run as a
"replica" (hopefully not a postgresql.conf constant, but rather
some file maintained by the database).
* It connects to the master database, reports where in the WAL
it currently is, and requests log N.
* If log N is not available, it requests the log from an externally
supplied script (so that it can be fetched from a log archive
repository somewhere, restored from a backup tape, etc.).
* It keeps asking until it reaches the logs that are available
on the master database.
* It keeps replaying until it is within max_allowed_replication_lag,
then opens the slave for read-only queries.
* If it starts lagging too much, perhaps close read-only access
to the database (configurable?).
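The catch-up part of that start-up sequence can be sketched roughly
like this (a minimal sketch; catch_up and the two fetch callbacks are
illustrative names, not any actual PostgreSQL API):

```python
# Sketch of the slave catch-up loop described in the list above:
# replay WAL from the archive while the master no longer has it,
# then switch to streaming directly from the master.

def catch_up(slave_lsn, master_lsn, fetch_from_master,
             fetch_from_archive, master_oldest_kept):
    """Replay logs from wherever they can be found until we reach the
    master's position; each fetch callback replays one log and returns
    the new position."""
    lsn = slave_lsn
    while lsn < master_lsn:
        if lsn >= master_oldest_kept:
            lsn = fetch_from_master(lsn)    # stream directly from master
        else:
            lsn = fetch_from_archive(lsn)   # external script / tape / etc.
    return lsn  # caught up: safe to open for read-only queries

# Toy transports: each call replays one log segment.
caught = catch_up(
    slave_lsn=3, master_lsn=10,
    fetch_from_master=lambda lsn: lsn + 1,
    fetch_from_archive=lambda lsn: lsn + 1,
    master_oldest_kept=7,
)
assert caught == 10
```

The point of the shape is that the archive and the master are just two
sources for the same stream, so the slave needs no manual hand-holding
to move from one to the other.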

I think that replication should be easy to set up.  I think our
archive_command is quite easy, but many people come to it with a lot
of misconceptions about how it works (and it takes time to explain
to them how it actually works, especially what archive_command is
for, and that pg_start_backup() doesn't actually _do_ a backup, but
just tells PostgreSQL that a backup is being taken).
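To illustrate the misconception: the pieces fit together roughly like
this (the path and the 'nightly' label are placeholders) --
archive_command only tells the server how to ship each finished WAL
segment, and pg_start_backup() merely brackets a file-level copy that
you perform yourself:

```
# postgresql.conf (master): ship each finished WAL segment to the archive
archive_command = 'cp %p /mnt/server/archivedir/%f'

-- psql: mark the start of a base backup; this does NOT copy anything
SELECT pg_start_backup('nightly');
-- ...copy the data directory yourself (tar, rsync, ...) while the server runs...
SELECT pg_stop_backup();
```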

It should be easy to set up, and easy to switch over (change the
direction of replication) and fail over.

Also, it would be interesting to have a shared-storage solution
where a single database would modify the data and multiple
read-only slaves would access it.  The WAL shipping would then be
used to "push" modified pages into the read-only slaves' memory,
and each checkpoint would allow the read-only slaves to release
these pages from memory.  A kind of replication without actually
copying any data.

* Highly available libpq?

Another thing worth considering is perhaps adding an HA capability
to libpq.  Currently I use a setup with one machine running the
database and pg_bouncer (for connection pooling at the database
level), and a pgpool on localhost on each client machine (for
transparent failover).  I think some level of replication awareness
could be built into libpq.  It would allow simpler, but still HA,
setups.

Perhaps the ability to specify a standby in the connect string,
something like:

  host=db1 stbyhost=db2 failover="yes" dbname=mydb

...a failover="yes" would allow libpq to try connecting to stbyhost
if host is not available, AND to succeed only if stbyhost announces
itself as the new master.

This would make switchover easier (clients would be told that we
are closing, and that they should connect to the new master).
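A sketch of that client-side behaviour (connect() and the endpoint
table below are stand-ins for illustration, not the real libpq API):

```python
# Sketch of failover="yes": try the primary host first, and fall back
# to stbyhost only if the standby announces itself as the new master.

def connect_with_failover(connect, host, stbyhost):
    """Return a connection, preferring `host`; on failure, accept the
    standby only once it has been promoted to master."""
    try:
        return connect(host)
    except ConnectionError:
        conn = connect(stbyhost)
        if not conn["is_master"]:
            # Standby not promoted yet: refuse rather than serve stale data.
            raise ConnectionError("standby has not announced itself master")
        return conn

# Toy endpoint table: the primary is down, the standby was promoted.
endpoints = {"db1": None, "db2": {"is_master": True}}

def fake_connect(h):
    if endpoints[h] is None:
        raise ConnectionError(h + " unreachable")
    return endpoints[h]

assert connect_with_failover(fake_connect, "db1", "db2")["is_master"]
```

The key design point is the "announces itself as master" check: without
it, a client could happily connect to a standby that was merely
unreachable from the primary, and read stale data.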

Also, some way of supplying multiple hosts, from which the client
would pick one at random, would be quite useful, say:

  host="db1 db2 db3" dbname=mydb

Why in libpq and not in [language/framework/application]?  Well,
it would help avoid reinventing the wheel.  And if we want a
painless switchover, the client should be aware that there is a
standby ready.  Also, being optional, it wouldn't change anything
for those who don't want it.


PS: As for pgpool/pg_bouncer -- I wish there were a way for them to
tell the database that they are proxying for someone else (so that I
could check in pg_stat_activity who is on the far end of the
connection, not "localhost" ;)).
Solving [site load issues] with [more database replication] is a lot
like solving your own personal problems with heroin - at first it
sorta works, but after a while things just get out of hand.
