Re: Sync Rep: Second thoughts

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Emmanuel Cecchet <manu(at)frogthinker(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, aidan(at)highrise(dot)ca, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, postgres-r-general(at)pgfoundry(dot)org
Subject: Re: Sync Rep: Second thoughts
Date: 2008-12-23 16:52:51
Message-ID: 49511763.7010103@bluegap.ch
Lists: pgsql-hackers

Hello Emmanuel,

Emmanuel Cecchet wrote:
> What Bettina calls the Lock Phase in
> http://www.cs.mcgill.ca/~kemme/papers/vldb00.pdf is actually a
> certification.

Aha. Hm... that has been gone since Postgres-R (SI) and doesn't exist anymore
in my current version either (so far called Postgres-R (8)). Most of
what the certifier does (ordering of write sets) is handled by the GCS,
everything else (i.e. what Tashkent refers to as write-write conflicts)
happens within the database system itself using MVCC.
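
As a rough illustration of that split (not Postgres-R code; the class and
field names below are hypothetical), here is a minimal Python sketch. It
assumes the GCS has already delivered the write sets in the same total
order on every node, and it reduces the MVCC check to a simple
"first committer wins" rule on the set of modified rows:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    txn_id: str
    snapshot: int      # number of write sets already committed when the txn started
    keys: frozenset    # keys of the rows the transaction modified


class Replica:
    """Applies write sets in the total order delivered by the GCS (simulated)."""

    def __init__(self):
        self.commit_count = 0   # position in the total order of commits
        self.last_writer = {}   # row key -> commit position of its latest writer

    def deliver(self, ws: WriteSet) -> bool:
        # Write-write conflict: a row was changed by a transaction that
        # committed after this transaction took its snapshot.
        for key in ws.keys:
            if self.last_writer.get(key, 0) > ws.snapshot:
                return False    # abort; every replica reaches the same decision
        self.commit_count += 1
        for key in ws.keys:
            self.last_writer[key] = self.commit_count
        return True


def replay(ordered_write_sets):
    """Feed an already ordered stream of write sets to a fresh replica."""
    replica = Replica()
    return [replica.deliver(ws) for ws in ordered_write_sets]
```

Because the decision depends only on the delivery order and on local
state, no separate certifier process is needed in this sketch: each node
replays the same stream and aborts the same transactions.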

> You can find more references to certification protocols in
> http://gorda.di.uminho.pt/download/reports/gapi.pdf

Thank you for that pointer. It seems the term "certify" confused me,
because in my mind it's much more tied to public key encryption and such.

> I would also recommend the work of Sameh on Tashkent and Tashkent+ that
> was based on Postgres:

Thanks again. I've read the first one, which confirmed that I'm on the
right track with what I'm doing with Postgres-R (8). I'm preparing to
relieve the individual replicas of (most of) the WAL logging and instead
apply separate change- or write-set logging. That seems to be the main
achievement of Tashkent. Its savings are pretty obvious, IMO, because it
heavily reduces the overall amount of I/O operations.

>> What do you mean by "reliability issues"?
>>
> These approaches usually require an atomic broadcast primitive that is
> usually fragile (limited scalability, hard to tune failure timeouts, ).

I haven't had many reliability issues with Ensemble, Appia or Spread so
far, although I admit I have never run any of these in production.
Performance is certainly an issue, yes.

> Most prototype implementations have the load balancer and/or the
> certifier as a SPOF (single point of failure). Building reliability for
> these components will come with a significant performance penalty.

That's a point, yeah. There's always a compromise between performance and
reliability. And more often than not, a third aspect complicates the
matter even further: cost.

>> What limitations are you speaking of here?
>
> Oftentimes DDL support is very limited.

Agreed. My Postgres-R versions don't support any of those yet. BTW,
that's one of the cases where (fully) synchronous replication is more
efficient: because DDL commands very often conflict with other
transactions, it's better to use pessimistic locking.

> Non-transactional objects like
> sequences are not captured.

Postgres-R (8) partly covers sequences already. It uses atomic
broadcasts (independent of change set collection or multi-casting). An
optional per-node caching of sequence numbers helps reduce network
latency for sequence increments.
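
To illustrate the caching idea only (this is not the Postgres-R
implementation), here is a small Python sketch. The `BlockAllocator`
class is a hypothetical stand-in for whatever the nodes agree on via
atomic broadcast, and `cache_size` is an assumed tuning knob: a node
reserves a whole block of values in one round and then serves increments
locally until the block is used up.

```python
class BlockAllocator:
    """Stand-in for the agreement reached via atomic broadcast (an assumption)."""

    def __init__(self, start=1):
        self.next_free = start
        self.broadcasts = 0     # counts simulated network rounds

    def reserve(self, size):
        # In a real cluster every node would see reservations in the same order.
        self.broadcasts += 1
        first = self.next_free
        self.next_free += size
        return first, first + size   # half-open range [first, end)


class NodeSequence:
    """Per-node view of a replicated sequence with a local cache of values."""

    def __init__(self, allocator, cache_size=1):
        self.allocator = allocator
        self.cache_size = cache_size
        self.current = 0
        self.end = 0            # current == end means the cache is empty

    def nextval(self):
        if self.current >= self.end:
            # Cache exhausted: one broadcast reserves a whole new block.
            self.current, self.end = self.allocator.reserve(self.cache_size)
        value = self.current
        self.current += 1
        return value
```

With `cache_size=1` every increment costs a broadcast; larger caches
trade that latency for gaps and for values that are unique but no longer
ordered across nodes, similar to the effect of the CACHE option on a
plain PostgreSQL sequence.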

> Session or environment variables are not necessarily propagated. Support
> of temp tables varies between databases which makes it hard to support
> them properly in a generic way.
> Well I guess everyone has a story on some limitations it has found with
> some database replication technology especially when a user expects a
> cluster to behave like a single database instance.

Certainly, yes.

> Happy holidays,

Thanks, same to you!

Regards

Markus Wanner
