Re: Merge a sharded master into a single read-only slave

From: Keith Fiske <keith(at)omniti(dot)com>
To: Sébastien Lorion <sl(at)thestrangefactory(dot)com>
Cc: Kevin Goess <kgoess(at)bepress(dot)com>, PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Merge a sharded master into a single read-only slave
Date: 2014-06-04 17:50:31
Message-ID: CAG1_KcBDtNj1yh8sMTzpffm1k-_qjrXHXVOFaQeQsY6bW5tusg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Jun 2, 2014 at 2:47 PM, Sébastien Lorion <sl(at)thestrangefactory(dot)com>
wrote:

> On Mon, Jun 2, 2014 at 12:52 PM, Kevin Goess <kgoess(at)bepress(dot)com> wrote:
>
>> > So my conclusion is that for now, the best way to scale read-only
>> queries for a sharded master is to
>> > implement map-reduce at the application level.
>>
>> That's the conclusion I would expect. It's the price you pay for
>> sharding, it's part of the deal.
>>
>> But it's also the benefit you get from sharding. Once your read traffic
>> grows to the point that it's too much for a single host, you're going to
>> have to re-shard it all again *anyway*. The whole point of sharding is
>> that it allows you to grow outside the capacities of a single host.
>>
>
> ​I am not sure I am following you completely. I can replicate the
> read-only slaves almost as much as I want (with chained replication), so
> why would I be limited to a single host ? You would have a point concerning
> database size, but in my case, the main reason I need to shard is because
> of the amount of writes.
>
>
Not sure if this will work for you, but sharing a similar scenario in case
it may work for you.

An extension I wrote provides similar logical replication as you've
probably seen in other tools.
https://github.com/omniti-labs/mimeo
One difference is it has several methods for this replication, one being
incremental based on either time or serial ID. Since incremental
replication requires just read-only access on the source databases, it
causes no extra write overhead as most logical replication solutions do
(triggers writing to queue tables).

A client of ours had a table sharded by UUID to 512 clusters but needed
that data pulled to a single cluster for reporting purposes. The tables
also had a timestamp column that was set on each insert/update, so the
incremental replication method was able to be used here to pull data from
all clusters to a single cluster. The single reporting cluster then just
had an inheritance table set up with an empty parent table pointing to all
the child tables that pulled data into them.

Yes, it was a lot of setup since each of the 512 tables has to be set up
individually. But once it was set up it worked surprisingly well. And it's
honestly a use case I had never foreseen for the extension.

Not sure if this would work in your case, but maybe it can at least give
you an idea of what can be done.

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G Johnston 2014-06-04 19:29:18 Re: PostgreSQL 9.3 XML parser seems not to recognize the DOCTYPE element in XML files
Previous Message Peter Eisentraut 2014-06-04 17:44:24 Re: Re: PostgreSQL 9.3 XML parser seems not to recognize the DOCTYPE element in XML files