Re: Data split -- Creating a copy of database without outage

From: Jan Nielsen <jan(dot)sture(dot)nielsen(at)gmail(dot)com>
To: Igor Shmain <igor(dot)shmain(at)gmail(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Data split -- Creating a copy of database without outage
Date: 2012-06-03 03:00:12
Message-ID: CANxH4hGpkp=tOX9tAjZcpJKHSubKoTEDGxcHUBz4C_c4ZNC0Eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi Igor,

On Tue, May 29, 2012 at 2:13 PM, Igor Shmain <igor(dot)shmain(at)gmail(dot)com> wrote:

> I need to design a solution for a database which will grow and will
> require a horizontal split at some moment.
>

Ok

> **
>
> Here is how I am planning to do it: Every record in every table has a
> shard number. After the database becomes too large (gets too many
> requests), the tables need to be horizontally split. It means that for
> every table all records with some specific shard numbers need to be moved
> to a new database.
>

Since PG does not support the notion of a shard at the database layer, you
will have to create this yourself. The key to scaling shards is ensuring
there there are no cross-shard joins (or any other shared data); this is an
application-design question so unless you "own" the application-design,
this will be difficult to achieve. If you do own the design, the
application's entity-tree will have to be carefully constructed to avoid
any sharing of resources across shards. RDBMS programmers are taught just
the opposite so this will require diligence from your team with a very
clear, likely simple, application-entity design amenable to shard-ing.

> ****
>
> My plan is to (1) create a copy of the database on a new server
> (backup/restore?), (2) synchronize the databases (how?), and then (3)
> forward all requests for the moved shards to the new database. (I will get
> rid of the unneeded shards later). An important requirement: there should
> be no outage for the users.
>

As I mentioned above, it is unlikely that you will be able to shard an
RDBMS at this level; you need to look at the application entity-design.

****
>
> What are the ways to achieve it?
>

If you mean to dictate a shard solution, very few will be able to help. On
the other hand, if you mean to ask for help in finding a "low-cost
scale-able PG DB solution for a typical RDBMS application with the
potential for rapid growth", then there are a number of options which have
been employed with success. For predominately read applications, use a
single master database replication strategy:

http://www.postgresql.org/docs/current/static/different-replication-solutions.html

For predominately write applications, you will likely need a multi-master
solution (unless you can dictate the application entity-design) which is
more complex territory from a scale-ability and application design
standpoint but you might look at Postgres-XS and Londiste for ideas.

Cheers,

Jan

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message 29t4dgru57 2012-06-04 15:26:03 How to skip StackBuilder at end of Installer?
Previous Message Igor Shmain 2012-06-03 00:57:25 Re: Data split -- Creating a copy of database without outage