Skip site navigation (1) Skip section navigation (2)

Re: Data split -- Creating a copy of database without outage

From: "Igor Shmain" <igor(dot)shmain(at)gmail(dot)com>
To: "'Jan Nielsen'" <jan(dot)sture(dot)nielsen(at)gmail(dot)com>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Data split -- Creating a copy of database without outage
Date: 2012-06-05 17:15:30
Message-ID: 017001cd433e$d00c41f0$7024c5d0$@gmail.com (view raw or flat)
Thread:
Lists: pgsql-admin
Hi Jan,

 

Thank you so much for your input and sorry about the delay.

 

I understand most of the limitations, compromises, and sacrifices that come
with distributed databases (sharding). But definitely not all of them. After
working with databases for about 25 years, I have some basic knowledge about
how they work (mssql, oracle, sybase). But I am new to postgres and new to
distributed db approach, so any feedback in those areas is extremely welcome
:-)

 

If anybody is aware of any successful implementations of distributed
approach with postgres and any information around this area, please let me
know. If I cannot find a suitable prototype and has to "invent the wheel", I
hope to be able to contribute back to the community with my story one day.

 

 

Thank you once again,

-igor

 

 

From: Jan Nielsen [mailto:jan(dot)sture(dot)nielsen(at)gmail(dot)com] 
Sent: June-02-12 11:00 PM
To: Igor Shmain
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: [ADMIN] Data split -- Creating a copy of database without
outage

 

Hi Igor,

On Tue, May 29, 2012 at 2:13 PM, Igor Shmain <igor(dot)shmain(at)gmail(dot)com> wrote:

I need to design a solution for a database which will grow and will require
a horizontal split at some moment.

 

Ok

 

Here is how I am planning to do it: Every record in every table has a shard
number. After the database becomes too large (gets too many requests), the
tables need to be horizontally split. It means that for every table all
records with some specific shard numbers need to be moved to a new database.

 

Since PG does not support the notion of a shard at the database layer, you
will have to create this yourself. The key to scaling shards is ensuring
there there are no cross-shard joins (or any other shared data); this is an
application-design question so unless you "own" the application-design, this
will be difficult to achieve. If you do own the design, the application's
entity-tree will have to be carefully constructed to avoid any sharing of
resources across shards. RDBMS programmers are taught just the opposite so
this will require diligence from your team with a very clear, likely simple,
application-entity design amenable to shard-ing.

 

My plan is to (1) create a copy of the database on a new server
(backup/restore?), (2) synchronize the databases (how?), and then (3)
forward all requests for the moved shards to the new database. (I will get
rid of the unneeded shards later). An important requirement: there should be
no outage for the users.

 

As I mentioned above, it is unlikely that you will be able to shard an RDBMS
at this level; you need to look at the application entity-design.

 

What are the ways to achieve it?

 

If you mean to dictate a shard solution, very few will be able to help. On
the other hand, if you mean to ask for help in finding a "low-cost
scale-able PG DB solution for a typical RDBMS application with the potential
for rapid growth", then there are a number of options which have been
employed with success. For predominately read applications, use a single
master database replication strategy:

 

 
http://www.postgresql.org/docs/current/static/different-replication-solution
s.html

 

For predominately write applications, you will likely need a multi-master
solution (unless you can dictate the application entity-design) which is
more complex territory from a scale-ability and application design
standpoint but you might look at Postgres-XS and Londiste for ideas.

 

 

Cheers,

 

Jan

 

In response to

pgsql-admin by date

Next:From: Amador AlvarezDate: 2012-06-06 00:22:55
Subject: Can schemas be ordered regarding their creation time ?
Previous:From: 29t4dgru57Date: 2012-06-04 15:26:03
Subject: How to skip StackBuilder at end of Installer?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group