Re: Data split -- Creating a copy of database without outage

From: "Igor Shmain" <igor(dot)shmain(at)gmail(dot)com>
To: "'Kevin Grittner'" <Kevin(dot)Grittner(at)wicourts(dot)gov>, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Data split -- Creating a copy of database without outage
Date: 2012-06-03 00:57:25
Message-ID: 009f01cd4123$d89d9840$89d8c8c0$@gmail.com
Lists: pgsql-admin

Thank you for the "food for thought", Kevin :-) Would it be possible for
you to mention what hardware (cpu, ram, disks, etc.) and software your
system uses to support this db size and number of transactions?

Regarding the original question: it was not a question of what Postgres is
capable of. The architecture I am working on is intended to be used by a web
startup. If the product is successful, many users may start using the
service in a very short time. If that happens, there will be no time to
re-architect the database and the applications; the database will need to be
scaled almost overnight. The total number of requests per day is not a major
criterion for this system. The response time under many hits in a short
period of time is more important. The requirement is to serve thousands of
requests per second.

Buying a "super" computer in the hope that one day it will run at full
throttle is not for startups. Getting such a powerful computer quickly and
moving the database there is unrealistic. It makes more sense to design the
system so that it can be easily and quickly distributed across many
relatively inexpensive servers. That is why sharding is needed.

If you say something like "this is just purely theoretical", "what are the
chances of getting all those users", "things like that do not happen
overnight", I would totally agree. But look at it from another angle: If
only a few people use the application, the company will stay with a small
server and will not lose much. But if the service is successful, the company
will deploy a whole bunch of servers in a few hours and will be able to
serve all the users quickly :-)

It is a trade-off: more work now in exchange for having a scalable system
tomorrow (yes, yes, it is also called premature optimization :-)). And you
know what, it does not look like too much extra work now :-)

If you see real or potential problems in this logic, or have heard about
similar implementations, please mention them. I would appreciate it very much.

Best wishes,
-igor

-----Original Message-----
From: Kevin Grittner [mailto:Kevin(dot)Grittner(at)wicourts(dot)gov]
Sent: June-02-12 11:12 AM
To: igor(dot)shmain(at)gmail(dot)com; pgsql-admin(at)postgresql(dot)org
Subject: Re: [ADMIN] Data split -- Creating a copy of database without
outage

"Igor Shmain" wrote:

> I need to design a solution for a database which will grow and will
> require horizontal split at some moment.

Just one more bit of "food for thought" -- we have a database with 3TB
processing approximately 50 million database transactions per day (some with
a great many statements or affecting many rows) running quite comfortably on
a single machine (actually sharing that machine with a 2TB database on a
separate drive array), without partitioning.

We have done a lot of tuning.

I'm not sure what your basis is for the assumption that you will need to
split the database across machines; you might be right, but you might be
engaging in "premature optimization".

-Kevin
