Re: Architecting a database

From: Bryan Hinton <bryan(at)bryanhinton(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, tony(at)exquisiteimages(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Architecting a database
Date: 2010-06-26 01:35:06
Message-ID: AANLkTikMHn44jYk4wsB732kA0MBZl_0Vg8huUnVuPMbl@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Interesting point you made about the read to write ratio of 1 to 15.
How frequently will you be adding new entities or in the case of storing the
customers in one database table, how frequently will you be adding new
objects of a certain entity type. How many entity types do you foresee
existing? i.e. "Customer?" Will Customer have subtypes or is a Customer the
single entity in the database?
How frequent and for how long are write operations and are they heavily
transaction based? Will you need to support complex reporting in the
future? What is the max number of customers? And how much data
(approximate) will a single customer record consume in bytes? At what rate
does it grow? (in bytes)
Will your system need to support any type of complex reporting in the future
(despite it being write intensive)?

I'd take a look at memcached, plproxy, pgpool, and some of the other cool
stuff in the postgresql community.
At a minimum, it might help you architect the system in such a manner that
you don't box yourself in.
Last, KV stores for heavy write intensive operations in distributed
environments are certainly interesting - a hybrid solution could work.

Sounds like a fun project!

Bryan

On Fri, Jun 25, 2010 at 7:02 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> Kevin Grittner wrote:
>
>> A schema is a logical separation within a database. Table
>> client1.account is a different table from client2.account. While a
>> user can be limited to tables within a single schema, a user with
>> rights to all the tables can join between them as needed. You could
>> put common reference data in a public schema which all users could
>> access in addition to their private schemas
>>
>
> My guess would be that this app will end up being best split by schema. I
> wonder whether it *also* needs to be split by database, too. 2000 clusters
> is clearly a nightmare, and putting all the client data into one big table
> has both performance and security issues; that leaves database and schema as
> possible splits. However, having 2000 databases in a cluster is probably
> too many; having 2000 schemas in a database might also be too many. There
> are downsides to expanding either of those to such a high quantity.
>
> In order to keep both those in the domain where they perform well and are
> managable, it may be that what's needed is, say, 50 databases with 40
> schemas each, rather than 2000 of either. Hard to say the ideal ratio.
> However, I think that at the application design level, it would be wise to
> consider each client as having a database+schema pair unique to them, and
> with the assumption some shared data may need to be replicated to all the
> databases in the cluster. Then it's possible to shift the trade-off around
> as needed once the app is built. Building that level of flexibility in
> shouldn't be too hard if it's in the design from day one, but it would be
> painful bit of refactoring to do later. Once there's a prototype, then some
> benchmark work running that app could be done to figure out the correct
> ratio between the two. It might even make sense to consider full
> scalability from day one and make the unique client connection info
> host:port:database:schema.
>
> P.S. Very refreshing to get asked about this before rather than after a
> giant app that doesn't perform well is deployed.
>
> --
> Greg Smith 2ndQuadrant US Baltimore, MD
> PostgreSQL Training, Services and Support
> greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Rajesh Kumar Mallah 2010-06-26 02:48:09 Re: sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Previous Message Robert Haas 2010-06-26 01:15:36 Re: WAL+Os on a single disk