Skip site navigation (1) Skip section navigation (2)

Re: One large v. many small

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Noah Silverman <noah(at)allresearch(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: One large v. many small
Date: 2003-01-30 17:56:56
Message-ID: 200301300956.56041.josh@agliodbs.com (view raw or flat)
Thread:
Lists: pgsql-generalpgsql-performance
Noah,

> As we continue our evaluation of Postgres, another interesting topic
> has come up that I want to run by the group.
>
> In our current model, we have about 3,000 small tables that we use
> track data for our clients.  Each table is an identical structure, and
> holds the data for one client.

I'd list what's wrong with this structure, but frankly it would take me long 
enough that I'd need a consulting fee.   Suffice it to say that the above is 
a very, very bad (or at least antiquated) design idea and you need to 
transition out of it as soon as possible.

> Another idea that we are considering is one big table instead of 3,000
> smaller ones.  We could simply add a numeric field to indicate which
> client a particular record was for.

Yes.   Absolutely.  Although I'd suggest an Integer field.

> Each table has between 500 and 50,000 records, so the big table could
> have up to 10 million rows if we combined everything.

Sure.

> A query on our current system is (for client #4)
>
> Select (*) from client_4 where foo=2;
>
> A query from the new, proposed system would be
>
> Select (*) from big_results where client=4 and foo=2.
>
> The big questions is, WHICH WILL BE FASTER with Postgres.  Is there any
> performance improvement or cost to switching to this new structure.

Oh, no question query 1 will be faster ... FOR THAT QUERY.  You are asking the 
wrong question.

However, explain to me how, under the current system, you can find the client 
who ordered $3000 worth of widgets on January 12th if you don't already know 
who it is?   I'm not sure a 3000-table UNION query is even *possible*.

Or how about giving me the average number of customer transactions in a month, 
across all clients?

<rant>

You've enslaved your application design to performance considerations ... an 
approach which was valid in 1990, because processing power was so limited 
then.  But now that dual-processor servers with RAID can be had for less than 
$3000, there's simply no excuse for violating the principles of good 
relational database design just to speed up a query.   Buying more RAM is 
much cheaper than having an engineer spend 3 weeks fixing data integrity 
problems.

The proper way to go about application design is to build your application on 
paper or in a modelling program according to the best principles of software 
design available, and *then* to discuss performance issues -- addressing them 
*first* by buying hardware, and only compromising your applcation design when 
no other alternative is available.

</rant>

I strongly suggest that you purchase Pascal's "Practical Issues in Database 
Design" and give it a read.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Responses

pgsql-performance by date

Next:From: Andrew SullivanDate: 2003-01-30 18:02:40
Subject: Re: One large v. many small
Previous:From: Noah SilvermanDate: 2003-01-30 17:34:36
Subject: One large v. many small

pgsql-general by date

Next:From: Ron St.PierreDate: 2003-01-30 17:59:17
Subject: Re: Documentation needs significant improvement
Previous:From: will trillichDate: 2003-01-30 17:36:40
Subject: Re: Documentation needs significant improvement

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group