Re: One large v. many small

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Noah Silverman <noah(at)allresearch(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: One large v. many small
Date: 2003-01-30 17:56:56
Message-ID: 200301300956.56041.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Noah,

> As we continue our evaluation of Postgres, another interesting topic
> has come up that I want to run by the group.
>
> In our current model, we have about 3,000 small tables that we use
> track data for our clients. Each table is an identical structure, and
> holds the data for one client.

I'd list what's wrong with this structure, but frankly it would take me long
enough that I'd need a consulting fee. Suffice it to say that the above is
a very, very bad (or at least antiquated) design idea and you need to
transition out of it as soon as possible.

> Another idea that we are considering is one big table instead of 3,000
> smaller ones. We could simply add a numeric field to indicate which
> client a particular record was for.

Yes. Absolutely. Although I'd suggest an Integer field.

> Each table has between 500 and 50,000 records, so the big table could
> have up to 10 million rows if we combined everything.

Sure.

> A query on our current system is (for client #4)
>
> Select (*) from client_4 where foo=2;
>
> A query from the new, proposed system would be
>
> Select (*) from big_results where client=4 and foo=2.
>
> The big questions is, WHICH WILL BE FASTER with Postgres. Is there any
> performance improvement or cost to switching to this new structure.

Oh, no question query 1 will be faster ... FOR THAT QUERY. You are asking the
wrong question.

However, explain to me how, under the current system, you can find the client
who ordered $3000 worth of widgets on January 12th if you don't already know
who it is? I'm not sure a 3000-table UNION query is even *possible*.

Or how about giving me the average number of customer transactions in a month,
across all clients?

<rant>

You've enslaved your application design to performance considerations ... an
approach which was valid in 1990, because processing power was so limited
then. But now that dual-processor servers with RAID can be had for less than
$3000, there's simply no excuse for violating the principles of good
relational database design just to speed up a query. Buying more RAM is
much cheaper than having an engineer spend 3 weeks fixing data integrity
problems.

The proper way to go about application design is to build your application on
paper or in a modelling program according to the best principles of software
design available, and *then* to discuss performance issues -- addressing them
*first* by buying hardware, and only compromising your applcation design when
no other alternative is available.

</rant>

I strongly suggest that you purchase Pascal's "Practical Issues in Database
Design" and give it a read.

--
Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron St.Pierre 2003-01-30 17:59:17 Re: Documentation needs significant improvement
Previous Message will trillich 2003-01-30 17:36:40 Re: Documentation needs significant improvement

Browse pgsql-performance by date

  From Date Subject
Next Message Andrew Sullivan 2003-01-30 18:02:40 Re: One large v. many small
Previous Message Noah Silverman 2003-01-30 17:34:36 One large v. many small