Re: One large v. many small

From: Curt Sampson <cjs(at)cynic(dot)net>
To: Curtis Faith <curtis(at)galtcapital(dot)com>
Cc: 'Josh Berkus' <josh(at)agliodbs(dot)com>, 'Noah Silverman' <noah(at)allresearch(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: One large v. many small
Date: 2003-02-01 05:07:00
Message-ID: Pine.NEB.4.51.0302011359390.610@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On Fri, 31 Jan 2003, Curtis Faith wrote:

> Depending on the way the records are accessed and the cache size,
> the exact opposite could be true. The index pages will most likely
> rarely be in memory when you have 3000 different tables. Meaning
> that each search will require at least three or four index page
> retrievals plus the tuple page.

Assuming you're using indexes at all. If you're tending to use table
scans, this doesn't matter.

From Noah's description it seemed he was--he said that a particular data
item couldn't be the primary key, presumably because he couldn't index
it reasonably. But this just my guess, not a fact.

> Combine a multi-part index (on both client and foo, which order
> would depend on the access required) that is clustered once a week
> or so using the admittedly non-optimal PostgreSQL CLUSTER command
> and I'll bet you can get equivalent or better performance...

I would say that, just after a CLUSTER, you're likely to see better
performance because this would have the effect, on a FFS or similar
filesystem where you've got plenty of free space, of physically
clustering data that would not have been clustered in the case of a lot
of small tables that see a lot of appending evenly over all of them over
the course of time.

So the tradeoff there is really, can you afford the time for the CLUSTER?
(In a system where you have a lot of maintenance time, probably. Though if
it's a huge table, this might need an entire weekend. In a system that needs
to be up 24/7, probably not, unless you have lots of spare I/O capacity.)
Just out of curiousity, how does CLUSTER deal with updates to the table while
the CLUSTER command is running?

> I don't think there is any substitute for just trying it out. It
> shouldn't be that hard to create a bunch of SQL statements that
> concatenate the tables into one large one.

I entirely agree! There are too many unknowns here to do more than
speculate on this list.

But thanks for enlightening me on situations where one big table perform
better.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Matthew V. 2003-02-01 05:33:35 Re: paradigm sanity check needed [long]
Previous Message Andrew McMillan 2003-02-01 04:43:48 Re: [NOVICE] For each record in SELECT

Browse pgsql-performance by date

  From Date Subject
Next Message Sean Chittenden 2003-02-01 05:09:35 Re: [PERFORM] not using index for select min(...)
Previous Message Tom Lane 2003-02-01 04:35:32 Re: [PERFORM] not using index for select min(...)