Re: Thousands of tables versus one table?

From: david(at)lang(dot)hm
To: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Thomas Andrews <tandrews(at)soliantconsulting(dot)com>, Mark Lewis <mark(dot)lewis(at)mir3(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Thousands of tables versus one table?
Date: 2007-06-07 05:22:26
Message-ID: Pine.LNX.4.64.0706062218560.6675@asgard.lang.hm
Lists: pgsql-performance

On Wed, 6 Jun 2007, Scott Marlowe wrote:

>> > pretty doggone cheap relative to the table searches, which means it
>> > almost certainly will lose badly if you carry the subdivision out to
>> > the extent that the individual tables become small. (This last could
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> what is considered 'small'? a few thousand records, a few million records?
>
> I would say small is when the individual tables are in the 10 to 20 Megabyte
> range. How many records that is depends on record width, of course.
> Basically, it's once the tables get small enough that you don't really need
> indexes much, since you tend to grab 25% or more of each one that you're
> going to hit in a query.

thanks, that makes a lot of sense

>> what multiplication factor would there need to be on the partitioning to
>> make it worth while? 100 tables, 1000 tables, 10000 tables?
> Really depends on the size of the master table I think. If the master table
> is about 500 Megs in size, and you partition it down to about 1 meg per child
> table, you're probably ok. Walking through 500 entries for constraint
> exclusion seems pretty speedy from the tests I've run on a 12M row table that
> was about 250 Megs, split into 200 to 400 or so equisized child tables. The
> time to retrieve 85,000 rows that were all neighbors went from 2 to 6
> seconds, to about 0.2 seconds, and we got rid of indexes entirely since they
> weren't really needed anymore.
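(For anyone reading along, the kind of range-partitioned setup Scott is describing looks roughly like this in 8.x-era Postgres. This is just a sketch: the table and column names are made up, and the partition boundaries are illustrative.)

```sql
-- Parent table holds no rows itself; each child carries a CHECK constraint
-- so the planner can prove which partitions are irrelevant to a query.
CREATE TABLE measurements (
    id    bigint,
    ts    timestamp,
    value numeric
);

CREATE TABLE measurements_2007_06 (
    CHECK (ts >= '2007-06-01' AND ts < '2007-07-01')
) INHERITS (measurements);

CREATE TABLE measurements_2007_07 (
    CHECK (ts >= '2007-07-01' AND ts < '2007-08-01')
) INHERITS (measurements);

-- With constraint exclusion on, a query against the parent only scans the
-- child tables whose CHECK constraints can satisfy the WHERE clause:
SET constraint_exclusion = on;
SELECT * FROM measurements
WHERE ts >= '2007-06-15' AND ts < '2007-06-16';
```

Walking the child tables' CHECK constraints at plan time is the "walking through 500 entries" cost Scott mentions above.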

remember, I'm talking about a case where you don't have to go through
constraint checking: you know from the start what customerID you are dealing
with, so you just query the tables for that customer
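(In other words, something like the following, sketched with invented names: the application resolves the customer's tables up front, so the planner never considers anyone else's data and no constraint exclusion is involved.)

```sql
-- Option 1: a table per customer, name chosen by the application.
-- For customerID 1234 the application simply issues:
SELECT * FROM orders_1234
WHERE order_date > now() - interval '7 days';

-- Option 2: a schema per customer, selected via search_path, so the
-- application SQL stays identical across customers:
SET search_path = customer_1234;
SELECT * FROM orders
WHERE order_date > now() - interval '7 days';
```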

>> the company that I'm at started out with a separate database per customer
>> (not using postgres), there are basically zero cross-customer queries, with
>> a large volume of updates and lookups.
>>
>> overall things have now grown to millions of updates/day (some multiple of
>> this in lookups), and ~2000 customers, with tens of millions of rows
>> between them.
>>
>> having each one as a separate database has really helped us over the years
>> as it's made it easy to scale (run 500 databases on each server instead of
>> 1000, performance just doubled)
> I think that for what you're doing, partitioning at the database level is
> probably a pretty good compromise solution. Like you say, it's easy to put
> busy databases on a new server to balance out the load. Hardware is cheap.
>
>> various people (not database experts) are pushing to install Oracle
>> cluster so that they can move all of these to one table with a customerID
>> column.
> Have these people identified a particular problem they're trying to solve, or
> is this a religious issue for them? From your description it sounds like a
> matter of dogma, not problem solving.

in part it is, in part it's because the commercial database companies have
told management that doing database replication is impossible with so many
databases (we first heard this back when we had 300 or so databases). we've
gone the expensive EMC disk-layer replication route instead, but they think
that merging everything will simplify things somehow so the database can do
its job better.

I see it as just a limitation of the replication solutions offered by the
big-name vendors.

>> the database folks won't comment much on this either way, but they don't
>> seem enthusiastic about combining all the data together.
> I think they can see the fecal matter heading towards the rotational cooling
> device on this one. I can't imagine this being a win from the perspective of
> saving the company money.

neither do I.

David Lang
