Re: Thousands of tables versus on table?

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Thomas Andrews" <tandrews(at)soliantconsulting(dot)com>
Cc: "Mark Lewis" <mark(dot)lewis(at)mir3(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Thousands of tables versus on table?
Date: 2007-06-04 20:18:43
Message-ID: 87y7izlbh8.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Thomas Andrews" <tandrews(at)soliantconsulting(dot)com> writes:

> Clustering sounds like it might be a really good solution. How long does a
> cluster command usually take on a table with 50,000,000 records? Is it
> something that can be run daily/weekly?

ouch, ok, with 50M records cluster isn't going to be quick either, especially
if you have a lot of indexes.

With those kinds of numbers and with the kind of workload you're describing
where you have different areas that are really complete separate you might
consider partitioning the table. That's essentially what you're proposing
anyways.

Honestly table partitioning in Postgres is pretty young and primitive and if
you have the flexibility in your application to refer to different tables
without embedding them throughout your application then you might consider
that. But there are also advantages to being able to select from all the
tables together using the partitioned table.

> I'd rather not post the schema because it's not mine - I'm a consultant. I
> can tell you our vacuum every night is taking 2 hours and that disk IO is
> the real killer - the CPU rarely gets higher than 20% or so.

Do you ever update or delete these records? If you never update or delete
records then the vacuum is mostly a waste of effort anyways. (You still have
to vacuum occasionally to prevent xid wraparound but that's much much less
often).

If you do delete records in large batches or have lots of updates then
vacuuming daily with default fsm settings probably isn't enough.

How many indexes do you have?

And if they don't all have client_id in their prefix then I wonder about the
plans you're getting. It's unfortunate you can't post your schema and query
plans. It's possible you have some plans that are processing many more records
than they need to to do their work because they're using indexes or
combinations of indexes that aren't ideal.
specific enough

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2007-06-04 20:20:55 Re: Thousands of tables versus on table?
Previous Message Thomas Andrews 2007-06-04 20:14:14 Re: Thousands of tables versus on table?