Re: Partition table in 9.0.x?

From: AJ Weber <aweber(at)comcast(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Partition table in 9.0.x?
Date: 2013-01-06 15:27:01
Message-ID: 50E997C5.2000301@comcast.net
Lists: pgsql-performance

All fair questions...

Thank you for your detailed response!

On 1/4/2013 11:03 PM, Jeff Janes wrote:
> On Friday, January 4, 2013, AJ Weber wrote:
>
> Hi all,
>
> I have a table that has about 73mm rows in it and growing.
>
>
> How big is the table in MB? Its indexes?
Not sure on this. Will see if pgAdmin tells me.
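
Come to think of it, something like this from psql should tell me
directly (using order_details as a stand-in for the real table name):

    SELECT pg_size_pretty(pg_relation_size('order_details'))       AS table_size,
           pg_size_pretty(pg_indexes_size('order_details'))        AS index_size,
           pg_size_pretty(pg_total_relation_size('order_details')) AS total_size;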

>
> ...
>
> The server has 12GB RAM, 4 cores, but is shared with a big webapp
> running in Tomcat -- and I only have a RAID1 disk to work on.
> Woe is me...
>
>
> By a RAID1 disk, do you mean two disks in a RAID1 configuration, or a
> single RAID1 composed of an unspecified number of disks?
>
> Often spending many thousands of dollars in DBA time can save you from
> having to buy many hundreds of dollars in hard drives. :) On the
> other hand, often you end up having to buy the extra disks anyway
> after all.
>
I mean I have two disks in a RAID1 configuration. The server is
currently in a whitebox datacenter and I have zero control over the
hardware, so adding disks is unfortunately out of the question. I
completely understand the comment, and would love to have a larger SAN
available to me that I could configure...I just don't and have no way of
getting one anytime soon.

>
> Anyway, this table is going to continue to grow, and it's used
> frequently (Read and Write).
>
>
> Are all rows in the table read and written with equal vigor, or are
> there hot rows and cold rows that can be recognized based on the row's
> values?
No -- I could probably figure out a way to set up an "archive" or "older"
section of the data that is updated much less frequently. Deletes are
rare. Inserts/updates, yes. Selects on existing rows -- very frequent.

> From what I read, this table is a candidate to be partitioned for
> performance and scalability. I have tested some scripts to build
> the "inherits" tables with their constraints and the
> trigger/function to perform the work.
>
> Am I doing the right thing by partitioning this?
>
>
> Probably not. Or at least, you haven't given us the information to
> know. Very broadly speaking, well-implemented partitioning makes bulk
> loading and removal operations take less IO, but makes normal
> operations take more IO, or if lucky leaves it unchanged. There are
> exceptions, but unless you can identify a very specific reason to
> think you might have one of those exceptions, then you probably don't.
I know you can't believe everything you read, but I thought I saw some
guidance that when a table's size exceeds some fraction of available
RAM, or when it approaches 100mm rows, it becomes a strong candidate for
partitioning.

>
> Do you have a natural partitioning key? That is, is there a column
> (or expression) which occurs as a selective component in the where
> clause of almost all of your most io consuming SQL and DML? If so,
> you might benefit from partitioning on it. (But in that case, you
> might be able to get most of the benefits of partitioning, without the
> headaches of it, just by revamping your indexes to include that
> column/expression as their leading field).
>
> If you don't have a good candidate partitioning key, then partitioning
> will almost surely make things worse.
>
The table is a "detail table" to its master records. That is, it's like
an order-details table that has 1-n rows joined to the master ("order")
table on the order-id. So I can partition it based on the order number
pretty easily (which is a bigint, btw).
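
To sketch what my test scripts do -- table, column, and range boundaries
here are placeholders, and this is the 9.0-era inheritance approach
rather than a finished script:

    -- Parent stays empty; each child holds a range of order_id.
    CREATE TABLE order_details_p0 (
        CHECK (order_id >= 0 AND order_id < 10000000)
    ) INHERITS (order_details);

    CREATE TABLE order_details_p1 (
        CHECK (order_id >= 10000000 AND order_id < 20000000)
    ) INHERITS (order_details);

    -- Trigger function routes inserts on the parent to the right child.
    CREATE OR REPLACE FUNCTION order_details_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        IF NEW.order_id < 10000000 THEN
            INSERT INTO order_details_p0 VALUES (NEW.*);
        ELSE
            INSERT INTO order_details_p1 VALUES (NEW.*);
        END IF;
        RETURN NULL;  -- row was stored in a child, not the parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER insert_order_details
        BEFORE INSERT ON order_details
        FOR EACH ROW EXECUTE PROCEDURE order_details_insert_trigger();

That plus constraint_exclusion = partition in postgresql.conf, so the
planner can skip children whose CHECK ranges can't match the query. (And
per your index suggestion, I realize the no-partition alternative would
just be a composite index led by order_id.)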

> If so, and I can afford some downtime, is dumping the table via
> pg_dump and then loading it back in the best way to do this?
>
>
> To do efficient bulk loading into a partitioned table, you need to
> specifically target each partition, rather than targeting with a
> trigger. That pretty much rules out pg_dump, AFAIK, unless you are
> going to parse the dump file(s) and rewrite them.
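
Good to know. So presumably the per-child migration looks something like
this instead (again, placeholder names and ranges):

    -- ONLY restricts the statement to the parent's own rows.
    INSERT INTO order_details_p0
        SELECT * FROM ONLY order_details
        WHERE order_id >= 0 AND order_id < 10000000;

    DELETE FROM ONLY order_details
        WHERE order_id >= 0 AND order_id < 10000000;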
>
>
> Should I run a cluster or vacuum full after all is done?
>
>
> Probably not. If a cluster after the partitioning would be
> beneficial, there would be a pretty good chance you could do a cluster
> *instead* of the partitioning and get the same benefit.
>
I did try clustering the table on the PK (which is actually 4 columns),
and it appeared to help a bit. I was hoping partitioning was going to
help me even more.
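
For reference, what I ran was along these lines (the index name is a
guess at ours):

    CLUSTER order_details USING order_details_pkey;
    ANALYZE order_details;  -- refresh planner stats after the reorder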

> If you do some massive deletes from the parent table as part of
> populating the children, then a vacuum full of the parent could be
> useful. But if you dump the parent table, truncate it, and reload it
> as partitioned tables, then vacuum full would probably not be useful.
>
> Really, you need to identify your most resource-intensive queries
> before you can make any reasonable decisions.
>
>
> Is there a major benefit if I can upgrade to 9.2.x in some way
> that I haven't realized?
>
>
> If you have specific queries that are misoptimized and so are
> generating more IO than they need to, then upgrading could help. On
> the other hand, it could also make things worse, if a currently well
> optimized query becomes worse.
>
Is there some new feature or optimization you're thinking about with
this comment? If so, could you please just send me a link and/or
feature name and I'll google it myself?

> But, instrumentation has improved in 9.2 from 9.0, so upgrading would
> make it easier to figure out just which queries are really bad and
> have the most opportunity for improvement. A little well informed
> optimization might obviate the need for either partitioning or more
> hard drives.
>
This is interesting too. I obviously would like the best available
options to tune the database and the application. Is this detailed in
the release notes somewhere, and what tools could I use to take
advantage of this? (Are there new/improved details included in the
EXPLAIN statement or something?)
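
If part of it is the reworked pg_stat_statements, I'd guess the query to
find the worst offenders looks roughly like this (assuming the extension
is installed):

    -- Top time consumers since the stats were last reset; 9.2
    -- normalizes the query text so similar statements group together.
    SELECT query, calls, total_time, rows
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;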
>
>
> Finally, if anyone has any comments about my settings listed above
> that might help improve performance, I thank you in advance.
>
>
> Your default statistics target seemed low. Without knowing the nature
> of your most resource intensive queries or how much memory tomcat is
> using, it is hard to say more.
Tomcat uses 4GB of RAM, plus we have nginx in front using a little, and
some other, smaller services running on the server in addition to the
usual gamut of Linux processes.
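
On the statistics target: I'll try raising it, either via
default_statistics_target in postgresql.conf or per column -- presumably
something like this (column name hypothetical):

    ALTER TABLE order_details ALTER COLUMN order_id SET STATISTICS 500;
    ANALYZE order_details;  -- rebuild the stats at the new target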

>
> Cheers,
>
> Jeff
