Re: partitioning performance question

From: Robert Klemme <shortcutter(at)googlemail(dot)com>
To: Kevin Kempter <cs_dba(at)consistentstate(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: partitioning performance question
Date: 2012-06-10 11:16:40
Message-ID: CAM9pMnOLvSJ9AYCEN-ka-DmB-cX5K78=YyHRnqFdyQXAOBVdiQ@mail.gmail.com
Lists: pgsql-performance

On Sat, Jun 9, 2012 at 7:58 PM, Kevin Kempter
<cs_dba(at)consistentstate(dot)com> wrote:
> Hi All;
>
> We have a client with a table where large blobs (bytea) are stored. The
> table has a key column whose values are numbers (like 112362), but
> unfortunately it's a varchar column, so the blobs are accessed via
> queries like:
>
> select * from bigtable where keycol = '217765'
>
> The primary reason we want to partition the table is maintenance: the
> table is well over 1.2 terabytes in size and they have never successfully
> vacuumed it. However, I don't want to make performance even worse. The
> table does have a serial key. I'm thinking the best options will be to
> partition by range on the serial key, or maybe on the keycol character
> column using an IN statement in the check constraints, thus allowing the
> planner to actually leverage the above SQL. I suspect a typecast to
> integer in the check constraints will prove to be a bad idea if the
> keycol column remains a varchar.
>
> Thoughts?
>
> Here's the table:
>
>
>                                 Table "problemchild"
>     Column    |           Type           |                   Modifiers
> --------------+--------------------------+-----------------------------------------------------------------
>  keycol       | character varying        |
>  blob_data    | bytea                    |
>  removed_date | timestamp with time zone |
>  alt_key      | bigint                   | not null default nextval('problemchild_alt_key_seq'::regclass)
> Indexes:
>    "pc_pkey" PRIMARY KEY, btree (alt_key)
>    "key2" btree (keycol)

I find it odd that you have a column "keycol" which is not the PK,
while your PK column is named "alt_key". Is "keycol" always the
character representation of "alt_key", or are they unrelated?

It would also help to know how the data in this table changes. Do you
only ever add data? Is some data removed from time to time (maybe
based on the "removed_date")?

If the table grows continually then range partitioning sounds good.
However, I think you should really make "keycol" a number type because
otherwise range partitioning will be a pain (you would need to include
the length of the string in the criterion if you want your varchar
ranges to mimic number ranges).
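
To illustrate the pitfall (the partition bounds here are made up for
the example): varchar comparison is lexicographic, so a range check
on the string does not behave like a numeric range.

-- hypothetical partition meant to hold keys 100000..199999
create table problemchild_r1 (
  check ( keycol >= '100000' and keycol < '200000' )
) inherits (problemchild);

-- Lexicographically '1500000' >= '100000' and '1500000' < '200000',
-- so key 1500000 would wrongly land in this range. To mimic a
-- numeric range you also have to pin the string length:
--   check ( length(keycol) = 6
--           and keycol >= '100000' and keycol < '200000' )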

However, if you are deleting from time to time and hence the table
does not grow in the long run, then hash partitioning might be a
better idea, because then you do not need to create new partitions
all the time. Example on "alt_key":

-- parent table; children inherit its columns
create table problemchild (
  keycol varchar(100),
  blob_data bytea,
  removed_date timestamp with time zone,
  alt_key bigint primary key
);

-- one child table per hash bucket of alt_key
create table problemchild_00 (
  check ( alt_key % 16 = 0 )
) inherits (problemchild);
create table problemchild_01 (
  check ( alt_key % 16 = 1 )
) inherits (problemchild);
create table problemchild_02 (
  check ( alt_key % 16 = 2 )
) inherits (problemchild);
...
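
Note that with inheritance-based partitioning, rows inserted into the
parent are not routed to the children automatically; you need a
BEFORE INSERT trigger for that. A minimal sketch, assuming the 16-way
modulo scheme above (function and trigger names are my own):

-- route each row inserted into the parent to its hash bucket,
-- e.g. a row with alt_key % 16 = 5 goes to problemchild_05
create or replace function problemchild_insert() returns trigger as $$
begin
  execute format('insert into problemchild_%s select ($1).*',
                 lpad((new.alt_key % 16)::text, 2, '0'))
    using new;
  return null;  -- the row went to a child, so skip the parent
end;
$$ language plpgsql;

create trigger problemchild_route
  before insert on problemchild
  for each row execute procedure problemchild_insert();

Also keep in mind that constraint exclusion can only skip partitions
when the query contains an expression it can match against the check
constraints: a plain "where alt_key = 12345" will still scan all
children unless you add "and alt_key % 16 = 12345 % 16" to the query.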

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
