Re: Postgresql partitioning - single hot table or distributed

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: sam mulube <sam(dot)mulube(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql partitioning - single hot table or distributed
Date: 2010-07-02 13:59:46
Message-ID: AANLkTinvwqST2A0fzmMnV8_tddeUR4WcbLbLNVIPtSF8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Jun 29, 2010 at 4:00 PM, sam mulube <sam(dot)mulube(at)gmail(dot)com> wrote:
> Hi,
>
> we are considering database partitioning as a possible solution to
> some performance issues we are having with our database, and we are
> trying to decide on a partitioning scheme. We have a moderately write
> heavy application (approx 50 inserts per second, with writes
> outnumbering reads by roughly 5:1), and the table in question looks
> something like this:
>
> -------------------------------------------------------------------------------
> column name :   id            |  value   |  server_id   |   created_at
> column type    :  integer   |  string   |    integer     |   timestamp
> with time zone
> other info         :  pk           |               | fk, indexed |
> indexed
> -------------------------------------------------------------------------------
>
> Or initial thoughts on partitioning was to partition by date using the
> created_at column, with a separate partition for each month; however
> the vast majority of our inserts would be for 'now', so we would be
> almost entirely writing to the partition for the current month. Other
> month partitions might get occasional updates, but this would be a
> relatively infrequent occurrence.
>
> Alternatively we wondered about partitioning by the server_id foreign
> key, using for example the modulo of the foreign key id. This would
> give us a finite number of partitions (rather than the potentially
> unbounded date option), and would likely cause writes to be much more
> evenly distributed between the partitions.
>
> Does anyone have any likely idea which would be the better choice. The
> single hot table getting most of the inserts, which might mean any
> indexes are fully in memory, or dividing the writes more evenly over
> all of our partitions?

Are most of your selects for now to now - 1 day or so as well? If so,
then look at having one big partition for historical data and one
small one for the last day. Every x hours run a cron job that moves
everything in the current partition to the old archive partition(s).

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Guillaume Lelarge 2010-07-02 14:08:06 Re: pgpool-II (max_pool and num_init_children)
Previous Message Merlin Moncure 2010-07-02 13:54:02 Re: change array dimension