Re: Strategies/Best Practises Handling Large Tables

From: Ondrej Ivanič <ondrej(dot)ivanic(at)gmail(dot)com>
To: Chitra Creta <chitracreta(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Strategies/Best Practises Handling Large Tables
Date: 2012-10-12 15:10:43
Message-ID: CAM6mie+cisQsfXaPPiR7k1zPq2gjb7g1fWBPdGduUdt983yo0Q@mail.gmail.com
Lists: pgsql-general

Hi,

On 13 October 2012 01:44, Chitra Creta <chitracreta(at)gmail(dot)com> wrote:
> I currently have a table that is growing very quickly - i.e. 7 million
> records in 5 days. This table acts as a placeholder for statistics, and
> hence the records are merely inserted and never updated or deleted.
>
> Many queries are run on this table to obtain trend analysis. However, these
> queries are now starting to take a very long time (hours) to execute due to
> the size of the table.

Have you changed the autovacuum settings to make it more aggressive?
Another option is to run ANALYZE after loading.
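
A minimal sketch of both (the table name "stats" is made up; tune the
numbers to your own load pattern):

    -- make autoanalyze kick in sooner on an insert-heavy table
    ALTER TABLE stats SET (
        autovacuum_analyze_scale_factor = 0.01,  -- after ~1% new rows
        autovacuum_analyze_threshold    = 10000
    );

    -- or refresh planner statistics explicitly after a bulk load
    ANALYZE stats;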

>
> I have put indexes on this table, to no significant benefit. Some of the
> other strategies I have thought of:
> 1. Purge old data
> 3. Partition

Those two go together. Partitioning is useful if you can constrain
queries to specific ranges, i.e. "this query needs the last two
days". You shouldn't go over 200-300 partitions per table. Partition
granularity should match the range an average query touches: if you
run weekly queries then use weekly partitions (bi-weekly or daily
partitions might work too, but I don't have good experience with
them).
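
Something like this (inheritance-based partitioning, one child table
per week; all names are made up):

    CREATE TABLE stats (
        id         bigserial,
        created_at timestamptz NOT NULL,
        payload    text
    );

    CREATE TABLE stats_2012_w41 (
        CHECK (created_at >= '2012-10-08' AND created_at < '2012-10-15')
    ) INHERITS (stats);

    CREATE INDEX ON stats_2012_w41 (created_at);

With constraint_exclusion = partition (the default) a query such as

    SELECT count(*) FROM stats
    WHERE created_at >= '2012-10-11' AND created_at < '2012-10-13';

only scans the children whose CHECK constraints can match.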

Purging old data is easy because you only need to drop the unwanted
partitions (no table/index bloat). Loading is a little bit tricky
because you have to load the data into the right partition.
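
For example (again, names are made up):

    -- purging a whole week is instant and leaves no bloat behind
    DROP TABLE stats_2012_w41;

    -- route inserts into the right partition with a trigger
    CREATE OR REPLACE FUNCTION stats_insert_router() RETURNS trigger AS $$
    BEGIN
        IF NEW.created_at >= '2012-10-08'
           AND NEW.created_at < '2012-10-15' THEN
            INSERT INTO stats_2012_w41 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for created_at = %',
                NEW.created_at;
        END IF;
        RETURN NULL;  -- the row went to the child, skip the parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER stats_insert_trg
        BEFORE INSERT ON stats
        FOR EACH ROW EXECUTE PROCEDURE stats_insert_router();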

> 4. Creation of daily, monthly, yearly summary tables that contains
> aggregated data specific to the statistics required

I think this is the way to hell. You start with a few tables and then
you add more until you realise that it takes longer to update them
than to run your queries :)

You might benefit from query parallelisation, for example pgpool-II,
Stado or Postgres-XC, or you could move away from Postgres altogether;
for example, a column-oriented NoSQL database might be a good choice.

--
Ondrej Ivanic
(ondrej(dot)ivanic(at)gmail(dot)com)
(http://www.linkedin.com/in/ondrejivanic)
