Re: database size growing continously

From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: database size growing continously
Date: 2009-10-30 20:18:45
Message-ID: 4AEB4A25.7080701@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 10/30/2009 08:01 PM, Greg Stark wrote:
> On Fri, Oct 30, 2009 at 12:53 PM, Anj Adu<fotographs(at)gmail(dot)com> wrote:
>> Any relational database worth its salt has partitioning for a reason.
>>
>> 1. Maintenance. You will need to delete data at some
>> point.(cleanup)...Partitions are the only way to do it effectively.
>
> This is true and it's unavoidably a manual process. The database will
> not know what segments of the data you intend to load and unload en
> masse.
>
>> 2. Performance. Partitioning offer a way to query smaller slices of
>> data automatically (i.e the query optimizer will choose the partition
>> for you) ...very large tables are a no-no in any relational
>> database.(sheer size has limitations)
>
> This I dispute. Databases are designed to be scalable and very large
> tables should perform just as well as smaller tables.
>
> Where partitions win for performance is when you know something about
> how your data is accessed and you can optimize the access by
> partitioning along the same keys. For example if you're doing a
> sequential scan of just one partition or doing a merge join of two
> equivalently partitioned tables and the partitions can be sorted in
> memory.
>
> However in these cases it is possible the database will become more
> intelligent and be able to achieve the same performance gains
> automatically. Bitmap index scans should perform comparably to the
> sequential scan of individual partitions for example.
>

So, on the becoming more intelligent front: PostgreSQL already does
some operations as background maintenance (autovacuum). Extending
this to de-bloat indices does not seem conceptually impossible, nor for
the collection of table-data statistics for planner guidance (also, why
could a full-table-scan not collect stats as a side-effect?). Further out,
how about the gathering of statistics on queries to guide the automatic
creation of indices? Or to set up a partitioning scheme on a previously
monolithic table?

- Jeremy

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Stark 2009-10-30 22:16:44 Re: database size growing continously
Previous Message Anj Adu 2009-10-30 20:11:06 Re: database size growing continously