Re: Transparent table partitioning in future version of PG?

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, henk de wit <henk53602(at)hotmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Transparent table partitioning in future version of PG?
Date: 2009-05-07 17:36:58
Message-ID: C6286A4A.5B7E%scott@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 5/7/09 1:54 AM, "Simon Riggs" <simon(at)2ndQuadrant(dot)com> wrote:

>
>
> On Thu, 2009-05-07 at 10:56 +0800, Craig Ringer wrote:
>> Tom Lane wrote:
>>
>> I also expect that in the future there will be demand for striping data
>> across multiple partitions in different tablespaces to exploit
>> in-parallel scanning (when/if supported) for better I/O utilization in
>> multiple-disk-array situations. For example, partitioning on
>> "MOD(id,10)" across 10 separate volumes, and firing off 10 concurrent
>> scans, one per partition, to satisfy a query.
>
> That's a good summary. It has already been documented and discussed, but
> saying it again and again is the best way to get this across.
>
> You've highlighted that partitioning is a feature with many underlying
> requirements: infrequent access to data (frequently historical),

Actually, infrequent access is not a requirement. It is a common
requirement however.

Take for instance, a very large set of data that contains an integer column
'type_id' that has about 200 distinct values. The data is accessed with a
strict 'type_id = X' requirement 99.9% of the time. If this was one large
table, then scans of all sorts become much more expensive than if it is
partitioned on 'type_id'. Furthermore, partitioning on type_id removes the
requirement to even index on this value. Statistics on each partition may
vary significantly, and the plannner can thus adapt to changes in the data
per value of type_id naturally.

The raw need is not "infrequent access" but highly partitioned access. It
doesn't matter if your date-partitioned data is accessed evenly across all
dates or skewed to the most frequent -- it matters that you are almost
always accessing by small date ranges.

> striping for parallelism and getting around RDBMS flaws (if any). We
> must be careful to implement each requirement in full, yet separately,
> so we don't end up with 60% functionality in each case by delivering an
> average or least common denominator solution.
>
> --
> Simon Riggs www.2ndQuadrant.com
> PostgreSQL Training, Services and Support
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Dimitri 2009-05-07 18:36:56 Re: Any better plan for this query?..
Previous Message David Blewett 2009-05-07 17:28:14 Re: Bad Plan for Questionnaire-Type Query