Re: Dynamic Partitioning using Segment Visibility Maps

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Dynamic Partitioning using Segment Visibility Maps
Date: 2008-01-09 16:47:31
Message-ID: 60abnfymho.fsf@dba2.int.libertyrms.com
Lists: pgsql-hackers

simon(at)2ndquadrant(dot)com (Simon Riggs) writes:
> I think we have an opportunity to bypass the legacy-of-thought that
> Oracle has left us and implement something more usable.

This seems like a *very* good thing to me, from a couple of
perspectives.

1. I think you're right on in terms of the issue of the cost of
"running all that DDL" in managing partitioning schemes.

When I was working as DBA, I was decidedly *NOT* interested in
doing a lot of low level partition management work, and those that
are in that role now would, I'm quite sure, agree that they are
not keen on spending a lot of their time trying to figure out what
tablespace to shift a particular table into, or what tablespace
filesystem to get sysadmins to set up.

2. Blindly following what Oracle does has always been a dangerous
sort of thing to do.

There are two typical risks:

a) There's always the worry that they may have patented some
part of how they implement things, and if you follow too
closely, There Be Dragons...

b) They have enough billion$ of development dollar$ and
development re$ource$ that they can follow strategies that
are too expensive for us to even try to follow.

3. If, rather than blindly following, we create something at least
quasi-new, there is the chance of doing fundamentally better.

   This very thing happened when it was discovered that IBM had a
   patent on the ARC caching scheme; the "clock" system that emerged
   was a lot better than ARC ever was.

> One major advantage of the dynamic approach is that it can work on
> multiple dimensions simultaneously, which isn't possible with
> declarative partitioning. For example if you have a table of Orders then
> you will be able to benefit from Segment Exclusion on all of these
> columns, rather than just one of them: OrderId, OrderDate,
> RequiredByDate, LastModifiedDate. This will result in some "sloppiness"
> in the partitioning, e.g. if we fill 1 partition a day of Orders, then
> the OrderId and OrderDate columns will start out perfectly arranged. Any
> particular RequiredByDate will probably be spread out over 7 partitions,
> but that's way better than being spread out over 365+ partitions.
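To make the Orders example concrete, here is a toy Python model; it is an illustration, not anything taken from the proposal itself. It assumes that each segment keeps a (min, max) summary per column, that exactly one segment fills per day, and that RequiredByDate lags OrderDate by 0 to 6 days. The names (Order, build_segments, segments_to_scan) and all sizes are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Toy model (assumption): each segment carries a (min, max) summary per column,
# standing in for whatever per-segment metadata a Segment Visibility Map keeps.


@dataclass
class Order:
    order_id: int
    order_date: date
    required_by: date


DAYS = 30              # assumption: 30 days of orders
ORDERS_PER_DAY = 10    # assumption: one segment fills per day
START = date(2008, 1, 1)
COLUMNS = ("order_id", "order_date", "required_by")


def load_orders():
    """Generate orders in insertion (chronological) order."""
    orders = []
    next_id = 1
    for day in range(DAYS):
        d = START + timedelta(days=day)
        for i in range(ORDERS_PER_DAY):
            # assumption: RequiredByDate lags OrderDate by 0..6 days
            orders.append(Order(next_id, d, d + timedelta(days=i % 7)))
            next_id += 1
    return orders


def build_segments(orders, rows_per_segment):
    """Cut rows into segments in insertion order; summarise each column."""
    segments = []
    for start in range(0, len(orders), rows_per_segment):
        chunk = orders[start:start + rows_per_segment]
        segments.append({col: (min(getattr(o, col) for o in chunk),
                               max(getattr(o, col) for o in chunk))
                         for col in COLUMNS})
    return segments


def segments_to_scan(segments, col, lo, hi):
    """Indexes of segments whose [min, max] on col overlaps [lo, hi]."""
    return [n for n, s in enumerate(segments)
            if s[col][0] <= hi and s[col][1] >= lo]


segments = build_segments(load_orders(), ORDERS_PER_DAY)
day15 = START + timedelta(days=15)
print(segments_to_scan(segments, "order_date", day15, day15))   # [15]
print(segments_to_scan(segments, "required_by", day15, day15))  # [9, 10, 11, 12, 13, 14, 15]
print(segments_to_scan(segments, "order_id", 151, 160))         # [15]
```

The same per-column summaries serve all three predicates, which is the multi-dimensional benefit being claimed; the RequiredByDate query touching 7 segments rather than 1 is the "sloppiness".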

I think it's worth observing both the advantages and demerits of this
together.

In effect, with the dynamic approach, Segment Exclusion provides its
benefits as an emergent property of the patterns of how INSERTs get
drawn into segments.

Correspondingly, Segment Exclusion will tend to provide useful
constraints for those patterns that naturally emerge from the
INSERTs.

We can therefore expect useful constraints for attributes that are
assigned in some kind of more or less chronological order. Such
attributes will include:

- Object ID, if set by a sequence
- Processing dates

There may be a bit of sloppiness, but the constraints may still
exclude enough segments to improve efficiency.

_On The Other Hand_, there will be attributes that are *NOT* set in a
more-or-less chronological order, and Segment Exclusion will be pretty
useless for these attributes.
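A minimal sketch of that contrast, again assuming per-segment (min, max) summaries. The id column is assigned by a sequence, while a hypothetical region code arrives interleaved with insertion order (modelled here, as an assumption, by n % 4). Sizes and helper names are illustrative only.

```python
# Hypothetical rows: a sequence-assigned id (chronological) and a region code
# that is not correlated with insertion order (here it simply cycles 0..3).
ROWS = 1000
ROWS_PER_SEGMENT = 100

rows = [{"id": n, "region": n % 4} for n in range(1, ROWS + 1)]


def summarise(rows, per_segment, col):
    """(min, max) of col for each fixed-size run of rows, in insertion order."""
    out = []
    for start in range(0, len(rows), per_segment):
        values = [r[col] for r in rows[start:start + per_segment]]
        out.append((min(values), max(values)))
    return out


def surviving(ranges, lo, hi):
    """Count segments that cannot be excluded for the predicate lo <= col <= hi."""
    return sum(1 for mn, mx in ranges if mn <= hi and mx >= lo)


id_ranges = summarise(rows, ROWS_PER_SEGMENT, "id")
region_ranges = summarise(rows, ROWS_PER_SEGMENT, "region")
print(surviving(id_ranges, 420, 430))   # 1  (of 10 segments)
print(surviving(region_ranges, 2, 2))   # 10 (every segment spans regions 0..3)
```

Every segment's region summary is (0, 3), so no predicate on region can rule any segment out, while the sequence-assigned id narrows the scan to a single segment.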

In order to do any sort of "Exclusion" for non-"chronological"
attributes, it will be necessary to use some mechanism other than the
patterns that fall out of "natural chronological insertions." If you
want exclusion on such attributes, then there needs to be some sort of
rule system to spread such items across additional partitions. Mind
you, if you do so, that will weaken the usefulness of Segment
Exclusion. For instance, suppose you have 4 regions, and scatter
insertions by region. In that case, there will be more segments that
overlap any given chronological range.
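The region-scattering trade-off can be sketched the same way. In this toy model (all names and sizes are assumptions), rows arrive in date order and go either into a single current segment or into one current segment per region. With four regions, each scattered segment ends up spanning four days instead of one.

```python
from collections import defaultdict

DAYS = 40
ROWS_PER_DAY = 100
ROWS_PER_SEGMENT = 100
REGIONS = 4  # hypothetical: four regions, as in the example above

# Rows arrive in date order; region codes are interleaved (n % REGIONS).
arrivals = [(n // ROWS_PER_DAY, n % REGIONS) for n in range(DAYS * ROWS_PER_DAY)]


def fill_segments(arrivals, scatter_by_region):
    """Return (min_day, max_day) for each segment.

    Without scattering, every row goes into one current segment.  With
    scattering, each region writes to its own current segment, so several
    segments are open at once.
    """
    closed = []
    open_days = defaultdict(list)
    for day, region in arrivals:
        key = region if scatter_by_region else 0
        open_days[key].append(day)
        if len(open_days[key]) == ROWS_PER_SEGMENT:
            days = open_days.pop(key)
            closed.append((min(days), max(days)))
    # flush any partially filled segments
    for days in open_days.values():
        closed.append((min(days), max(days)))
    return closed


def overlapping(segments, lo, hi):
    """Number of segments whose day range overlaps [lo, hi]."""
    return sum(1 for mn, mx in segments if mn <= hi and mx >= lo)


plain = fill_segments(arrivals, scatter_by_region=False)
scattered = fill_segments(arrivals, scatter_by_region=True)
print(overlapping(plain, 20, 20), overlapping(scattered, 20, 20))  # 1 4
```

Both layouts hold the same 40 segments of 100 rows, but a one-day query must visit 4 of them once insertions are scattered by region, instead of 1.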

> When we look at the data in the partition we can look at any number of
> columns. When we declaratively partition, you get only one connected set
> of columns, which is one of the reasons you want multi-dimensional
> partitioning in the first place.

Upside: Yes, you get to exclude based on examining any number of
columns.

Downside: You only get the exclusions that are "emergent properties"
of the data...

The more I'm looking at the dynamic approach, the more I'm liking
it...
--
"cbbrowne","@","cbbrowne.com"
http://linuxfinances.info/info/linuxxian.html
"Feel free to contribute build files. Or work on your motivational
skills, and maybe someone somewhere will write them for you..."
-- "Fredrik Lundh" <effbot(at)telia(dot)com>
