Re: Dynamic Partitioning using Segment Visibility Maps

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Dynamic Partitioning using Segment Visibility Maps
Date: 2008-01-09 14:34:30
Message-ID: 1199889270.4266.267.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2008-01-07 at 12:14 +0100, Csaba Nagy wrote:
> On Wed, 2008-01-02 at 17:56 +0000, Simon Riggs wrote:
> > Like it?
>
> Very cool :-)

Thanks. As ever, a distillation of various thoughts, not all mine.

> One additional thought: what about a kind of "segment fill factor" ?
> Meaning: each segment has some free space reserved for future
> updates/inserts of records in the same range of it's partitioning
> constraint. And when inserting/updating you put the new record into the
> corresponding segment... just like a very coarse clustering. Then you
> could vacuum the segments separately to keep the free space not running
> out. For active segments you would then fix the partitioning constraint
> range once the fill factor is reached, to allow for keeping it's
> constraint even when heavily updating (heavily vacuuming it too as
> response to that), and create a new segment for the unbounded range for
> new inserts... this would work fine for tables where the constraint is
> based on ever increasing keys and accidental inserts in old ranges
> (which do happen occasionally in real life).

Lots of thoughts there, so I'll try to separate them out and analyse.

The way I originally described it is a very simple mechanism and we
could tweak that some more. All ideas welcome.

If we had dynamic segment constraints when the segment was not yet read
only that would lead to changes in the segment constraint for each
INSERT when we have increasing keys. It seems better to set the
constraints once only, when the segment was full, then prevent further
INSERTs. The accidental inserts in old ranges seem like something that
can be avoided if it is a real-world problem.

For UPDATEs on a segment with constraints we might choose to apply the
constraints to see what to do. You might want to allow the UPDATE and
have it stretch the constraints outwards or you might want to prevent it
and throw an error, or you might want to allow the UPDATE, yet migrate
the new tuple to the appropriate segment.

Dynamic partitioning works in multiple dimensions, so there isn't just
one single valid location for any row. i.e. if we update a have a row
with OrderId, OrderDate, RequiredByDate, ShipDate and LastModifiedDate
on it, we'll probably expand the constraints on at least one of those.
If we were lucky enough to have only changed one of those it might turn
out there was *in that case* a single more sensible location for the new
tuple, but that probably isn't the common case.

So the likely best behaviour for UPDATEs is to try to keep the new row
version in the same segment, then change the constraints.

The segment constraints concept and the read only concept were linked.
You're right we could separate them, thought that turns out not to be
that desirable. When we do an DELETE or an UPDATE we don't know whether
the deleted row version was the last tuple with that particular boundary
value. So we don't know whether the DELETE or UPDATE changes the
constraints or not, and however we try to avoid it, we'll probably need
to recalc the constraints in some circumstance. So updating the
constraints dynamically isn't anywhere near as easy as it sounds and
ultimately probably isn't worth the effort. So thats why constraints and
read-only go together.

HOT allows us to keep the new row version within the segment, in many
cases. What might be worth doing is marking the FSM space for that
segment as update-only to exclude inserts from using it, then forcing
UPDATEs to stay within the segment if possible by providing the current
block number in each call to the FSM. That would also mean that every
UPDATE that wants to do a multi-block update on a currently read-only
segment would need to call the FSM. Sounds good, but that would only
work for the first UPDATE on a segment after it is marked read only,
which isn't much use, or we would do it for *every* block-spanning
UPDATE, which would cause contention in other use cases.

So although I'm willing to listen and tweak, that hasn't yet resulted in
any additional design points, unless I missed something above?

> When the change rate of old segments get down, the segments could be
> reorganized to have a smaller fill factor, so that you still allow for
> accidental updates but keep space usage efficient. This would be some
> similar action as a clustering, but hopefully not blocking (which might
> be a hard thing to do)... and later again you could mark some of the
> really old things as read only and put them in special segments with no
> wasted space.

Yes, hopefully marking read only and compressed segments is a possible
future.

> One problem would be when the segment's free space runs out, so you must
> put records from the same constraint range in multiple segments - but
> that could still work, you just would have multiple segments covering
> the same range, but if the "segment fill factor" is chosen properly it
> should not be the case... you could normally maintain a set of
> non-overlapping segments in terms of the partitioning constraint.

Thanks very much for your detailed thoughts on the proposal.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2008-01-09 14:47:48 Re: Dynamic Partitioning using Segment Visibility Maps
Previous Message Markus Schiltknecht 2008-01-09 14:11:26 Re: LD_LIBRARY_PATH not honored on Debian unstable