Re: Dynamic Partitioning using Segment Visibility Maps

From: Richard Huxton <dev(at)archonet(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Dynamic Partitioning using Segment Visibility Maps
Date: 2008-01-04 10:59:43
Message-ID: 477E119F.1090102@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote:
> On Fri, 2008-01-04 at 10:22 +0000, Richard Huxton wrote:
>> Simon Riggs wrote:
>>> We would keep a dynamic visibility map at *segment* level, showing which
>>> segments have all rows as 100% visible. No freespace map data would be
>>> held at this level.
>> Small dumb-user question.
>>
>> I take it you've considered some more flexible consecutive-run-of-blocks
>> unit of flagging rather than file-segments. That obviously complicates
>> the tracking but means you can cope with infrequent updates as well as
>> mark most of the "most recent" segment for log-style tables.
>
> I'm writing the code to abstract that away, so yes.
>
> Now you mention it, it does seem straightforward to have a table storage
> parameter for partition size, which defaults to 1GB. The partition size
> is simply a number of consecutive blocks, as you say.
>
> The smaller the partition size the greater the overhead of managing it.

Oh, obviously, but with smaller partition sizes this also becomes useful
for low-end systems as well as high-end ones. Skipping 80% of a seq-scan
on a date-range query is a win for even small (by your standards)
tables. I shouldn't be surprised if the sensible-number-of-partitions
remained more-or-less constant as you scaled the hardware, but the
partition size grew.

> Also I've been looking at read-only tables and compression, as you may
> know. My idea was that in the future we could mark segments as either
> - read-only
> - compressed
> - able to be shipped off to hierarchical storage
>
> Those ideas work best if the partitioning is based around the physical
> file sizes we use for segments.

I can see why you've chosen file segments. It certainly makes things easier.

Hmm - thinking about the date-range scenario above, it occurs to me that
for seq-scan purposes the correct partition size depends upon the
data value you are interested in. What I want to know is what blocks Jan
07 covers (or rather what blocks it doesn't) rather than knowing blocks
1-9999999 cover 2005-04-12 to 2007-10-13. Of course that means that
you'd eventually want different partition sizes tracking visibility for
different columns (e.g. id, timestamp).

I suspect the same would be true for read-only/compressed/archived
flags, but I can see how they are tightly linked to physical files
(particularly the last two).

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2008-01-04 12:20:26 Re: timestamp typedefs
Previous Message Glyn Astill 2008-01-04 10:50:20 Problem with PgTcl auditing function on trigger