Re: Storage Model for Partitioning

From: Richard Huxton <dev(at)archonet(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Storage Model for Partitioning
Date: 2008-01-11 13:26:13
Message-ID: 47876E75.3040001@archonet.com
Lists: pgsql-hackers

Simon Riggs wrote:
> On Fri, 2008-01-11 at 11:34 +0000, Richard Huxton wrote:
>
>> Is the following basically the same as option #3 (multiple RelFileNodes)?
>>
>> 1. Make an on-disk "chunk" much smaller (e.g. 64MB). Each chunk is a
>> contiguous range of blocks.
>> 2. Make a table-partition (implied or explicit constraints) map to
>> multiple "chunks".
>> That would reduce fragmentation (you'd have on average 32MB's worth of
>> blocks wasted per partition) and allow for stretchy partitions at the
>> cost of an extra layer of indirection.
>>
>> For the single-partition case you'd not need to split the file of
>> course, so it would end up looking much like the current arrangement.
>
> We need to think about the "data model" of the storage layer. Space
> itself isn't the issue, it's the assumptions that all of the other
> subsystems currently make about how a table is structured, indexed,
> accessed and manipulated.

Which is why I was thinking you'd want indexes etc. to carry on
treating a table as one contiguous set of blocks, with the mapping to
an actual on-disk block taking place below that level. (If I've
understood you.)
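
As a very rough sketch of that indirection (all the names here are
made up, not existing backend APIs): everything above the storage
manager keeps addressing a flat, table-wide block range, and a small
per-table map resolves each logical block to a chunk file plus an
offset inside it.

#include <stdint.h>

typedef uint32_t BlockNumber;

/* 64MB chunks of 8KB blocks => 8192 blocks per chunk */
#define BLOCKS_PER_CHUNK 8192

/* Hypothetical per-table map: logical chunk i lives in the file
 * identified by chunk_file[i]. */
typedef struct ChunkMap
{
    uint32_t  nchunks;      /* entries in chunk_file[] */
    uint32_t *chunk_file;   /* file backing logical blocks
                             * [i*BLOCKS_PER_CHUNK,
                             *  (i+1)*BLOCKS_PER_CHUNK) */
} ChunkMap;

/* Resolve a table-wide logical block number to (file, block-in-file).
 * Indexes and the buffer manager above this layer still see one
 * contiguous block range per table. */
static int
resolve_block(const ChunkMap *map, BlockNumber logical,
              uint32_t *file, uint32_t *offset)
{
    uint32_t chunk = logical / BLOCKS_PER_CHUNK;

    if (chunk >= map->nchunks)
        return -1;          /* block not allocated yet */
    *file = map->chunk_file[chunk];
    *offset = logical % BLOCKS_PER_CHUNK;
    return 0;
}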

> Currently: Table 1:M Segments
>
> Option 1: Table 1:M Segments and *separately* Table 1:M Partitions, so
> partitions always have a maximum size. The chosen size just changes the
> scale of the impact; it doesn't remove the problems of holes, max sizes etc.
> e.g. empty table with 10 partitions would be
> a) 0 bytes in 1 file
> b) 0 bytes in 1 file, plus 9GB in 9 files all full of empty blocks

Well, presumably 0GB in 10 files, but 10GB-worth of block-numbers
"pre-allocated".

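For scale (assuming the default 8KB block size): 1GB is 131072 blocks,
so those ten empty partitions would pin down block numbers 0 through
1310719 before a single row is written.
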
> e.g. table with 10 partitions each of 1.5GB would be
> a) 15 GB in 15 files

With the limitation that any given partition might contain a mix of
data-ranges (e.g. 2005 lies half in partition 2 and half in partition 3).

> b) hit max size limit of partition: ERROR

In the case of 1b, you could let a data-range map to more than one
partition, avoiding the error. So 2004 data is in partition 1, 2005 in
partitions 2 and 3 (with 3 half empty), and 2006 in partition 4.
However, this does mean you've got a lot of wasted block numbers. If
you were using explicit (fixed) partitioning and chose a bad set of
criteria, your maximum table size could be substantially reduced.
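
To put rough numbers on that (assuming 8KB blocks and a 32-bit block
number, i.e. 2^32 blocks or ~32TB per table): if a badly chosen set of
criteria left each fixed partition half empty on average, half the
block-number space would go to holes and the effective ceiling would
drop to ~16TB.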

> Option 2: Table 1:M Child Tables 1:M Segments
> e.g. empty table with 10 partitions would be
> 0 bytes in each of 10 files
>
> e.g. table with 10 partitions each of 1.5GB would be
> 15GB in 10 groups of 2 files

Cross-table indexes and constraints would be useful outside of the
current scenario.

> Option 3: Table 1:M Nodes 1:M Segments
> e.g. empty table with 10 partitions would be
> 0 bytes in each of 10 files
>
> e.g. table with 10 partitions each of 1.5GB would be
> 15GB in 10 groups of 2 files

Ah, so this does seem to be roughly what I was rambling about. This
would presumably mean that rather than (table, block #) specifying the
location of a row, you'd need (table, node #, block #).
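
Something like this, purely as a sketch (the second struct is made up
just to show the extra field; compare the existing ItemPointerData,
which packs a block id and a line-pointer offset):

#include <stdint.h>

/* Today: a heap row is addressed by (block, line pointer), with the
 * table implied by context - roughly what ItemPointerData encodes. */
typedef struct TupleAddr
{
    uint32_t block;   /* block number within the table */
    uint16_t offset;  /* line-pointer slot within the block */
} TupleAddr;

/* Under option 3 the node becomes part of the address, since each
 * node numbers its blocks independently. (Hypothetical layout.) */
typedef struct NodeTupleAddr
{
    uint32_t node;    /* which node within the table */
    uint32_t block;   /* block number within that node */
    uint16_t offset;  /* line-pointer slot within the block */
} NodeTupleAddr;

And since every index entry carries one of these pointers, widening it
would presumably fatten every index as well.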

> So 1b) seems definitely out.
>
> The implications of 2 and 3 are what I'm worried about, which is why the
> shortcomings of 1a) seem acceptable currently.

--
Richard Huxton
Archonet Ltd
