Re: On columnar storage

From: Michael Nolan <htfoot(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On columnar storage
Date: 2015-06-14 22:35:01
Message-ID: CAOzAquJGTzR6vSbsiZXBys2OKZdLRDYk3kqp+Dp+Bko8SeyOAA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 14, 2015 at 10:30 AM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

>
>> Are you looking to avoid all hardware-based limits, or would using a
>> 64-bit row pointer be possible? That would give you 2^64 (about 1.8E19)
>> unique rows over whatever granularity/uniqueness you use (per table, per
>> database, etc.)
>> --
>> Mike Nolan.
>>
>
> I don't think the number of tuples is the main problem here; it's the
> number of pages a single relation can have. Looking at the number of rows
> as a direct function of TID size is misleading, because the TID is split
> into two fixed parts: a page number (32 bits) and a tuple number (16 bits).
>
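For anyone skimming along, here's a rough sketch of that layout in C. This
is my simplification; as I understand it, PostgreSQL's actual ItemPointerData
stores the block number as two uint16 halves for alignment reasons, but the
field widths are the same:

  #include <stdint.h>

  /* Simplified view of the 6-byte heap tuple pointer (TID):
   * a 32-bit block (page) number plus a 16-bit offset within
   * the page.  The real struct keeps 2-byte alignment by
   * splitting the block number, but the arithmetic is identical. */
  typedef struct SimpleTid
  {
      uint32_t    block;     /* page number: up to 2^32 pages          */
      uint16_t    offset;    /* line pointer slot: up to 2^16 per page */
  } SimpleTid;               /* 6 bytes of addressing (padded to 8 here) */
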
> For the record, 2^48 is 281,474,976,710,656, which ought to be enough for
> anybody, but we waste a large part of that because we assume there might be
> up to 2^16 tuples per page, although the actual limit is way lower (~290
> for 8kB pages, and ~1200 for 32kB pages).
>
> So we can only have ~4 billion pages, which is where the 32TB limit comes
> from (with 32kB pages it's 128TB).
>
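Putting rough numbers on that (the header sizes below are my assumption for
a typical 8-byte-aligned build, so treat this as back-of-the-envelope):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      /* Assumed overheads: 24-byte page header, plus a 24-byte
       * aligned tuple header and a 4-byte line pointer per tuple. */
      const uint64_t page_header  = 24;
      const uint64_t per_tuple    = 24 + 4;
      const uint64_t page_sizes[] = { 8192, 32768 };

      for (int i = 0; i < 2; i++)
      {
          uint64_t blksz  = page_sizes[i];
          uint64_t tuples = (blksz - page_header) / per_tuple;
          uint64_t bytes  = (1ULL << 32) * blksz;   /* 2^32 pages */

          printf("%llu-byte pages: ~%llu tuples/page, %llu TB max table size\n",
                 (unsigned long long) blksz,
                 (unsigned long long) tuples,
                 (unsigned long long) (bytes >> 40));
      }
      return 0;
  }

which gives roughly 291 tuples/page and 32 TB for 8kB pages, and roughly
1169 tuples/page and 128 TB for 32kB pages, matching the numbers above.
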
> Longer TIDs are a straightforward way to work around this limit, assuming
> you add the bits to the 'page number' field. Adding 16 bits (thus using
> 64-bit pointers) would increase the limit 2^16 times, to about 2048
> petabytes (with 8kB pages). But that of course comes with a cost, because
> you have to keep those larger TIDs in indexes etc.
>
> Another option might be to split the 48 bits differently, by moving 5 bits
> to the page number part of TID (so that we expect ~2048 tuples per page at
> most). That'd increase the limit to 1PB (4PB with 32kB pages).
>
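Just to put my own numbers on those two options (assuming 8kB pages; this is
only back-of-the-envelope arithmetic, not a proposal):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      const uint64_t blksz = 8192;              /* 8kB pages  */

      /* current layout: 32-bit page number, 16-bit offset    */
      uint64_t cur     = (1ULL << 32) * blksz;  /* 2^45 bytes */

      /* 64-bit TID, extra 16 bits given to the page number   */
      uint64_t wide    = (1ULL << 48) * blksz;  /* 2^61 bytes */

      /* 48-bit TID re-split: 5 bits moved from the offset to
       * the page number (~2048 tuples/page assumed enough)   */
      uint64_t resplit = (1ULL << 37) * blksz;  /* 2^50 bytes */

      printf("current layout : %llu TB\n", (unsigned long long) (cur >> 40));
      printf("64-bit TID     : %llu PB\n", (unsigned long long) (wide >> 50));
      printf("re-split 48-bit: %llu PB\n", (unsigned long long) (resplit >> 50));
      return 0;
  }

i.e. 32 TB today, 2048 PB with 64-bit TIDs, and 1 PB with the re-split
48-bit TIDs.
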
> The column store approach is somewhat orthogonal to this, because it splits
> the table vertically into multiple pieces, each stored in a separate
> relfilenode and thus using a separate sequence of page numbers.
>
> And of course, the usual 'horizontal' partitioning has a very similar
> effect (separate filenodes).
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com/
>
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
Thanks for the reply. It's been a while since my last data structures
course (1971), but I do remember a few things. I have never personally
needed more than 1500 columns in a table, but can see how some might.
Likewise, the 32TB limit hasn't affected me yet, either. I doubt either
ever will.

Solving either or both of those seems like it may at some point require a
larger bit space for (at least some) TIDs, which is why I was wondering
whether a goal here is to eliminate all (practical) limits.

It probably doesn't make sense to force all users onto that larger bit
space (with the associated space and performance penalties). If there's a
way to do this without that cost, then you are all truly wizards. (This all
reminds me of how the IPv4 address space was parcelled up into Class A, B, C
and D addresses, at a time when people thought 32 bits would last us
forever. Maybe 128 bits actually will.)
--
Mike Nolan

