Re: On columnar storage

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: On columnar storage
Date: 2015-06-14 15:30:10
Message-ID: 557D9E02.6030605@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 06/13/15 00:07, Michael Nolan wrote:
>
>
> On Thu, Jun 11, 2015 at 7:03 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com <mailto:alvherre(at)2ndquadrant(dot)com>> wrote:
>
> We hope to have a chance to discuss this during the upcoming developer
> unconference in Ottawa. Here are some preliminary ideas to shed some
> light on what we're trying to do.
>
>
> I've been trying to figure out a plan to enable native column stores
> (CS or "colstore") for Postgres. Motivations:
>
> * avoid the 32 TB limit for tables
> * avoid the 1600 column limit for tables
> * increased performance
>
> Are you looking to avoid all hardware-based limits, or would using a 64
> bit row pointer be possible? That would give you 2^64 or 1.8 E19 unique
> rows over whatever granularity/uniqueness you use (per table, per
> database, etc.)
> --
> Mike Nolan.

I don't think the number of tuples is the main problem here, it's the
number of pages a single relation can have. Looking at the numbers of
rows as a direct function of TID size is misleading, because the TID is
split into two fixed parts - page number (32b) and tuple number (16b).

For the record, 2^48 is 281,474,976,710,656, which ought to be enough for
anybody, but we waste a large part of that because we assume there might
be up to 2^16 tuples per page, although the actual limit is way lower
(~290 for 8kB pages, and ~1200 for 32kB pages).

So we can only have ~4 billion pages, which is where the 32TB limit
comes from (with 32kB pages it's 128TB).
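To make the arithmetic above concrete, here's a small sketch (a hypothetical helper, not anything from the PostgreSQL sources) showing how the 32TB figure falls out of a 32-bit page number:

```python
# Hypothetical illustration of the limit arithmetic discussed above.
# A TID is split into a page number (32 bits) and a tuple number (16 bits);
# the maximum table size is simply page_count_limit * page_size.

def max_table_bytes(page_number_bits, page_size):
    """Largest table addressable with the given page-number width."""
    return (2 ** page_number_bits) * page_size

TB = 1024 ** 4

# Current layout: 32-bit page number.
assert max_table_bytes(32, 8 * 1024) == 32 * TB    # 32 TB with 8kB pages
assert max_table_bytes(32, 32 * 1024) == 128 * TB  # 128 TB with 32kB pages
```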

Longer TIDs are one straightforward way to work around this limit,
assuming you add the bits to the 'page number' field. Adding 16 bits
(thus using 64-bit pointers) would increase the limit 2^16-times to
about 2048 petabytes (with 8kB pages). But that of course comes with a
cost, because you have to keep those larger TIDs in indexes etc.

Another option might be to split the 48 bits differently, by moving 5
bits to the page number part of TID (so that we expect ~2048 tuples per
page at most). That'd increase the limit to 1PB (4PB with 32kB pages).
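The two alternatives can be checked with the same back-of-envelope helper (again purely illustrative, not PostgreSQL code):

```python
# Illustrative arithmetic for the two TID-widening options discussed above.

def max_table_bytes(page_number_bits, page_size):
    """Largest table addressable with the given page-number width."""
    return (2 ** page_number_bits) * page_size

PB = 1024 ** 5

# Option 1: 64-bit TIDs, adding the 16 new bits to the page number
# (48-bit page number, 16-bit tuple number).
assert max_table_bytes(48, 8 * 1024) == 2048 * PB   # ~2048 PB with 8kB pages

# Option 2: re-split the existing 48 bits, moving 5 bits from the tuple
# number (leaving 2^11 = 2048 tuples per page) to the page number.
assert max_table_bytes(37, 8 * 1024) == 1 * PB      # 1 PB with 8kB pages
assert max_table_bytes(37, 32 * 1024) == 4 * PB     # 4 PB with 32kB pages
```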

The column store approach is somewhat orthogonal to this, because it
splits the table vertically into multiple pieces, each stored in a
separate relfilenode and thus using a separate sequence of page numbers.

And of course, the usual 'horizontal' partitioning has a very similar
effect (separate filenodes).

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
