Re: Storage for multiple variable-length attributes in a single row

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Esteban Zimanyi <estebanzimanyi(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SAKR Mahmoud <mahmoud(dot)sakr(at)ulb(dot)be>
Subject: Re: Storage for multiple variable-length attributes in a single row
Date: 2022-02-07 17:19:48
Message-ID: 20220207171948.i7v4hn5zv4c6gr5f@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 07, 2022 at 10:10:53AM -0700, David G. Johnston wrote:
> On Mon, Feb 7, 2022 at 9:58 AM Esteban Zimanyi <estebanzimanyi(at)gmail(dot)com>
> wrote:
>
> >
> > As suggested by David, this goes beyond the "traditional" usage of
> > PostgreSQL. Therefore my questions are
> > * What is the suggested strategy to splitting these 2K attributes into
> > vertically partitioned tables where the tables are linked by the primary
> > key (e.g. trip number in the example above). Are there any limitations/best
> > practices in the number/size of TOASTED attributes that a table should
> > contain.
> > * In each partitioned table containing N TOASTED attributes, given the
> > above requirements, are there any limitations/best practices in storing
> > them using extended storage or an alternative one such as external.
> >
> >
> Frankly, the best practice is "don't have that many columns". Since you
> do, I posit that you are just going to have to make decisions (possibly
> with experimentation) on your own. Or maybe ask around on a MobilityDB
> forum what people using that tool and having these kinds of data structures
> do. From a core PostgreSQL perspective you've already deviated from the
> model structures that it was designed with in mind.
> I'm really confused that you'd want the data value itself to contain a
> timestamp that, on a per-row basis, should be the same timestamp that every
> other value on the row has. Extracting the timestamp to it own column and
> using simpler and atomic data types is how core PostgreSQL and the
> relational model normalization recommend dealing with this situation. Then
> you just break up the attributes of a similar nature into their own tables
> based upon their shared nature. In almost all cases relying on "main"
> storage.

Actually looking at the original example:

> CREATE TYPE tint (
> internallength = variable,
> [...]
> storage = extended,
> alignment = double,
> [...]
> );

I'm wondering if it's just some miscommunication here. If the tint data type
only needs to hold a timestamp and an int, I don't see why it would be
varlerna at all.

So if a single tint can internally hold thousands of (int, timestamptz), a bit
like pgpointcloud, then having it by default external (so both possibly
out-of-line and compressed) seems like a good idea, as you can definitely hit
the 8k boundary, it should compress nicely and you also avoid some quite high
tuple header overhead.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-02-07 17:20:50 Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Previous Message David G. Johnston 2022-02-07 17:10:53 Re: Storage for multiple variable-length attributes in a single row