Re: A varint implementation for PG?

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A varint implementation for PG?
Date: 2019-12-13 05:31:55
Message-ID: CAMsr+YGCU+CP3A+XLsc7wv+-=z4+4GN1RpX+D0Ua48q6c1Jc_Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, 10 Dec 2019 at 09:51, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> I several times, most recently for the record format in the undo
> patchset, wished for a fast variable width integer implementation for
> postgres. Using very narrow integers, for space efficiency, solves the
> space usage problem, but leads to extensibility / generality problems.
>

Yes. I've wanted flexible but efficiently packed integers quite a bit too,
especially when working with wire protocols.
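
For concreteness, the kind of thing I usually reach for on the wire is the
classic base-128 scheme (as in Protocol Buffers): 7 payload bits per byte,
with the high bit as a continuation marker. A rough C sketch, purely
illustrative and quite possibly not the encoding you're proposing:

#include <stdint.h>
#include <stddef.h>

/*
 * Encode v as a base-128 varint into buf (which must have room for up
 * to 10 bytes). Returns the number of bytes written.
 */
static size_t
varint_encode(uint64_t v, uint8_t *buf)
{
    size_t      n = 0;

    while (v >= 0x80)
    {
        buf[n++] = (uint8_t) (v | 0x80);    /* low 7 bits, continuation set */
        v >>= 7;
    }
    buf[n++] = (uint8_t) v;                 /* final byte, high bit clear */
    return n;
}

/*
 * Decode a base-128 varint from buf into *out. Returns bytes consumed.
 * No bounds checking here; a real implementation would validate input.
 */
static size_t
varint_decode(const uint8_t *buf, uint64_t *out)
{
    uint64_t    v = 0;
    int         shift = 0;
    size_t      n = 0;

    while (buf[n] & 0x80)
    {
        v |= (uint64_t) (buf[n++] & 0x7F) << shift;
        shift += 7;
    }
    v |= (uint64_t) buf[n++] << shift;
    *out = v;
    return n;
}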

Am I stabbing completely in the dark in wondering whether this might be a
step towards a way to lift the size limit on VARLENA Datums like bytea?

There are obvious practical concerns with doing so, given that our protocol
offers no handle-based lazy fetching for big VARLENA values, but that too
needs a way to represent sizes sensibly and flexibly.

> Even with those caveats, I think that's a pretty good result. Other
> encodings were more expensive. And I think there's definitely some room
> for optimization left.

I don't feel at all qualified to question your analysis of the appropriate
representation. But your explanation certainly makes a lot of sense to
someone approaching the topic mostly fresh - I've done a bit with BCD but
not much else.

I assume we'd be paying a price in padding and alignment in most cases, and
probably more memory copying, but these representations would likely appear
mostly in places where other costs are overwhelmingly greater, like network
or disk I/O.

If baking a new variant integer format now, I think limiting it to 64 bits
is probably a mistake given how long-lived PostgreSQL is, and how hard it
can be to change things in the protocol, on disk, etc.

> If data lengths longer than that are required for a use case it
> probably is better to either a) use the max-representable 8 byte integer
> as an indicator that the length is stored or b) sacrifice another bit to
> represent whether the integer is the data itself or the length.
>

I'd be inclined to suspect that (b) is likely worth doing. If nothing else
because not being able to represent the full range of a 64-bit integer in
the variant type is potentially going to be a seriously annoying hassle at
points where we interact with systems that can use the full width. We'd
then have the potential for variant integers > 2^64, but at least that's
wholly under our control.
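
To make sure I'm reading option (b) right, I'm picturing something along
these lines, where the low bit of the decoded value is a value-vs-length
flag. Purely an illustrative sketch with invented names, not anything from
your patch:

#include <stdint.h>
#include <stdbool.h>

/*
 * Hypothetical decoded header for option (b): the low bit of the 64-bit
 * varint says whether the remaining 63 bits are the value itself or the
 * byte length of an out-of-line value that follows.
 */
typedef struct VarintHeader
{
    bool        is_length;      /* true: payload is a length, not the value */
    uint64_t    payload;        /* 63 usable bits */
} VarintHeader;

static inline uint64_t
varint_header_pack(VarintHeader h)
{
    return (h.payload << 1) | (h.is_length ? 1 : 0);
}

static inline VarintHeader
varint_header_unpack(uint64_t raw)
{
    VarintHeader h;

    h.is_length = (raw & 1) != 0;
    h.payload = raw >> 1;
    return h;
}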

I also routinely underestimate how truly huge a 64-bit integer really is.
But even now 8 exabytes isn't as inconceivable as it used to be....

It mostly depends on how often you'd expect to come up against the
boundaries where the extra bit would push you into the next size class.
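
As a throwaway illustration of where those boundaries fall (again, assuming
a 7-bits-per-byte base-128 encoding, which may not match yours):

#include <stdio.h>
#include <stdint.h>

/* Bytes needed for v under a 7-bits-per-byte varint. */
static int
varint_size(uint64_t v)
{
    int         n = 1;

    while (v >>= 7)
        n++;
    return n;
}

int
main(void)
{
    /* Values just below and at the 2^6, 2^13, 2^20 boundaries. */
    uint64_t    samples[] = {63, 64, 8191, 8192, 1048575, 1048576};

    for (int i = 0; i < 6; i++)
        printf("%llu: %d bytes plain, %d with a flag bit\n",
               (unsigned long long) samples[i],
               varint_size(samples[i]),
               varint_size(samples[i] << 1));   /* shift models the lost bit */
    return 0;
}

So only the top half of each size class - values in [2^(7n-1), 2^(7n)) -
grows by a byte, which for a lot of workloads would be a small fraction.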

> Do others see use in this?

Yes. Very, very much yes.

I'd be quick to want to expose it to SQL too.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
