Re: pglz performance

From: Andres Freund <andres(at)anarazel(dot)de>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject: Re: pglz performance
Date: 2019-08-05 07:26:25
Message-ID: 20190805072625.pzebnrs3fvk5wjen@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-08-05 00:19:28 +0200, Petr Jelinek wrote:
> It carries that information inside the compressed value, like I said in the
> other reply, that's why the extra byte.

I'm not convinced that that is a good plan - imo the reference to the
compressed data should carry that information.

I.e. in the case of toast, at least toast pointers should hold enough
information to determine the compression algorithm. And in the case of
WAL, the WAL record should contain that.

Consider e.g. adding support for slice fetching of datums compressed
with some algorithm - we should be able to determine whether that's
possible without fetching the datum (so we can take a non-exceptional
path for datums compressed otherwise). Similarly, for WAL, we should be
able to detect whether an incompatible compression format is used,
without having to invoke a generic compression routine that then fails
in some way. Or adding compression reporting for WAL to xlogdump.

I also don't particularly like baking in the assumption that we don't
support tuples larger than 1GB in further places. To me it seems likely
that we're going to have to fix that, and it's hard enough already... I
know that my patch did that too...

For external datums I suggest encoding the compression method as a
distinct VARTAG_ONDISK_COMPRESSED, and then have that include the
compression method as a field.

For in-line compressed values (so VARATT_IS_4B_C), doing something
roughly like you did, indicating the type of metadata following using
the high bit sounds reasonable. But I think I'd make it so that if the
highbit is set, the struct is instead entirely different, keeping a full
4 byte byte length, and including the compression type header inside the
struct. Perhaps I'd combine the compression type with the high-bit-set
part? So when the high bit is set, it'd be something like

{
int32 vl_len_; /* varlena header (do not touch directly!) */

/*
* Actually only 7 bit, the high bit determines whether this
* is the old compression header (if unset), or this type of header
* (if set).
*/
uint8 type;

/*
* Stored as uint8[4], to avoid unnecessary alignment padding.
*/
uint8[4] length;

char va_data[FLEXIBLE_ARRAY_MEMBER];
}

I think it's worth spending some effort trying to get this right - we'll
be bound by design choices for a while.

It's kinda annoying that toast datums aren't better designed. Creating
them from scratch, I'd make it something like:

1) variable-width integer describing the "physical length", so that
tuple deforming can quickly determine length - all the ifs necessary
to determine lengths are a bottleneck. I'd probably just use a 127bit
encoding, if the high bit is set, there's a following length byte.

2) type of toasted datum, probably also in a variable width encoding,
starting at 1byte. Not that I think it's likely we'd overrun 256
types - but it's cheap enough to just declare the high bit as an
length extension bit.

These are always stored unaligned. So there's no need to deal with
padding bytes having to be zero to determine whether we're dealing with
a 1byte datum etc.

Then, type dependant:

For In-line uncompressed datums
3a) alignment padding, amount determined by 2) above, i.e. we'd just
have different types for different amounts of alignment. Probably
using some heuristic to use unaligned when either dealing with data
that doesn't need alignment, or when the datum is fairly small, so
copying to get the data as unaligned won't be a significant penalty.
4a) data

For in-line compressed datums
3b) compression metadata {varint rawsize, varint compression algorithm}
4b) unaligned compressed data - there's no benefit in keeping it aligned

For External toast for uncompressed data:
3d) {toastrelid, valueid, varint rawsize}

For External toast for compressed data:
3e) {valueid, toastrelid, varint compression_algorithm, varint rawsize, varint extsize}

That'd make it a lot more extensible, easier to understand, faster to
decode in a lot of cases, remove a lot of arbitrary limits. Yes, it'd
increase the header size for small datums to two bytes, but I think
that'd be more than bought back by the other improvements.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2019-08-05 07:28:24 Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Previous Message Daniel Gustafsson 2019-08-05 07:20:07 Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?