Re: Removing alignment padding for byval types

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Removing alignment padding for byval types
Date: 2019-10-31 19:24:33
Message-ID: 20191031192433.vnz3bxhmnqhh5qd4@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-10-31 20:15:12 +0100, Tomas Vondra wrote:
> On Thu, Oct 31, 2019 at 11:48:21AM -0700, Andres Freund wrote:
> > We currently align byval types such as int4/8, float4/8, timestamp *,
> > date etc, even though we mostly don't need to. When tuples are deformed,
> > all byval types are copied out from the tuple data into the
> > corresponding Datum array, therefore the original alignment in the tuple
> > data doesn't matter. This is different from byref types, where the
> > Datum formed will often be a pointer into the tuple data.
> >
> > While there are some older systems where it could be a bit slower to
> > copy data out from unaligned positions into the datum array, this is
> > more than bought back by the next point:
> >
> >
> > The fact that these types are aligned has substantial costs:
> >
> > For one, we often waste substantial amounts of space inside tables with
> > alignment padding. It's not uncommon to see about 30% or more of space
> > wasted (especially when taking alignment of the first column into
> > account).
> >
> > For another, and this I think is less obvious, we actually waste
> > substantial amounts of CPU maintaining the alignment. This is primarily
> > the case because we have to perform to align the pointer to the next
> > field during tuple [de]forming. Those instructions [1] have to be
> > executed taking time, but what's worse, they also reduce the ability of
> > out-of-order execution to hide latencies. There's a hard dependency on
> > knowing the offset to the next column to be able to continue with the
> > next column.
> >
>
> Right. Reducing this overhead was one of the goals to allow "logical
> ordering" of columns in a table (while arbitrarily reordering the
> physical ones), but that patch got out of hand pretty quickly. Also,
> it'd still keep some of the overhead, because it was not removing the
> alignment/padding entirely.

Yea. It'd keep just about all the CPU overhead, because we'd still need
to align as soon as there is a preceding nulled or varlena colum.

There's still some benefit for logical column order, as grouping NOT
NULL fixed-length columns at the start is beneficial. And it's also
beneficial to have frequently accessed columns at the start. But I think
this proposal gains a lot of the space related benefits, at a much lower
complexity, together with a lot of other benefits.

> > There's two reasons why we can't just set the alignment for these types
> > to 'c'.
> > 1) pg_upgrade, for fairly obvious reasons
> > 2) We map catalog table rows to structs, in a *lot* of places.
> >
> >
> > It seems to me that, despite the above, it's still worth trying to
> > improve upon the current state, to benefit from reduced space and CPU
> > usage.
> >
> > As it turns out we already separate out the alignment for a type, and a
> > column, between pg_type.typalign and pg_attribute.attalign. One way to
> > tackle this would be to allow to specify, for byval types only, at
> > column creation time whether a column uses a 'struct-mappable' alignment
> > or not. When set, set attalign to pg_type.typalign for alignment, when
> > not, to 'c'. By changing pg_dump in binary upgrade mode to emit the
> > necessary options, and by adding such options during bki processing,
> > we'd solve 1) and 2), but otherwise gain the benefits.
> >
> > Alternatively we could declare such a propert on the table level, but
> > that seems more restrictive, without a corresponding upside.

> I don't know, but it seems entirely sufficient specifying this at the
> table level, no? What would be the use case for removing padding for
> only some of the columns? I don't see the use case for that.

Well, if we had it on a per-table level, we'd also align
a) catalog table columns that follow the first varlena column - which we don't need
to align, as they can't be accessed via mapping
b) columns in pg_upgraded tables that have been added after the upgrade

> > It's possible that we should do something related with a few varlena
> > datatypes. We currently use intalign for types like text, json, and as
> > far as I can tell that does not make all that much sense. They're not
> > struct mappable *anyway* (and if they were, they'd need to be 8 byte
> > aligned on common platforms, __alignof__(void*) is 8). We'd have to take
> > a bit of care to treat the varlena header as unaligned - but we need to
> > do so anyway, because of 1byte varlenas. Short varlenas seems to make it
> > less crucial to pursue this, as the average datum that'd benefit is long
> > enough to make padding a non-issue. So I don't think it'd make sense to
> > tackle this project at the same time.
> >
>
> Not sure, but how come it's not failing on the picky platforms, then? On
> x86 it's probably OK because it's pretty permissive, but I'd expect some
> platforms (s390, parisc, itanium, powerpc, ...) to be much pickier.

I'm not quite following? As I said, we already need to use alignment
aware code due to caring for short varlenas. And these types aren't
actually struct mappable.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-10-31 19:32:47 Re: Adding percentile metrics to pg_stat_statements module
Previous Message Tomas Vondra 2019-10-31 19:21:55 Re: Proposal: Global Index