Re: Removing alignment padding for byval types

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Removing alignment padding for byval types
Date: 2019-10-31 19:45:34
Message-ID: 20191031194534.5kssk4c5uiieakpp@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 31, 2019 at 12:24:33PM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-10-31 20:15:12 +0100, Tomas Vondra wrote:
>> On Thu, Oct 31, 2019 at 11:48:21AM -0700, Andres Freund wrote:
>> > We currently align byval types such as int4/8, float4/8, timestamp *,
>> > date etc, even though we mostly don't need to. When tuples are deformed,
>> > all byval types are copied out from the tuple data into the
>> > corresponding Datum array, therefore the original alignment in the tuple
>> > data doesn't matter. This is different from byref types, where the
>> > Datum formed will often be a pointer into the tuple data.
>> >
>> > While there are some older systems where it could be a bit slower to
>> > copy data out from unaligned positions into the datum array, this is
>> > more than bought back by the next point:
>> >
>> >
>> > The fact that these types are aligned has substantial costs:
>> >
>> > For one, we often waste substantial amounts of space inside tables with
>> > alignment padding. It's not uncommon to see about 30% or more of space
>> > wasted (especially when taking alignment of the first column into
>> > account).
>> >
>> > For another, and this I think is less obvious, we actually waste
>> > substantial amounts of CPU maintaining the alignment. This is primarily
>> > the case because we have to perform to align the pointer to the next
>> > field during tuple [de]forming. Those instructions [1] have to be
>> > executed taking time, but what's worse, they also reduce the ability of
>> > out-of-order execution to hide latencies. There's a hard dependency on
>> > knowing the offset to the next column to be able to continue with the
>> > next column.
>> >
>>
>> Right. Reducing this overhead was one of the goals to allow "logical
>> ordering" of columns in a table (while arbitrarily reordering the
>> physical ones), but that patch got out of hand pretty quickly. Also,
>> it'd still keep some of the overhead, because it was not removing the
>> alignment/padding entirely.
>
>Yea. It'd keep just about all the CPU overhead, because we'd still need
>to align as soon as there is a preceding nulled or varlena colum.
>
>There's still some benefit for logical column order, as grouping NOT
>NULL fixed-length columns at the start is beneficial. And it's also
>beneficial to have frequently accessed columns at the start. But I think
>this proposal gains a lot of the space related benefits, at a much lower
>complexity, together with a lot of other benefits.
>

+1

>
>> > There's two reasons why we can't just set the alignment for these types
>> > to 'c'.
>> > 1) pg_upgrade, for fairly obvious reasons
>> > 2) We map catalog table rows to structs, in a *lot* of places.
>> >
>> >
>> > It seems to me that, despite the above, it's still worth trying to
>> > improve upon the current state, to benefit from reduced space and CPU
>> > usage.
>> >
>> > As it turns out we already separate out the alignment for a type, and a
>> > column, between pg_type.typalign and pg_attribute.attalign. One way to
>> > tackle this would be to allow to specify, for byval types only, at
>> > column creation time whether a column uses a 'struct-mappable' alignment
>> > or not. When set, set attalign to pg_type.typalign for alignment, when
>> > not, to 'c'. By changing pg_dump in binary upgrade mode to emit the
>> > necessary options, and by adding such options during bki processing,
>> > we'd solve 1) and 2), but otherwise gain the benefits.
>> >
>> > Alternatively we could declare such a propert on the table level, but
>> > that seems more restrictive, without a corresponding upside.
>
>> I don't know, but it seems entirely sufficient specifying this at the
>> table level, no? What would be the use case for removing padding for
>> only some of the columns? I don't see the use case for that.
>
>Well, if we had it on a per-table level, we'd also align
>a) catalog table columns that follow the first varlena column - which we don't need
> to align, as they can't be accessed via mapping
>b) columns in pg_upgraded tables that have been added after the upgrade
>

Hmm, OK. I think the question is whether it's worth the extra
complexity. I'd say it's not, but perhaps I'm wrong.

>
>> > It's possible that we should do something related with a few varlena
>> > datatypes. We currently use intalign for types like text, json, and as
>> > far as I can tell that does not make all that much sense. They're not
>> > struct mappable *anyway* (and if they were, they'd need to be 8 byte
>> > aligned on common platforms, __alignof__(void*) is 8). We'd have to take
>> > a bit of care to treat the varlena header as unaligned - but we need to
>> > do so anyway, because of 1byte varlenas. Short varlenas seems to make it
>> > less crucial to pursue this, as the average datum that'd benefit is long
>> > enough to make padding a non-issue. So I don't think it'd make sense to
>> > tackle this project at the same time.
>> >
>>
>> Not sure, but how come it's not failing on the picky platforms, then? On
>> x86 it's probably OK because it's pretty permissive, but I'd expect some
>> platforms (s390, parisc, itanium, powerpc, ...) to be much pickier.
>
>I'm not quite following? As I said, we already need to use alignment
>aware code due to caring for short varlenas. And these types aren't
>actually struct mappable.
>

Sorry, I misread/misunderstood what you wrote.

cheers

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ibrar Ahmed 2019-10-31 19:50:40 Re: fe-utils - share query cancellation code
Previous Message Ibrar Ahmed 2019-10-31 19:32:47 Re: [PATCH] Implement INSERT SET syntax