Re: Memory Alignment in Postgres

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Arthur Silva <arthurprs(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory Alignment in Postgres
Date: 2014-09-10 15:43:52
Message-ID: CA+TgmobN_2_mTO_AjQ2kAXiU-LSkqGmRwmEdpH3p2C=WhzDJAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 9, 2014 at 10:08 AM, Arthur Silva <arthurprs(at)gmail(dot)com> wrote:
> I'm continuously studying Postgres codebase. Hopefully I'll be able to make
> some contributions in the future.
>
> For now I'm intrigued about the extensive use of memory alignment. I'm sure
> there's some legacy and some architecture that requires it reasoning behind
> it.
>
> That aside, since it wastes space (a lot of space in some cases) there must
> be a tipping point somewhere. I'm sure one can prove aligned access is
> faster in a micro-benchmark but I'm not sure it's the case in a DBMS like
> postgres, specially in the page/rows area.
>
> Just for the sake of comparison Mysql COMPACT storage (default and
> recommended since 5.5) doesn't align data at all. Mysql NDB uses a fixed
> 4-byte alignment. Not sure about Oracle and others.
>
> Is it worth the extra space in newer architectures (specially Intel)?
> Do you guys think this is something worth looking at?

Yes. At least in my opinion, though, it's not a good project for a
beginner. If you get your changes to take effect, you'll find that a
lot of things will break in places that are not easy to find or fix.
You're getting into really low-level areas of the system that get
touched infrequently and require a lot of expertise in how things work
today to adjust.

The idea I've had before is to try to reduce the widest alignment we
ever require from 8 bytes to 4 bytes. That is, look for types with
typalign = 'd', and rewrite them to have typalign = 'i' by having them
use two 4-byte loads to load an eight-byte value. In practice, I
think this would probably save a high percentage of what can be saved,
because 8-byte alignment implies a maximum of 7 bytes of wasted space,
while 4-byte alignment implies a maximum of 3 bytes of wasted space.
And it would probably be pretty cheap, too, because any type with less
than 8 byte alignment wouldn't be affected at all, and even those
types that were affected would only be slightly slowed down by doing
two loads instead of one. In contrast, getting rid of alignment
requirements completely would save a little more space, but probably
at the cost of a lot more slowdown: any type with alignment
requirements would have to fetch the value byte-by-byte instead of
pulling the whole thing out at once.

But there are a couple of obvious problems with this idea, too, such as:

1. It's really complicated and a ton of work.
2. It would break pg_upgrade pretty darn badly unless we employed some
even-more-complex strategy to mitigate that.
3. The savings might not be enough to justify the effort.

It might be interesting for someone to develop a tool measuring the
number of bytes of alignment padding we lose per tuple or per page and
gather some statistics on it on various databases. That would give us
some sense as to the possible savings.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-09-10 15:53:03 Re: Escaping from blocked send() reprised.
Previous Message Heikki Linnakangas 2014-09-10 15:37:03 Re: pgcrypto: PGP armor headers