Reducing tuple overhead

From: Andres Freund <andres(at)anarazel(dot)de>
To: hlinnaka(at)iki(dot)fi, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Reducing tuple overhead
Date: 2015-04-23 16:24:29
Message-ID: 20150423162429.GG3055@alap3.anarazel.de
Lists: pgsql-hackers

Split into a new thread; the other one is already growing fast
enough. This discussion started at
http://archives.postgresql.org/message-id/55391469.5010506%40iki.fi

On April 23, 2015 6:48:57 PM GMT+03:00, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>Stop right there. You need to reserve enough space on the page to store
>an xmax for *every* tuple on the page. Because if you don't, what are
>you going to do when every tuple on the page is deleted by a different
>transaction.
>
>Even if you store the xmax somewhere else than the page header, you
>need to reserve the same amount of space for them, so it doesn't help
>at all.

Depends on how you do it and what you optimize for (disk space, runtime,
code complexity..). You can e.g. apply a somewhat similar trick to
xmin/xmax as is done for cmin/cmax; only that the data structure needs
to be persistent.

In fact, we already have a combocid-like structure for xids that's
persistent - multixacts. We could just have one xid saved that's either
an xmin or an xmax (indicated by bits) or a multixact. When a tuple is
updated/deleted whose xmin is still required, we could replace the
former xmin with a multixact; otherwise we'd just change the flag to say
that it's now an xmax without an xmin. To check visibility when the xid
is a multixact, we'd just have to look up the relevant members to get
the actual xmin and xmax.
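
A rough sketch of that single-slot idea, to make the state transitions
concrete. This is a toy model, not PostgreSQL code: the names
TupleXidSlot, SlotKind, multi_table and the slot_* functions are all
hypothetical, the "multixact" here is just an in-memory array entry
holding an (xmin, xmax) pair, and the caller is assumed to know whether
the xmin is still required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID 0u

/* Hypothetical flag bits: how to interpret the one stored value. */
typedef enum { SLOT_IS_XMIN, SLOT_IS_XMAX, SLOT_IS_MULTI } SlotKind;

/* Toy stand-in for a persistent multixact: one (xmin, xmax) pair. */
typedef struct { Xid xmin; Xid xmax; } MultiMember;

#define MAX_MULTI 16
static MultiMember multi_table[MAX_MULTI];
static uint32_t next_multi = 0;

/* The only xid-sized field kept per tuple in this scheme. */
typedef struct { SlotKind kind; uint32_t value; } TupleXidSlot;

/* A freshly inserted tuple stores just its xmin. */
static TupleXidSlot slot_insert(Xid xmin)
{
    TupleXidSlot slot = { SLOT_IS_XMIN, xmin };
    return slot;
}

/*
 * Delete/update a tuple that currently stores only an xmin.
 * If the xmin is still required, both xids move into a multi entry;
 * otherwise the slot is simply relabelled as an xmax.
 */
static void slot_delete(TupleXidSlot *slot, Xid xmax, bool xmin_still_needed)
{
    assert(slot->kind == SLOT_IS_XMIN);
    if (xmin_still_needed) {
        assert(next_multi < MAX_MULTI);
        multi_table[next_multi].xmin = slot->value;
        multi_table[next_multi].xmax = xmax;
        slot->kind = SLOT_IS_MULTI;
        slot->value = next_multi++;
    } else {
        slot->kind = SLOT_IS_XMAX;
        slot->value = xmax;
    }
}

/* INVALID_XID means "no xmin recorded" (it was no longer required). */
static Xid slot_get_xmin(const TupleXidSlot *slot)
{
    switch (slot->kind) {
        case SLOT_IS_XMIN:  return slot->value;
        case SLOT_IS_MULTI: return multi_table[slot->value].xmin;
        case SLOT_IS_XMAX:  break;
    }
    return INVALID_XID;
}

/* INVALID_XID means "not deleted/updated". */
static Xid slot_get_xmax(const TupleXidSlot *slot)
{
    switch (slot->kind) {
        case SLOT_IS_XMAX:  return slot->value;
        case SLOT_IS_MULTI: return multi_table[slot->value].xmax;
        case SLOT_IS_XMIN:  break;
    }
    return INVALID_XID;
}
```

The extra space is only paid for tuples that need both xids at the same
time; the cost moves into the multi lookup done during visibility
checks.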

To avoid excessive overhead when a tuple is repeatedly updated within
one session, we could store some of the data in the combocid entry that
we need anyway in that case.

Whether that's feasible complexity-wise is debatable, but it's certainly
possible.

I do wonder what, in realistic cases, is actually the bigger contributor
to the overhead. The tuple header or the padding we liberally add in
many cases...
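
To illustrate the padding side of that question, here is a small
analogy using plain C struct layout. It is not a model of the on-disk
tuple format; the struct names and fields are made up. It only shows how
alignment requirements turn field order into wasted bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small fields interleaved with 8-byte ones: padding after each flag. */
struct RowBadOrder {
    uint8_t flag_a;
    int64_t big_a;
    uint8_t flag_b;
    int64_t big_b;
};

/* Same fields, widest first: only trailing padding remains. */
struct RowGoodOrder {
    int64_t big_a;
    int64_t big_b;
    uint8_t flag_a;
    uint8_t flag_b;
};

/* Bytes of actual data, identical for both layouts. */
static size_t payload_bytes(void)
{
    return 2 * sizeof(int64_t) + 2 * sizeof(uint8_t);
}

/* Bytes lost to alignment padding in each layout. */
static size_t padding_bad(void)
{
    return sizeof(struct RowBadOrder) - payload_bytes();
}

static size_t padding_good(void)
{
    return sizeof(struct RowGoodOrder) - payload_bytes();
}

/* Holds on mainstream ABIs; the standard does not strictly promise it. */
static_assert(sizeof(struct RowGoodOrder) <= sizeof(struct RowBadOrder),
              "reordering should not make the struct larger");
```

On common 64-bit ABIs where int64_t is 8-byte aligned, the first layout
occupies 32 bytes and the second 24 for the same 18 bytes of data, so
the padding alone can be of the same order as a few header fields.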

Andres
