Re: Save a few bytes in pg_attribute

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Save a few bytes in pg_attribute
Date: 2023-03-21 22:22:40
Message-ID: CAEze2WgAsR6uN+83UaaZ_GOg8dOcnkOnSgA1TY+29i=OwJUjvQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, 21 Mar 2023 at 23:05, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-03-21 21:02:08 +0100, Matthias van de Meent wrote:
> > On Tue, 21 Mar 2023 at 20:58, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > On 2023-03-21 20:20:40 +0100, Matthias van de Meent wrote:
> > > > Yes, attcacheoff is a tremendous performance boon in many cases.
> > >
> > > Which? We don't use fastgetattr() in many places these days. And in some quick
> > > measurements it's a wash or small loss when deforming slot tuples, even when
> > > the attcacheoff optimization would apply, because the branches for managing it
> > > add more overhead than they save.
> >
> > My experience with attcacheoff performance is in indexes, specifically
> > index_getattr(). Sure, multi-column indexes are uncommon, but the
> > difference between have and have-not for cached attribute offsets is
> > several %.
>
> I did indeed not think of index_getattr(), just heap related things.
>
> Do you have a good test workload handy - I'm kinda curious to compare the cost
> of removing attcacheoff vs the gain of not maintaining it for index workloads.

Rebuilding indexes has been my go-to workload for comparing
attribute-related btree performance optimizations in [0] and [1].
Tests from 2021 in which offsets were always calculated from 0 showed
a slowdown of 4-18% on workloads that would otherwise benefit from
attcacheoff.
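
To make the two strategies concrete, here is a minimal Python sketch
(not PostgreSQL code). It assumes a toy tuple layout with no alignment
padding and no NULLs, where a variable-length attribute is stored as a
1-byte length followed by its payload. The names VARLEN,
offset_from_zero, build_offset_cache and get_offset are made up for
this illustration.

```python
from typing import List, Optional

VARLEN = -1  # hypothetical marker: variable-length attribute (1-byte length prefix)


def offset_from_zero(data: bytes, attlens: List[int], attnum: int) -> int:
    """Walk the tuple from byte 0 to find where attribute `attnum` starts."""
    off = 0
    for attlen in attlens[:attnum]:
        if attlen == VARLEN:
            off += 1 + data[off]  # length byte plus payload
        else:
            off += attlen
    return off


def build_offset_cache(attlens: List[int]) -> List[Optional[int]]:
    """Precompute the offsets that do not depend on tuple contents.

    An offset is cacheable only while every preceding attribute is
    fixed-width; everything after the first variable-length attribute
    gets None.
    """
    cache: List[Optional[int]] = []
    off: Optional[int] = 0
    for attlen in attlens:
        cache.append(off)
        if off is not None:
            off = None if attlen == VARLEN else off + attlen
    return cache


def get_offset(data: bytes, attlens: List[int],
               cache: List[Optional[int]], attnum: int) -> int:
    """Use the cached offset when there is one, else fall back to walking."""
    cached = cache[attnum]
    if cached is not None:
        return cached
    return offset_from_zero(data, attlens, attnum)


# Toy descriptor: int4, int8, one variable-length attribute, int4.
attlens = [4, 8, VARLEN, 4]
data = bytes(4) + bytes(8) + b"\x03abc" + bytes(4)
cache = build_offset_cache(attlens)
```

In this model the first three attributes resolve through the cache,
while the last one still needs the walk from offset 0, which is the
cost the numbers above are about.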

> It looks like many of the index_getattr() cases could be made faster without
> attcacheoff. A lot of places seem to loop over all attributes, and the key to
> accelerating that is to keep state between the iterations.

Indeed, it's not great. You can take a look at [1], where I'm
trying to optimize btree's handling of tuple comparisons, including
work on reducing the overhead of attribute accesses.
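
As an illustration of keeping state between iterations, the following
Python sketch (not PostgreSQL code) walks all attributes of a tuple
while carrying the running offset forward, so a loop over all
attributes costs O(natts) instead of restarting from offset 0 for each
one. It assumes the same toy layout as before: no alignment, no NULLs,
and variable-length attributes stored as a 1-byte length plus payload.
VARLEN and iter_attributes are hypothetical names.

```python
from typing import Iterator, List

VARLEN = -1  # hypothetical marker: variable-length attribute (1-byte length prefix)


def iter_attributes(data: bytes, attlens: List[int]) -> Iterator[bytes]:
    """Yield each attribute's bytes, reusing the offset from the previous one."""
    off = 0  # state carried between iterations
    for attlen in attlens:
        if attlen == VARLEN:
            length = data[off]  # 1-byte length prefix
            start = off + 1
        else:
            length = attlen
            start = off
        yield data[start:start + length]
        off = start + length  # the next attribute starts where this one ends


# Toy tuple: a 2-byte fixed attribute, a variable-length one, a 1-byte fixed one.
values = list(iter_attributes(b"\x01\x02\x03abcZ", [2, VARLEN, 1]))
```

Unlike a per-descriptor offset cache, this kind of state does not
break down when a variable-length attribute appears early in the tuple.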

Note that each btree page should be able to make do with comparing at
most 2*log(ntups) columns, whereas this is currently natts * log(ntups).
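
One way to reduce the number of column comparisons per page is to
track, during the binary search, how many leading columns the search
key shares with the tuples at the current low and high bounds, and to
skip that shared prefix in later comparisons. Whether this is exactly
what [1] implements is an assumption on my part; the Python sketch
below only illustrates the general idea on a sorted list of tuples, and
the search function and its use_prefix flag are hypothetical.

```python
from typing import List, Sequence, Tuple


def search(page: List[Sequence[int]], key: Sequence[int],
           use_prefix: bool) -> Tuple[int, int]:
    """Binary-search a sorted page; return (position, columns compared)."""
    low, high = 0, len(page)
    # Leading columns known to be equal between the key and the tuple
    # that most recently established each bound.
    low_prefix = high_prefix = 0
    compared = 0
    while low < high:
        mid = (low + high) // 2
        # Every tuple between the two bounds shares at least
        # min(low_prefix, high_prefix) leading columns with the key.
        col = min(low_prefix, high_prefix) if use_prefix else 0
        cmp = 0
        while col < len(key):
            compared += 1
            if key[col] != page[mid][col]:
                cmp = -1 if key[col] < page[mid][col] else 1
                break
            col += 1
        if cmp == 0:
            return mid, compared  # exact match
        if cmp < 0:
            high, high_prefix = mid, col
        else:
            low, low_prefix = mid + 1, col
    return low, compared


# Three-column tuples that differ only in the last column.
page = [(1, 1, k) for k in range(8)]
pos_plain, cols_plain = search(page, (1, 1, 5), use_prefix=False)
pos_prefix, cols_prefix = search(page, (1, 1, 5), use_prefix=True)
```

Both calls find the same position, but the prefix-tracking variant
compares fewer columns once both bounds are known to share the leading
columns with the key.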

> Attcacheoff is
> that, but quite stunted, because it only works if there aren't any NULLs (even
> if the NULL is in a later column).

Yes, that isn't great either, but most indexes I've seen have tuples
that are either all NULL or have no NULLs; I only seldom see indexes
with mixed NULL/not-null index tuple attributes.

Kind regards,

Matthias van de Meent.

[0] https://www.postgresql.org/message-id/flat/CAEze2WhyBT2bKZRdj_U0KS2Sbewa1XoO_BzgpzLC09sa5LUROg%40mail.gmail.com#fe3369c4e202a7ed468e47bf5420f530
[1] https://www.postgresql.org/message-id/flat/CAEze2Wg52tsSWA9Fy7OCXx-K7pPLMNxA_fmQ6-+_pzR-AoODDA%40mail.gmail.com
