Re: problems with making relfilenodes 56-bits

From: Andres Freund <andres(at)anarazel(dot)de>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, David Rowley <dgrowleyml(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: problems with making relfilenodes 56-bits
Date: 2022-10-12 21:13:31
Message-ID: 20221012211331.nhnhjdwitzfuygjz@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-10-12 22:05:30 +0200, Matthias van de Meent wrote:
> On Wed, 5 Oct 2022 at 01:50, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2022-10-03 10:01:25 -0700, Andres Freund wrote:
> > > On 2022-10-03 08:12:39 -0400, Robert Haas wrote:
> > > > On Fri, Sep 30, 2022 at 8:20 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > I thought about trying to buy back some space elsewhere, and I think
> > > > that would be a reasonable approach to getting this committed if we
> > > > could find a way to do it. However, I don't see a terribly obvious way
> > > > of making it happen.
> > >
> > > I think there's plenty potential...
> >
> > I light dusted off my old varint implementation from [1] and converted the
> > RelFileLocator and BlockNumber from fixed width integers to varint ones. This
> > isn't meant as a serious patch, but an experiment to see if this is a path
> > worth pursuing.
> >
> > A run of installcheck in a cluster with autovacuum=off, full_page_writes=off
> > (for increased reproducability) shows a decent saving:
> >
> > master: 241106544 - 230 MB
> > varint: 227858640 - 217 MB
>
> I think a signficant part of this improvement comes from the premise
> of starting with a fresh database. tablespace OID will indeed most
> likely be low, but database OID may very well be linearly distributed
> if concurrent workloads in the cluster include updating (potentially
> unlogged) TOASTed columns and the databases are not created in one
> "big bang" but over the lifetime of the cluster. In that case DBOID
> will consume 5B for a significant fraction of databases (anything with
> OID >=2^28).
>
> My point being: I don't think that we should have different WAL
> performance in databases which is dependent on which OID was assigned
> to that database.

To me this is raising the bar to an absurd level. Some minor space usage
increase after oid wraparound and for very large block numbers isn't a huge
issue - if you're in that situation you already have a huge amount of wal.

> 0002 - Rework XLogRecord
> This makes many fields in the xlog header optional, reducing the size
> of many xlog records by several bytes. This implements the design I
> shared in my earlier message [1].
>
> 0003 - Rework XLogRecordBlockHeader.
> This patch could be applied on current head, and saves some bytes in
> per-block data. It potentially saves some bytes per registered
> block/buffer in the WAL record (max 2 bytes for the first block, after
> that up to 3). See the patch's commit message in the patch for
> detailed information.

The amount of complexity these two introduce seems quite substantial to
me. Both from an maintenance and a runtime perspective. I think we'd be better
off using building blocks like variable lengths encoded values than open
coding it in many places.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2022-10-12 22:07:33 Re: create subscription - improved warning message
Previous Message Nathan Bossart 2022-10-12 20:59:37 Re: allowing for control over SET ROLE