Re: problems with making relfilenodes 56-bits

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: problems with making relfilenodes 56-bits
Date: 2022-10-04 23:49:52
Message-ID: 20221004234952.anrguppx5owewb6n@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-10-03 10:01:25 -0700, Andres Freund wrote:
> On 2022-10-03 08:12:39 -0400, Robert Haas wrote:
> > On Fri, Sep 30, 2022 at 8:20 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I thought about trying to buy back some space elsewhere, and I think
> > that would be a reasonable approach to getting this committed if we
> > could find a way to do it. However, I don't see a terribly obvious way
> > of making it happen.
>
> I think there's plenty potential...

I light dusted off my old varint implementation from [1] and converted the
RelFileLocator and BlockNumber from fixed width integers to varint ones. This
isn't meant as a serious patch, but an experiment to see if this is a path
worth pursuing.

A run of installcheck in a cluster with autovacuum=off, full_page_writes=off
(for increased reproducability) shows a decent saving:

master: 241106544 - 230 MB
varint: 227858640 - 217 MB

The average record size goes from 102.7 to 95.7 bytes excluding the remaining
FPIs, 118.1 to 111.0 including FPIs.

There's plenty other spots that could be converted (e.g. the record length
which rarely needs four bytes), this is just meant as a demonstration.

I used pg_waldump --stats for that range of WAL to measure the CPU overhead. A
profile does show pg_varint_decode_uint64(), but partially that seems to be
offset by the reduced amount of bytes to CRC. Maybe a ~2% overhead remains.

That would be tolerable, I think, because waldump --stats pretty much doesn't
do anything with the WAL.

But I suspect there's plenty of optimization potential in the varint
code. Right now it e.g. stores data as big endian, and the bswap instructions
do show up. And a lot of my bit-maskery could be optimized too.

Greetings,

Andres Freund

Attachment Content-Type Size
v1-0001-Add-a-varint-implementation.patch text/x-diff 10.2 KB
v1-0002-wip-wal-Encode-RelFileLocator-and-BLockNumber-as-.patch text/x-diff 3.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-10-04 23:53:11 Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records
Previous Message Garen Torikian 2022-10-04 23:16:30 Re: [PATCH] Expand character set for ltree labels