Re: Make relfile tombstone files conditional on WAL level

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make relfile tombstone files conditional on WAL level
Date: 2022-01-06 07:42:29
Message-ID: CAFiTN-uP2Bw3fZ7zgAEzreNG7uvAts4Dwg1_YxtMmprZ95gg2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 6, 2022 at 3:07 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Aug 2, 2021 at 6:38 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I guess there's a somewhat hacky way to get somewhere without actually
> > increasing the size. We could take 3 bytes from the fork number and use that
> > to get to a 7 byte relfilenode portion. 7 bytes are probably enough for
> > everyone.
> >
> > It's not like we can use those bytes in a useful way, due to alignment
> > requirements. Declaring that the high 7 bytes are for the relNode portion and
> > the low byte for the fork would still allow efficient comparisons and doesn't
> > seem too ugly.
>
> I think this idea is worth more consideration. It seems like 2^56
> relfilenodes ought to be enough for anyone, recalling that you can
> only ever have 2^64 bytes of WAL. So if we do this, we can eliminate a
> bunch of code that is there to guard against relfilenodes being
> reused. In particular, we can remove the code that leaves a 0-length
> tombstone file around until the next checkpoint to guard against
> relfilenode reuse.

+1

>
> I think this would also solve a problem Dilip mentioned to me today:
> suppose you make ALTER DATABASE SET TABLESPACE WAL-logged, as he's
> been trying to do. Then suppose you do "ALTER DATABASE foo SET
> TABLESPACE used_recently_but_not_any_more". You might get an error
> complaining that “some relations of database \“%s\” are already in
> tablespace \“%s\“” because there could be tombstone files in that
> database. With this combination of changes, you could just use the
> barrier mechanism from https://commitfest.postgresql.org/36/2962/ to
> wait for those files to disappear, because they've got to be
> previously-unliked files that Windows is still returning because
> they're still opening -- or else they could be a sign of a corrupted
> database, but there are no other possibilities.

Yes, this approach will solve the problem for the WAL-logged ALTER
DATABASE we are facing.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-01-06 07:46:48 Re: Support tab completion for upper character inputs in psql
Previous Message Kyotaro Horiguchi 2022-01-06 07:39:21 Re: In-placre persistance change of a relation