Re: BRIN summarization vs. WAL logging

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BRIN summarization vs. WAL logging
Date: 2022-01-26 18:14:24
Message-ID: 202201261814.dhvn35vzkj5i@alvherre.pgsql
Lists: pgsql-hackers

On 2022-Jan-26, Robert Haas wrote:

> On Tue, Jan 25, 2022 at 10:12 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:

> > 2) brin_summarize_range()
> >
> > Now, the issue I think is more serious, more likely to happen, and
> > harder to fix. When summarizing a range, we write two WAL records:
> >
> > INSERT heapBlk 2 pagesPerRange 2 offnum 2, blkref #0: rel 1663/63 ...
> > SAMEPAGE_UPDATE offnum 2, blkref #0: rel 1663/63341/73957 blk 2
> >
> > So, what happens if we lose the second WAL record, e.g. due to a crash?
>
> Ouch. As you say, XLogFlush() won't fix that. The WAL logging scheme
> needs to be redesigned.

I'm not sure what a good fix is. I was thinking that maybe if a
placeholder tuple is found during an index scan, and the process that
inserted it is no longer running, then the index scanner would remove the
placeholder tuple, making the range unsummarized again. However, how
would we know that the process is gone?

Another idea is to use WAL's rm_cleanup: during replay, remember each
placeholder tuple that is seen, then forget it if we see an update
from the originating process that replaces the placeholder tuple with a
real one; at cleanup time, if the list of remembered placeholder tuples
is nonempty, remove those tuples from the index.
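
To make the bookkeeping concrete, here is a minimal sketch as a toy
Python model; it is not PostgreSQL code. The record kinds, the field
names and the replay()/cleanup() helpers are all hypothetical, and
records are matched by index page and offset only, which glosses over
the "originating process" part of the idea.

```python
from dataclasses import dataclass


# Toy model only: these record kinds and fields are hypothetical stand-ins,
# not PostgreSQL's actual BRIN WAL record structures.
@dataclass(frozen=True)
class WalRecord:
    kind: str       # "placeholder_insert" or "placeholder_replace"
    index_blk: int  # BRIN index page holding the tuple
    offnum: int     # item offset within that page


def replay(records):
    """Replay records and return the placeholders still pending at the end."""
    pending = set()  # (index_blk, offnum) of placeholders not yet replaced
    for rec in records:
        key = (rec.index_blk, rec.offnum)
        if rec.kind == "placeholder_insert":
            pending.add(key)       # remember the placeholder
        elif rec.kind == "placeholder_replace":
            pending.discard(key)   # a real tuple replaced it; forget it
    return pending


def cleanup(index_tuples, pending):
    """Drop orphaned placeholders so their ranges read as unsummarized."""
    return {key: tup for key, tup in index_tuples.items() if key not in pending}


# Crash scenario: the insert of the placeholder reached WAL, the update did not.
wal = [WalRecord("placeholder_insert", index_blk=2, offnum=2)]
index = {(2, 1): "summary", (2, 2): "placeholder"}
orphans = replay(wal)            # {(2, 2)}
index = cleanup(index, orphans)  # only the (2, 1) summary tuple remains
```

In this model a WAL stream that contains both records leaves nothing
pending, so cleanup is a no-op. It says nothing about where a real
implementation would keep the remembered list.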

(I vaguely recall we used the WAL rm_cleanup mechanism for something
like this, but we no longer do AFAICS.)

... Oh, but if there is a promotion involved, we may end up with a
placeholder insertion before the promotion and the update afterwards.
That would probably not be handled well.

One thing not completely clear to me is whether this only affects
placeholder tuples. Is it possible to have this problem with regular
BRIN tuples? I think it isn't.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
