RE: Is this a problem in GenericXLogFinish()?

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Michael Paquier' <michael(at)paquier(dot)xyz>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Is this a problem in GenericXLogFinish()?
Date: 2024-04-05 06:22:58
Message-ID: OSBPR01MB25528A4CF6FF6E45B3DAFB0FF5032@OSBPR01MB2552.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Michael,

> There is still some divergence between the code path of
> _hash_freeovflpage() and the replay in hash_xlog_squeeze_page() when
> squeezing a page: we do not set the LSN of the write buffer if
> (xlrec.ntups <= 0 && xlrec.is_prim_bucket_same_wrt &&
> !xlrec.is_prev_bucket_same_wrt) when generating the squeeze record,
> but at replay we call PageSetLSN() on the write buffer and mark it
> dirty in this case. Isn't that incorrect? It seems to me that we
> should be able to make the conditional depending on the state of the
> xl_hash_squeeze_page record, no?

Thanks for pointing out. Yes, we fixed a behavior by the commit aa5edbe37,
but we missed the redo case. I made a fix patch based on the suggestion [1].

Below part contained my analysis and how I reproduced.
I posted for clarifying the issue to others.

=====

## Pointed issue

Assuming the case
- xlrec.ntups is 0,
- xlrec.is_prim_bucket_same_wrt is true, and
- xlrec.is_prev_bucket_same_wrt is false.
This meant that there are several overflow pages and the tail deadpage is removing.
In this case, the primary page is not have to be modified.

While doing the normal operation, the removal is done in _hash_freeovflpage().
If above condition is met, mod_wbuf is not set to true so PageSetLSN() is skipped.

While doing the recovery, the squeeze and removal is done in hash_xlog_squeeze_page().
In this function PageSetLSN() is set unconditionally.
He said in this case the PageSetLSN should be avoided as well.

## Analysis

IIUC there is the same issue with pointed by [2].
He said the PageSetLSN() should be set when the page is really modified.
In the discussing case, wbug is not modified (just removing the tail entry) so that no need
to assign LSN to it. However, we might miss to update the redo case as well.

## How to reproduce

I could reproduce the case below steps.
1. Added the debug log like [3]
2. Constructed a physical replication.
3. Ran hash_index.sql
4. Found the added debug log.

[1]: https://www.postgresql.org/message-id/CAA4eK1%2BayneM-8mSRC0iWpDMnm39EwDoqgiOCSqrrMLcdnUnAA%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/ZbyVVG_7eW3YD5-A%40paquier.xyz
[3]:
```
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -713,6 +713,11 @@ hash_xlog_squeeze_page(XLogReaderState *record)
writeopaque->hasho_nextblkno = xldata->nextblkno;
}

+ if (xldata->ntups == 0 &&
+ xldata->is_prim_bucket_same_wrt &&
+ !xldata->is_prev_bucket_same_wrt)
+ elog(LOG, "XXX no need to set PageSetLSN");
+
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

Attachment Content-Type Size
add_modbuf_flag.diff application/octet-stream 1.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anton A. Melnikov 2024-04-05 06:30:02 Re: effective_multixact_freeze_max_age issue
Previous Message Mikael Kjellström 2024-04-05 06:11:43 Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?