AIX and EAGAIN on open()

From: Christoph Berg <christoph(dot)berg(at)credativ(dot)de>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: AIX and EAGAIN on open()
Date: 2022-06-20 09:53:20
Message-ID: YrBDkPwj5u7HMLCQ@msg.credativ.de
Lists: pgsql-general

Hello,

A customer running PostgreSQL on AIX [1] occasionally sees "Resource
temporarily unavailable" (EAGAIN) returned by open() calls:

[1] We have PostgreSQL 11.13 on powerpc-ibm-aix7.2.5.0, compiled by /opt/IBM/xlc/13.1.0/bin/xlc, 64-bit

2022-05-19 03:28:13 CEST:127.0.0.1(63265):x(at)x:[64029168]: ERROR: could not open file "base/16401/935915821_fsm": Resource temporarily unavailable
2022-05-19 03:28:13 CEST:127.0.0.1(63265):x(at)x:[64029168]: CONTEXT: SQL statement "INSERT INTO s[...]"
PL/pgSQL function s...() line 12 at SQL statement
2022-05-19 03:28:13 CEST:127.0.0.1(63265):x(at)x:[64029168]: STATEMENT: PREPARE ... AS insert into ...

2022-04-16 01:45:31 CEST:127.0.0.1(58946):x(at)x:[20906970]: ERROR: could not access status of transaction 0
2022-04-16 01:45:31 CEST:127.0.0.1(58946):x(at)x:[20906970]: DETAIL: Could not open file "pg_subtrans/6158": Resource temporarily unavailable.
2022-04-16 01:45:31 CEST:127.0.0.1(58946):x(at)x:[20906970]: STATEMENT: PREPARE ... AS update ...

2020-12-01 09:24:30 CET:127.0.0.1(59898):x(at)x:[6227520]: ERROR: could not access status of transaction 0
2020-12-01 09:24:30 CET:127.0.0.1(59898):x(at)x:[6227520]: DETAIL: Could not open file "pg_subtrans/AC9E": Resource temporarily unavailable.
2020-12-01 09:24:30 CET:127.0.0.1(59898):x(at)x:[6227520]: STATEMENT: PREPARE ... AS DELETE FROM ....

open() should not return EAGAIN as per POSIX [2],

[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html#tag_16_357_05

and the AIX documentation says it would only return EAGAIN if O_TRUNC
is used [3], but as far as I can tell, PG does not use that flag.

[3] https://www.ibm.com/docs/en/aix/7.2?topic=o-open-openat-openx-openxat-open64-open64at-open64x-open64xat-creat-creat64-subroutine

IBM's reply to the issue back in December 2020 was this:

The man page / infocenter document is not intended as an exhaustive
list of all possible error codes returned and their circumstances.
"Resource temporarily unavailable" may also be returned for
O_NSHARE, O_RSHARE with O_NONBLOCK.

As far as I can tell, PG does not use these flags either.
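
To double-check the flag question on a given build, a small helper along these lines could be called from a debugger or temporary instrumentation. This is only a sketch: suspect_open_flags and the SUSPECT_* names are hypothetical and not part of PostgreSQL, and O_NSHARE/O_RSHARE are assumed to be AIX-specific macros, so they are guarded with #ifdef.

```c
#include <fcntl.h>

/* Hypothetical bit names for the flags mentioned in the AIX docs and IBM's reply. */
enum
{
    SUSPECT_TRUNC    = 1,
    SUSPECT_NONBLOCK = 2,
    SUSPECT_NSHARE   = 4,
    SUSPECT_RSHARE   = 8
};

/*
 * Return a bitmask of the "suspect" flags present in an open() flags
 * argument, or 0 if none of them is set.
 */
int
suspect_open_flags(int flags)
{
    int found = 0;

    if (flags & O_TRUNC)
        found |= SUSPECT_TRUNC;
    if (flags & O_NONBLOCK)
        found |= SUSPECT_NONBLOCK;
#ifdef O_NSHARE                 /* assumed to exist only on AIX */
    if (flags & O_NSHARE)
        found |= SUSPECT_NSHARE;
#endif
#ifdef O_RSHARE                 /* assumed to exist only on AIX */
    if (flags & O_RSHARE)
        found |= SUSPECT_RSHARE;
#endif
    return found;
}
```

A return value of 0 for every open() call in the failing code paths would support the reading that none of the documented EAGAIN conditions applies.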

We also ruled out that the system is using any anti-virus or similar
tooling that would intercept IO traffic.

Does any of this ring a bell for anyone? Is this an AIX bug, a
PG bug, or something else?
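
To see whether the condition really is transient, one could experiment with a wrapper like the one below. This is a minimal sketch, not what PostgreSQL does: open_retry_eagain is a hypothetical name, and the retry count and the 1 ms pause are arbitrary assumptions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for nanosleep() under strict C modes */
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: call open() and, if it fails with EAGAIN, log the
 * event and retry up to max_retries times. Returns the file descriptor,
 * or -1 with errno set by the last open() call.
 */
int
open_retry_eagain(const char *path, int flags, mode_t mode, int max_retries)
{
    int attempt;

    assert(path != NULL && max_retries >= 0);

    for (attempt = 0;; attempt++)
    {
        struct timespec delay;
        int fd = open(path, flags, mode);

        if (fd >= 0)
            return fd;
        if (errno != EAGAIN || attempt >= max_retries)
            return -1;          /* errno still holds open()'s error */

        fprintf(stderr, "open(\"%s\", 0x%x) returned EAGAIN (attempt %d)\n",
                path, (unsigned int) flags, attempt + 1);

        /* Pause for 1 ms before trying again (arbitrary choice). */
        delay.tv_sec = 0;
        delay.tv_nsec = 1000000L;
        nanosleep(&delay, NULL);
    }
}
```

If a second attempt routinely succeeds, that would suggest a short-lived condition on the OS side; if the retries keep failing, the log lines at least record which paths and flags were involved.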

Christoph
--
Senior Consultant, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Geoff Richardson, Peter Lilley
Our handling of personal data is subject to the following
provisions: https://www.credativ.de/datenschutz
