Re: archive status ".ready" files may be created too early

From: "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "x4mmm(at)yandex-team(dot)ru" <x4mmm(at)yandex-team(dot)ru>, "a(dot)lubennikova(at)postgrespro(dot)ru" <a(dot)lubennikova(at)postgrespro(dot)ru>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: archive status ".ready" files may be created too early
Date: 2021-08-20 17:52:10
Message-ID: 202108201752.sdsjbgmwee4x@alvherre.pgsql
Lists: pgsql-hackers

On 2021-Aug-20, Bossart, Nathan wrote:

> On 8/20/21, 8:29 AM, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:

> > We can't expand the hash table either. It has an initial and maximum
> > size of 16 elements, which means it's basically an expensive array,
> > and which also means that it imposes a new limit of 16 *
> > wal_segment_size on the size of WAL records. If you exceed that limit,
> > I think things just go boom... which I think is not acceptable. I
> > think we can have records in the multi-GB range if wal_level=logical
> > and someone chooses a stupid replica identity setting.
>
> I was under the impression that shared hash tables could be expanded
> as necessary, but from your note and the following comment, that does
> not seem to be true:

Actually, you were right. Hash tables in shared memory can be expanded.
There are some limitations (the hash "directory" is fixed size, which
means the hash table gets less efficient if it grows too much), but you
can definitely create more hash entries than the initial size. See for
example element_alloc(), which covers the case of a hash table being
IS_PARTITIONED -- something that only shmem hash tables can be. Note
that ShmemInitHash passes the HASH_ALLOC flag and uses ShmemAllocNoError
as the allocation function, which acquires memory from the shared segment.

This is a minor thing -- it doesn't affect the fact that the hash table
is possibly being misused and inefficient -- but I thought it was worth
pointing out.

As an example, consider the LOCK / PROCLOCK hash tables. These can
contain more elements than max_backends * max_locks_per_transaction.
Those elements consume shared memory from the "allocation slop" in the
shared memory segment. It's tough when it happens (as far as I know the
memory is never "returned" once such a hash table grows to use that
space), but it does work.
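
To make the above concrete, here is a toy sketch in C. It is not dynahash
code: ToyTable, the toy_* functions, the bucket count and the pool size
are all made up for illustration. It only models the two properties
discussed: a bucket directory that never grows, and entries taken from a
fixed pool that is larger than the nominal initial size (standing in for
the "allocation slop").

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NBUCKETS      4   /* fixed "directory": never resized (hypothetical) */
#define TOY_INIT_ENTRIES  16  /* nominal initial size, not a hard limit */
#define TOY_POOL_ENTRIES  64  /* pool incl. slop; stands in for shared memory */

typedef struct ToyEntry
{
    struct ToyEntry *next;    /* chain within one bucket */
    int         key;
    int         value;
} ToyEntry;

typedef struct ToyTable
{
    ToyEntry   *buckets[TOY_NBUCKETS];
    ToyEntry    pool[TOY_POOL_ENTRIES];
    size_t      pool_used;    /* entries handed out; never given back */
    size_t      nentries;
} ToyTable;

static void
toy_init(ToyTable *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < TOY_NBUCKETS; i++)
        t->buckets[i] = NULL;
    t->pool_used = 0;
    t->nentries = 0;
}

/* Look up a key; returns NULL if it is not present. */
static ToyEntry *
toy_find(ToyTable *t, int key)
{
    ToyEntry   *e = t->buckets[(unsigned) key % TOY_NBUCKETS];

    for (; e != NULL; e = e->next)
        if (e->key == key)
            return e;
    return NULL;
}

/*
 * Insert or update.  Returns 1 on success, 0 once the pool is exhausted.
 * Going past TOY_INIT_ENTRIES is fine; the only cost is longer chains.
 */
static int
toy_insert(ToyTable *t, int key, int value)
{
    ToyEntry   *e = toy_find(t, key);
    unsigned    b = (unsigned) key % TOY_NBUCKETS;

    if (e != NULL)
    {
        e->value = value;
        return 1;
    }
    if (t->pool_used >= TOY_POOL_ENTRIES)
        return 0;             /* out of "shared memory" */
    e = &t->pool[t->pool_used++];
    e->key = key;
    e->value = value;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->nentries++;
    return 1;
}

/* Length of the chain that a given key hashes to. */
static size_t
toy_chain_length(const ToyTable *t, int key)
{
    size_t      n = 0;
    const ToyEntry *e = t->buckets[(unsigned) key % TOY_NBUCKETS];

    for (; e != NULL; e = e->next)
        n++;
    return n;
}
```

Inserting 40 distinct keys into this table succeeds even though the
nominal size is 16; the chains just get longer, because the directory
stays at 4 buckets. Inserts only start failing once the pool itself runs
out.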

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
