Re: archive status ".ready" files may be created too early

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: archive status ".ready" files may be created too early
Date: 2021-01-26 19:13:57
Message-ID: 68120830-3A34-4C4F-942F-6739DAA664CF@amazon.com
Lists: pgsql-hackers

On 12/17/20, 9:15 PM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Thu, 17 Dec 2020 22:20:35 +0000, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote in
>> On 12/15/20, 2:33 AM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>> > You're right in that regard. There's a window where partial record is
>> > written when write location passes F0 after insertion location passes
>> > F1. However, remembering all spanning records seems overkilling to me.
>>
>> I'm curious why you feel that recording all cross-segment records is
>> overkill. IMO it seems far simpler to just do that rather than try to
>
> Sorry, my words are not enough. Remembering all spanning records in
> *shared memory* seems to be overkilling. Much more if it is stored in
> shared hash table. Even though it rarely the case, it can fail hard
> way when reaching the limit. If we could do well by remembering just
> two locations, we wouldn't need to worry about such a limitation.

I don't think it will fail if we reach max_size for the hash table.
The comment above ShmemInitHash() has this note:

* max_size is the estimated maximum number of hashtable entries. This is
* not a hard limit, but the access efficiency will degrade if it is
* exceeded substantially (since it's used to compute directory size and
* the hash table buckets will get overfull).
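
To illustrate what "not a hard limit" means in general terms, here is a
toy sketch of a chained hash table whose bucket count is fixed from a
size hint. This is not dynahash and not code from the patch: ToyTable,
toy_create(), toy_insert(), and toy_longest_chain() are hypothetical
names, and it allocates with malloc() rather than from shared memory, so
it says nothing about what happens when shared memory itself runs out.
It only shows that going past the hint makes chains longer instead of
making inserts fail.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy chained hash table. NOT PostgreSQL's dynahash; names are hypothetical. */
typedef struct ToyEntry
{
    unsigned long    key;       /* e.g. a WAL segment number */
    struct ToyEntry *next;      /* next entry in the same bucket */
} ToyEntry;

typedef struct ToyTable
{
    size_t     nbuckets;        /* fixed at creation from the size hint */
    size_t     nentries;        /* may grow past the hint */
    ToyEntry **buckets;
} ToyTable;

/* Create a table sized for roughly size_hint entries. Returns NULL on OOM. */
static ToyTable *
toy_create(size_t size_hint)
{
    ToyTable *t = malloc(sizeof(ToyTable));

    if (t == NULL)
        return NULL;
    t->nbuckets = size_hint > 0 ? size_hint : 1;
    t->nentries = 0;
    t->buckets = calloc(t->nbuckets, sizeof(ToyEntry *));
    if (t->buckets == NULL)
    {
        free(t);
        return NULL;
    }
    return t;
}

/*
 * Insert a key. The size hint is never checked here, so exceeding it
 * only lengthens chains. Returns 0 on success, -1 if malloc() fails.
 */
static int
toy_insert(ToyTable *t, unsigned long key)
{
    ToyEntry *e;
    size_t    b;

    assert(t != NULL && t->nbuckets > 0);
    e = malloc(sizeof(ToyEntry));
    if (e == NULL)
        return -1;
    b = key % t->nbuckets;
    e->key = key;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->nentries++;
    return 0;
}

/* Length of the longest bucket chain, a rough proxy for lookup cost. */
static size_t
toy_longest_chain(const ToyTable *t)
{
    size_t longest = 0;

    for (size_t i = 0; i < t->nbuckets; i++)
    {
        size_t len = 0;

        for (ToyEntry *e = t->buckets[i]; e != NULL; e = e->next)
            len++;
        if (len > longest)
            longest = len;
    }
    return longest;
}
```

With a hint of 4 and 16 sequential keys inserted, every insert still
succeeds and each bucket simply ends up holding a chain of 4 entries.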

> Another concern about the concrete patch:
>
> NotifySegmentsReadyForArchive() searches the shared hash acquiring a
> LWLock every time XLogWrite is called while segment archive is being
> held off. I don't think it is acceptable and I think it could be a
> problem when many backends are competing on WAL.

This is a fair point. I did some benchmarking with a few hundred
connections all doing writes, and I was not able to discern any
noticeable performance impact. My guess is that contention on this
new lock is unlikely because callers of XLogWrite() must already hold
WALWriteLock. Otherwise, I believe we acquire ArchNotifyLock no
more than once per segment to record new record boundaries.

Nathan
