Re: archive status ".ready" files may be created too early

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "x4mmm(at)yandex-team(dot)ru" <x4mmm(at)yandex-team(dot)ru>, "a(dot)lubennikova(at)postgrespro(dot)ru" <a(dot)lubennikova(at)postgrespro(dot)ru>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: archive status ".ready" files may be created too early
Date: 2021-08-20 19:41:28
Message-ID: 029A2429-21B1-426F-A8FE-109ADF6A8E90@amazon.com
Lists: pgsql-hackers

On 8/20/21, 11:20 AM, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Aug 20, 2021 at 1:29 PM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>> Thinking about this stuff further, I was wondering if one way to
>> handle the bounded shared hash table problem would be to replace the
>> latest boundary in the map whenever it was full. But at that point,
>> do we even need a hash table? This led me to revisit the two-element
>> approach that was discussed upthread. What if we only stored the
>> earliest and latest segment boundaries at any given time? Once the
>> earliest boundary is added, it never changes until the segment is
>> flushed and it is removed. The latest boundary, however, will be
>> updated any time we register another segment. Once the earliest
>> boundary is removed, we replace it with the latest boundary. This
>> strategy could cause us to miss intermediate boundaries, but AFAICT
>> the worst case scenario is that we hold off creating .ready files a
>> bit longer than necessary.
>
> I think this is a promising approach. We could also have a small
> fixed-size array, so that we only have to risk losing track of
> anything when we overflow the array. But I guess I'm still unconvinced
> that there's a real possibility of genuinely needing multiple
> elements. Suppose we are thinking of adding a second element to the
> array (or the hash table). I feel like it's got to be safe to just
> remove the first one. If not, then apparently the WAL record that
> caused us to make the first entry isn't totally flushed yet - which I
> still think is impossible.

I've attached a patch to demonstrate what I'm thinking.

Nathan

Attachment: v13-0001-Avoid-creating-archive-status-.ready-files-too-e.patch (application/octet-stream, 18.1 KB)
