Re: Streaming I/O, vectored I/O (WIP)

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: Streaming I/O, vectored I/O (WIP)
Date: 2023-12-09 09:23:00
Message-ID: 4533e76e-9519-4715-acd0-d4fa552619b0@iki.fi
Lists: pgsql-hackers

On 09/12/2023 02:41, Thomas Munro wrote:
> On Sat, Dec 9, 2023 at 7:25 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2023-11-30 13:01:46 +1300, Thomas Munro wrote:
>>> On Thu, Nov 30, 2023 at 12:16 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>>> Maybe we should bite the bullet and always retry short writes in
>>>> FileWriteV(). Is that what you meant by "handling them"?
>>>> If the total size is expensive to calculate, how about passing it as an
>>>> extra argument? Presumably it is cheap for the callers to calculate at
>>>> the same time that they build the iovec array?
>
>>> There is another problem with pushing it down to fd.c, though.
>>> Suppose you try to write 8192 bytes, and the kernel says "you wrote
>>> 4096 bytes" so your loop goes around again with the second half of the
>>> data and now the kernel says "-1, ENOSPC". What are you going to do?
>>> fd.c doesn't raise errors for I/O failure, it fails with -1 and errno,
>>> so you'd either have to return -1, ENOSPC (converting short writes
>>> into actual errors, a lie because you did write some data), or return
>>> 4096 (and possibly also set errno = ENOSPC as we have always done).
>>> So you can't really handle this problem at this level, can you?
>>> Unless you decide that fd.c should get into the business of raising
>>> errors for I/O failures, which would be a bit of a departure.
>>>
>>> That's why I did the retry higher up in md.c.
>>
>> I think that's the right call. I think for AIO we can't do retry handling
>> purely in fd.c, or at least it'd be quite awkward. It doesn't seem like it'd
>> buy us that much in md.c anyway, we still need to handle the cross segment
>> case and such, from what I can tell?
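To make the short-write dilemma concrete, here is a minimal C sketch of what an fd.c-level retry loop would have to do. It assumes a hypothetical write callback (write_fn, with a simulated fake_write) standing in for pwrite(2), so that the "try 8192 bytes, kernel writes 4096, next call fails with ENOSPC" sequence can be reproduced. It takes the second option described above: report the 4096 bytes actually written and leave errno set to ENOSPC. None of these names exist in fd.c.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for pwrite(2), so failures can be simulated. */
typedef ssize_t (*write_fn) (const char *buf, size_t len, off_t offset, void *arg);

/* Simulated "disk" that fills up after 'capacity' bytes. */
struct fake_disk
{
	size_t		capacity;		/* bytes that fit before the disk is full */
	size_t		max_per_call;	/* largest transfer a single call performs */
	size_t		used;			/* bytes accepted so far */
};

static ssize_t
fake_write(const char *buf, size_t len, off_t offset, void *arg)
{
	struct fake_disk *d = arg;
	size_t		n = len;

	(void) buf;
	(void) offset;
	if (d->used >= d->capacity)
	{
		errno = ENOSPC;
		return -1;
	}
	if (n > d->max_per_call)
		n = d->max_per_call;	/* short write */
	if (n > d->capacity - d->used)
		n = d->capacity - d->used;
	d->used += n;
	return (ssize_t) n;
}

/*
 * Hypothetical fd.c-style helper that retries short writes.  Since it
 * cannot raise an error, a failure after partial progress is reported as
 * the partial byte count with errno left set; -1 means nothing was written.
 */
ssize_t
write_all_sketch(write_fn fn, void *arg, const char *buf, size_t len,
				 off_t offset)
{
	size_t		done = 0;

	while (done < len)
	{
		ssize_t		rc = fn(buf + done, len - done, offset + (off_t) done, arg);

		if (rc < 0)
		{
			if (errno == EINTR)
				continue;
			/* Error before anything was written: a plain failure. */
			if (done == 0)
				return -1;
			/* Partial progress: report it; errno says why we stopped. */
			return (ssize_t) done;
		}
		if (rc == 0)
		{
			/* No error but no progress: assume no disk space. */
			errno = ENOSPC;
			return done == 0 ? -1 : (ssize_t) done;
		}
		done += (size_t) rc;
	}
	return (ssize_t) done;
}
```

With an 8192-byte request against a fake disk holding 4096 bytes, the helper returns 4096 and errno is ENOSPC. The caller still has to inspect errno to tell "disk full" apart from an ordinary short write, which is the awkwardness that makes retrying higher up in md.c, where an error can actually be raised, the simpler place for it.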
>
> Heikki, what do you think about this: we could go with the v3 fd.c
> and md.c patches, but move adjust_iovec_for_partial_transfer() into
> src/common/file_utils.c, so that at least that slightly annoying part
> of the job is available for re-use by future code that faces the same
> problem?

Ok, works for me.
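For reference, a minimal sketch of the kind of helper that would move to src/common/file_utils.c: after a vectored transfer of `transferred` bytes, it rewrites an iovec array so that it describes only the bytes still outstanding. The name adjust_iovec_sketch() and its signature are hypothetical and are not meant to match adjust_iovec_for_partial_transfer() in the v3 patch.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: given the iovec array 'src' of 'iovcnt' entries and
 * the number of bytes already transferred, write the remaining portion into
 * 'dst' and return its entry count (0 when everything was transferred).
 * 'dst' may point to the same array as 'src'.
 */
int
adjust_iovec_sketch(struct iovec *dst, const struct iovec *src, int iovcnt,
					size_t transferred)
{
	int			i = 0;
	int			n = 0;

	/* Skip entries that were transferred completely. */
	while (i < iovcnt && transferred >= src[i].iov_len)
	{
		transferred -= src[i].iov_len;
		i++;
	}

	/* Copy what is left; only the first remaining entry can be partial. */
	for (; i < iovcnt; i++, n++)
	{
		struct iovec cur = src[i];	/* copy first, in case dst == src */

		dst[n].iov_base = (char *) cur.iov_base + transferred;
		dst[n].iov_len = cur.iov_len - transferred;
		transferred = 0;
	}
	return n;
}
```

A caller such as md.c could loop around pwritev(): add the returned byte count to the file offset, call adjust_iovec_sketch(iov, iov, iovcnt, rc) in place, and stop once it returns 0, raising its own error if pwritev() fails. Passing the same array as source and destination lets the caller reuse its iovec without extra allocation.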

--
Heikki Linnakangas
Neon (https://neon.tech)