Re: mdclose() does not cope w/ FileClose() failure

From: Noah Misch <noah(at)leadboat(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: mdclose() does not cope w/ FileClose() failure
Date: 2019-12-24 19:57:39
Message-ID: 20191224195739.GB1689008@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 23, 2019 at 07:41:49PM +0900, Kyotaro Horiguchi wrote:
> At Sun, 22 Dec 2019 12:21:00 -0800, Noah Misch <noah(at)leadboat(dot)com> wrote in
> > On Sun, Dec 22, 2019 at 01:19:30AM -0800, Noah Misch wrote:
> > > An alternative would be to call
> > > _fdvec_resize() after every FileClose(), like mdtruncate() does; however, the
> > > repalloc() overhead could be noticeable. (mdclose() is called much more
> > > frequently than mdtruncate().)
> >
> > I can skip repalloc() when the array length decreases, to assuage mdclose()'s
> > worry. In the mdclose() case, the final _fdvec_resize(reln, fork, 0) will
> > still pfree() the array. Array elements that mdtruncate() frees today will
> > instead persist to end of transaction. That is okay, since mdtruncate()
> > crossing more than one segment boundary is fairly infrequent. For it to
> > happen, you must either create a >2G relation and then TRUNCATE it in the same
> > transaction, or VACUUM must find >1-2G of unused space at the end of the
> > relation. I'm now inclined to do it that way, attached.
>
> * It doesn't seem worthwhile complicating the code by having a more
> * aggressive growth strategy here; the number of segments doesn't
> * grow that fast, and the memory context internally will sometimes
> - * avoid doing an actual reallocation.
> + * avoid doing an actual reallocation. Likewise, since the number of
> + * segments doesn't shrink that fast, don't shrink at all. During
> + * mdclose(), we'll pfree the array at nseg==0.
>
> If I understand it correctly, it is mentioning the number of the all
> segment files in a fork, not the length of md_seg_fds arrays at a
> certain moment. But actually _fdvec_resize is called for every segment
> opening during mdnblocks (just-after mdopen), and every segment
> closing during mdclose and mdtruncate as mentioned here. We are going
> to omit pallocs only in the decreasing case.

That is a good point. How frequently one adds 1 GiB of data is not the main
issue. mdclose() and subsequent re-opening of all segments will be more
relevant to overall performance.

> If we regard repalloc as far faster than FileOpen/FileClose or we care
> about only increase of segment number of mdopen'ed files and don't
> care the frequent resize that happens during the functions above, then
> the comment is right and we may resize the array in the
> segment-by-segment manner.

In most cases, the array will fit into a power-of-two chunk, so repalloc()
already does the right thing. Once the table has more than ~1000 segments (~1
TiB table size), the allocation will get a single-chunk block, and every
subsequent repalloc() will call realloc(). Even then, repalloc() probably is
far faster than File operations. Likely, I should just accept the extra
repalloc() calls and drop the "else if" change in _fdvec_resize().

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2019-12-24 23:14:41 Re: Implementing Incremental View Maintenance
Previous Message Fabien COELHO 2019-12-24 18:13:32 Re: Allow an alias to be attached directly to a JOIN ... USING