Re: Large files for relations

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Large files for relations
Date: 2023-05-15 17:55:56
Message-ID: CA+TgmoY=PBJaMsV4FuWYSMmNc7EmLWN53eSS4p8SbjF6z2fdgQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 12, 2023 at 9:53 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> While I tend to agree that 1GB is too small, 1TB seems like it's
> possibly going to end up on the too big side of things, or at least,
> if we aren't getting rid of the segment code then it's possibly throwing
> away the benefits we have from the smaller segments without really
> giving us all that much. Going from 1G to 10G would reduce the number
> of open file descriptors by quite a lot without having much of a net
> change on other things. 50G or 100G would reduce the FD count further
> but start to make us lose out a bit more on some of the nice parts of
> having multiple segments.

This is my view as well, more or less. I don't really like our current
handling of relation segments; we know it has bugs, and making it
non-buggy feels difficult. And there are performance issues as well --
file descriptor consumption, for sure, but probably also the fact that
crossing a file boundary breaks the operating system's ability to do
readahead to some degree. However, I think we're going to find that
moving to a system where we have just one file per relation fork and
that file can be arbitrarily large is not fantastic, either. Jim's
point about running into filesystem limits is a good one (hi Jim, long
time no see!) and the problem he points out with ext4 is almost
certainly not the only one. It doesn't just have to be filesystems,
either: it could be a limitation of an archiving tool (tar, zip, cpio),
a file copy utility, or whatever. A quick Google search
suggests that most such things have been updated to use 64-bit sizes,
but my point is that the set of things that can potentially cause
problems is broader than just the filesystem. Furthermore, even when
there's no hard limit at play, a smaller file size can occasionally be
*convenient*, as in Pavel's example of using hard links to share
storage between backups. From that point of view, a 16GB or 64GB or
256GB file size limit seems more convenient than no limit and more
convenient than a large limit like 1TB.
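
To make the file-count arithmetic concrete, here's a rough sketch (not
the real md.c code, just an illustration assuming the default 8kB block
size and a hypothetical 32TB fork) of how the segment size translates
into the number of files, and hence descriptors, per fork:

#include <stdio.h>
#include <stdint.h>

/*
 * Illustration only: with 8kB blocks, the segment size (in bytes)
 * determines how many files a fork of a given size is split across.
 */
#define BLCKSZ 8192

static uint64_t
segments_needed(uint64_t fork_bytes, uint64_t seg_bytes)
{
	uint64_t	nblocks = (fork_bytes + BLCKSZ - 1) / BLCKSZ;
	uint64_t	seg_blocks = seg_bytes / BLCKSZ;	/* 131072 for 1GB */

	return (nblocks + seg_blocks - 1) / seg_blocks;
}

int
main(void)
{
	const uint64_t GB = 1024ULL * 1024 * 1024;
	uint64_t	fork_bytes = 32 * 1024 * GB;	/* hypothetical 32TB fork */
	uint64_t	sizes[] = {1 * GB, 10 * GB, 64 * GB, 1024 * GB};

	for (int i = 0; i < 4; i++)
		printf("segment size %4llu GB -> %5llu files\n",
			   (unsigned long long) (sizes[i] / GB),
			   (unsigned long long) segments_needed(fork_bytes, sizes[i]));
	return 0;
}

That works out to 32768 files at 1GB segments, about 3300 at 10GB, 512
at 64GB, and 32 at 1TB -- which is the descriptor trade-off above, and
also the granularity at which a hard-link-based backup scheme can share
unchanged data.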

However, the bugs are the flies in the ointment (ahem). If we just
make the segment size bigger but don't get rid of segments altogether,
then we still have to fix the bugs that can occur when you do have
multiple segments. I think part of Thomas's motivation is to dodge
that whole category of problems. If we gradually deprecate
multi-segment mode in favor of single-file-per-relation-fork, then the
fact that the segment handling code has bugs becomes progressively
less relevant. While that does make some sense, I'm not sure I really
agree with the approach. The problem is that we'd be trading problems we
can at least theoretically fix ourselves, by hitting our code with a big
enough hammer, for an unknown set of problems that stem from limitations
of software we don't control and maybe don't even know about.

--
Robert Haas
EDB: http://www.enterprisedb.com
