Re: Large files for relations

From: Jim Mlodgenski <jimmy76(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Large files for relations
Date: 2023-05-12 13:30:57
Message-ID: CAB_5SReGK4FhMkb+wjY0umy8AUDfYQ7UUwFjGN9-M+aGsm+E-w@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 11, 2023 at 7:38 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> On Fri, May 12, 2023 at 8:16 AM Jim Mlodgenski <jimmy76(at)gmail(dot)com> wrote:
> > On Mon, May 1, 2023 at 9:29 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> >> I am not aware of any modern/non-historic filesystem[2] that can't do
> >> large files with ease. Anyone know of anything to worry about on that
> >> front?
> >
> > There is some trouble in the ambiguity of what we mean by "modern" and
> > "large files". There are still a large number of users of ext4 where the
> > max file size is 16TB. Switching to a single large file per relation would
> > effectively cut the max table size in half for those users. How would a
> > user with say a 20TB table running on ext4 be impacted by this change?
>
> Hrmph. Yeah, that might be a bit of a problem. I see it discussed in
> various places that MySQL/InnoDB can't have tables bigger than 16TB on
> ext4 because of this, when it's in its default one-file-per-object
> mode (as opposed to its big-tablespace-files-to-hold-all-the-objects
> mode like DB2, Oracle etc, in which case I think you can have multiple
> 16TB segment files and get past that ext4 limit). It's frustrating
> because 16TB is still really, really big and you probably should be
> using partitions, or more partitions, to avoid all kinds of other
> scalability problems at that size. But however hypothetical the
> scenario might be, it should work,
>

Agreed, it is frustrating, but it is not hypothetical. I have seen a number of
users who have single tables larger than 16TB and don't use partitioning because
of the limitations we have today. The most common reason is needing multiple
unique constraints on the table that don't include the partition key, something
like a user_id and an email. There are workarounds for those cases, but usually
it's easier to deal with a single large table than with the sharp edges
those workarounds introduce.
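
To make the "cut the max table size in half" arithmetic concrete, here is a rough
sketch. It assumes PostgreSQL's default 8kB block size, 32-bit block numbers
(treated here as a full 2^32 blocks, which slightly overstates the real limit),
the default 1GB segment size, and ext4's 16TB per-file limit with 4kB filesystem
blocks. The constant and function names are made up for illustration, and the
20TB figure is read as binary terabytes.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

# Assumptions (defaults, not taken from the thread itself):
BLOCK_SIZE = 8 * KIB        # default PostgreSQL block size
MAX_BLOCKS = 2 ** 32        # 32-bit block numbers, rounded up to a full 2^32
SEGMENT_SIZE = 1 * GIB      # default size of one segment file
EXT4_MAX_FILE = 16 * TIB    # ext4 per-file limit with 4kB filesystem blocks


def max_table_bytes(single_file: bool) -> int:
    """Largest table that fits, with or without segment files."""
    pg_limit = BLOCK_SIZE * MAX_BLOCKS
    if single_file:
        # One file per relation: the filesystem's per-file limit also applies.
        return min(pg_limit, EXT4_MAX_FILE)
    # Segmented: each file is only SEGMENT_SIZE, far below the ext4 limit.
    return pg_limit


def segments_needed(table_bytes: int) -> int:
    """Number of segment files for a table of the given size (rounded up)."""
    return -(-table_bytes // SEGMENT_SIZE)


if __name__ == "__main__":
    table = 20 * TIB
    print("segmented limit (TiB):  ", max_table_bytes(False) // TIB)
    print("single-file limit (TiB):", max_table_bytes(True) // TIB)
    print("segments for 20 TiB:    ", segments_needed(table))
    print("20 TiB fits in one file:", table <= max_table_bytes(True))
```

Under those assumptions the 20TB table is fine today as roughly twenty thousand
1GB segments, but it would not fit in a single ext4 file.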
