Re: Preventing free space from being reused

From: Noah Bergbauer <noah(at)statshelix(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing free space from being reused
Date: 2021-02-15 00:42:19
Message-ID: CABjy+Rikp1KFSZz3VNbLqRSm9c+PMkD_yrdAKFLxsUcpGRhNvg@mail.gmail.com
Lists: pgsql-hackers

> I would suggest to take a look at the BRIN opclass multi-minmax currently
> in development.

Thank you, this does look like it could help a lot with BRIN performance in
this situation!

But again, if index performance were the only issue, I would simply accept
the space overhead and switch to btree. However, the disk fragmentation
issue remains, and it is significant. It is also amplified in my use case
because I am using ZFS, mostly for compression. But it is worth it: I am
currently observing a 13x compression ratio (comparing the disk space
reported by du against select sum(octet_length(x)), so this does not
include the false gains from compressing padding). In general, any
variable-sized append-only workload suffers from this fragmentation
problem. It's just that with filesystem compression, there is no longer a
good reason to fill up those holes and accept the fragmentation.
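
As a rough illustration of the effect, here is a toy model in Python. It is
not PostgreSQL's actual free space map or BRIN code: the names simulate and
ranges_matching, the first-fit lookup, and the tiny page size are all
assumptions made up for this sketch, and every tuple is assumed to fit on an
empty page. It places tuples in key order, optionally slotting them into
older pages that still have room, and then counts how many block ranges a
minmax-style summary would have to visit for a single key.

```python
def simulate(sizes, reuse_free_space, page_size=8192):
    """Place tuples, keyed by insertion order, into fixed-size pages.

    Returns a list of pages, each a list of the keys stored on it.
    """
    pages, free = [], []
    for key, size in enumerate(sizes):
        target = None
        if pages and free[-1] >= size:
            target = len(pages) - 1  # the newest page still has room
        elif reuse_free_space:
            # Crude stand-in for a free space map lookup:
            # take the first older page with enough room.
            target = next((i for i, f in enumerate(free) if f >= size), None)
        if target is None:
            # Nothing suitable found: extend the relation with a new page.
            pages.append([])
            free.append(page_size)
            target = len(pages) - 1
        pages[target].append(key)
        free[target] -= size
    return pages


def ranges_matching(pages, key, pages_per_range=4):
    """Count block ranges whose [min, max] key summary contains key."""
    hits = 0
    for start in range(0, len(pages), pages_per_range):
        keys = [k for page in pages[start:start + pages_per_range] for k in page]
        if keys and min(keys) <= key <= max(keys):
            hits += 1
    return hits


sizes = [60, 80, 30]  # tuple sizes in bytes, inserted in key order
appended = simulate(sizes, reuse_free_space=False, page_size=100)
reused = simulate(sizes, reuse_free_space=True, page_size=100)
print(appended, ranges_matching(appended, key=1, pages_per_range=1))
print(reused, ranges_matching(reused, key=1, pages_per_range=1))
```

With these three tuples, the append-only layout is [[0], [1], [2]] and a
lookup for key 1 touches one range. With reuse, key 2 lands in the hole on
the first page, the layout becomes [[0, 2], [1]], and the same lookup
touches two ranges because the first range's summary now spans keys 0 to 2.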

To be clear, the main reason why I even brought my questions to this
mailing list is that I don't know how to (correctly) get past the check in
heap_getnext (see my first email) when implementing the workaround as a
custom table access method. A reloption could theoretically disable free
space maps entirely for some added efficiency, but I'm inclined to agree
that this is not really needed.

On Sat, Feb 13, 2021 at 1:36 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
wrote:

> On Fri, Feb 12, 2021 at 6:21 PM Noah Bergbauer <noah(at)statshelix(dot)com>
> wrote:
> >
> > A btree index on the same column is 700x the size of BRIN, or 10% of
> > relation itself. It does not perform significantly better than BRIN. The
> > issue here is twofold: not only does slotting these tuples into older pages
> > significantly reduce the effectiveness of BRIN, it also causes
> > fragmentation on disk. Ultimately, this is why CLUSTER exists. One way to
> > look at this situation is that my data is inserted exactly in index order,
> > but Postgres keeps un-clustering it for reasons that are valid in general
> > (don't waste disk space) but don't apply at all in this case (the file
> > system uses compression, no space is wasted).
> >
> > Any alternative ideas would of course be much appreciated! But at the
> > moment HEAP_INSERT_SKIP_FSM seems like the most practical solution to me.
>
> I would suggest to take a look at the BRIN opclass multi-minmax currently
> in development. It's designed to address that exact situation, and more
> review would be welcome:
>
> https://commitfest.postgresql.org/32/2523/
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>
