From: | Achilleas Mantzios <a(dot)mantzios(at)cloud(dot)gatewaynet(dot)com> |
---|---|
To: | Moreno Andreo <moreno(dot)andreo(at)evolu-s(dot)it>, PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Logical replication, need to reclaim big disk space |
Date: | 2025-05-16 19:33:29 |
Message-ID: | a085c90c-835a-4f96-aace-aedb96390b9b@cloud.gatewaynet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 16/5/25 18:45, Moreno Andreo wrote:
> Hi,
> we are moving our old binary data approach, moving them from bytea
> fields in a table to external storage (making database smaller and
> related operations faster and smarter).
> In short, we have a job that runs in background and copies data from
> the table to an external file and then sets the bytea field to NULL.
> (UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = <uuid>)
>
> This results, at the end of the operations, to a table that's less
> than one tenth in size.
> We have a multi-tenant architecture (100s of schemas with identical
> architecture, all inheriting from public) and we are performing the
> task on one table per schema.
>
So? toasted data are kept on separate TOAST tables, unless those bytea
cols are selected, you won't even touch them. I cannot understand what
you are trying to achieve here.
Years ago, when I made the mistake to go for a coffee and let my
developers "improvise" , the result was a design similar to what you are
trying to achieve. Years after, I am seriously considering moving those
data back to PostgreSQL.
> The problem is: this is generating BIG table bloat, as you may imagine.
> Running a VACUUM FULL on an ex-22GB table on a standalone test server
> is almost immediate.
> If I had only one server, I'll process a table a time, with a nightly
> script, and issue a VACUUM FULL to tables that have already been
> processed.
>
> But I'm in a logical replication architecture (we are using a
> multimaster system called pgEdge, but I don't think it will make big
> difference, since it's based on logical replication), and I'm building
> a test cluster.
>
So you use PgEdge , but you wanna lose all the benefits of multi-master
, since your binary data won't be replicated ...
> I've been instructed to issue VACUUM FULL on both nodes, nightly, but
> before proceeding I read on docs that VACUUM FULL can disrupt logical
> replication, so I'm a bit concerned on how to proceed. Rows are
> cleared one a time (one transaction, one row, to keep errors to the
> record that issued them)
>
PgEdge is based on the old pg_logical, the old 2ndQuadrant extension,
not the native logical replication we have since pgsql 10. But I might
be mistaken.
> I read about extensions like pg_squeeze, but I wonder if they are
> still not dangerous for replication.
>
What's pgEdge take on that, I mean the bytea thing you are trying to
achieve here.
> Thanks for your help.
> Moreno.-
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Ayush Vatsa | 2025-05-18 19:17:13 | Clarification on REVOKE ALL ON FUNCTION – Are there any privileges apart from EXECUTE? |
Previous Message | Moreno Andreo | 2025-05-16 15:45:59 | Logical replication, need to reclaim big disk space |