Re: rebuild big tables with pgrepack

From: pg254kl(at)georgiou(dot)vip
To: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: rebuild big tables with pgrepack
Date: 2025-11-15 21:33:17
Message-ID: 176324240033.6.6265117135512277017.1008813004@georgiou.vip
Lists: pgsql-admin


On 11/14/25 2:14 PM, ek ek - livadidrive at gmail.com wrote:
> Hello everyone,
> I’m going to rebuild a 900GB table using pg_repack. I’m hesitant to do
> such a large operation in one go.
> Is there an ideal or recommended way to repack very large tables?

I recall, almost 20 years ago, when Bruce Momjian was educating us on
PostgreSQL (we were on Sybase ASE, migrating to PostgreSQL), we
discussed the table and index rebuilding mania.  I never forgot the "we
have clients that haven't rebuilt anything for years and they run just
fine" haha.  OK, sometimes with very "hot" tables it's warranted, but
the point that stayed with me is that it's usually unnecessary.

But ok, you may have your reasons, so...

1. Connect to your cluster from a box with good connectivity to it
(e.g. for AWS RDS this means an EC2 instance in the same VPC)
2. Definitely run pg_repack inside a tmux session
3. Be safe and have at least 3 x pg_total_relation_size(table) of free
space (a quick check is sketched after this list)
4. Make sure you understand the -k (--no-kill-backend) and
--wait-timeout options.  By default (no -k) pg_repack waits
--wait-timeout seconds for its final exclusive lock, then starts
cancelling (and eventually terminating) the backends blocking it; with
-k it gives up on the table instead.  Decisions, decisions.  Definitely
run when things are calm.  (Example invocation after the list.)
5. If your default TOAST compression is not lz4, and this table uses
TOAST, consider changing it to lz4 prior to this pg_repack.  lz4 is
blazingly fast with low CPU cost.  (Sketch after the list.)
6. I don't like running anything heavy-handed on production without
practicing/testing first.  If things go bad and management asks "did
you test this?" the answer should always be "yes".  If this is your
first time pg_repacking such a table, make sure you first do it on a
staging environment under heavy load during a performance run.  The
cache churn can be an issue if your workload depends on a hot
cache.  Again, definitely run during the calmest window, but test
under load ;)
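
For point 3, a quick sanity check; table name and connection details
are placeholders, adjust to taste:

    # size of table + indexes + TOAST, and ~3x that for headroom
    psql -h myhost -d mydb -Atc "
      SELECT pg_size_pretty(pg_total_relation_size('public.bigtable')),
             pg_size_pretty(3 * pg_total_relation_size('public.bigtable'));"

Compare that against df on the volume holding the data directory (or
the free storage metric on RDS).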
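
For points 2 and 4, a minimal sketch of what the invocation could look
like; host, database, table and the timeout value are just examples,
not recommendations:

    # survive a dropped SSH connection
    tmux new -s repack

    # inside tmux: wait up to 300s for the final exclusive lock, and
    # with --no-kill-backend give up on the table instead of
    # cancelling/terminating the blockers (the default without -k)
    pg_repack -h myhost -d mydb -t public.bigtable \
        --wait-timeout=300 --no-kill-backend --echo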
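
For point 5, a sketch assuming PostgreSQL 14+ (table/column names are
made up).  Note the setting only affects values written after the
change:

    # per-column:
    psql -h myhost -d mydb -c \
      "ALTER TABLE public.bigtable ALTER COLUMN payload SET COMPRESSION lz4;"

    # or as the cluster-wide default for new values (on RDS this goes
    # through the parameter group instead of ALTER SYSTEM):
    psql -h myhost -d mydb -c "ALTER SYSTEM SET default_toast_compression = 'lz4';"
    psql -h myhost -d mydb -c "SELECT pg_reload_conf();"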

At 900GB I'd start considering partitioning; it will make maintenance
jobs more efficient, not to mention the smaller indexes should help you
make better use of your cache (if your partitioning strategy can
segregate old, unused data from newer, hot data).
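
To illustrate, a rough sketch of a time-partitioned layout (all names
and the range strategy are invented for the example):

    psql -h myhost -d mydb <<'SQL'
    -- cold historical rows land in partitions hot queries never touch
    CREATE TABLE bigtable_p (
        id         bigint      NOT NULL,
        created_at timestamptz NOT NULL,
        payload    text
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE bigtable_p_2024 PARTITION OF bigtable_p
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
    CREATE TABLE bigtable_p_2025 PARTITION OF bigtable_p
        FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
    SQL

Each partition (and its indexes) can then be repacked, vacuumed, or
dropped on its own.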

--
regards,
Kiriakos Georgiou
