Re: RDS restore failed due to WAL log and disk space-- any tidy fixes?

From: Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
To: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: RDS restore failed due to WAL log and disk space-- any tidy fixes?
Date: 2024-11-17 17:34:42
Message-ID: CAOC+FBWW4SYT4K0Hk8rL+xMwK7mGi6O8CqBeMjW66GdxCBruPQ@mail.gmail.com
Lists: pgsql-admin

Would setting max_slot_wal_keep_size to something like 1GB ensure that WAL
logs don't cause runaway disk use during restore? It's currently -1...
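
For context on the question above: max_slot_wal_keep_size only caps the WAL that replication slots are allowed to retain (a bare number is taken as megabytes, and -1 means no limit), so lowering it would only help if a slot is what is pinning the WAL. A minimal sketch for checking that, assuming `conn` is any DB-API connection to the instance (for example one opened with psycopg); the helper names and the threshold check are hypothetical and not from this thread:

```python
# Sketch only: report how much WAL each replication slot is holding back.
# Assumes `conn` is a DB-API 2.0 connection to the primary; the function
# names below are illustrative and not part of any library.

SLOT_WAL_QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
ORDER BY retained_bytes DESC NULLS LAST
"""


def retained_wal_by_slot(conn):
    """Return a list of (slot_name, active, retained_bytes) tuples."""
    cur = conn.cursor()
    try:
        cur.execute(SLOT_WAL_QUERY)
        # retained_bytes is NULL when a slot has no restart_lsn yet.
        return [
            (name, active, None if nbytes is None else int(nbytes))
            for name, active, nbytes in cur.fetchall()
        ]
    finally:
        cur.close()


def slots_over_limit(rows, limit_mb):
    """Names of slots retaining more WAL than limit_mb.

    limit_mb = -1 mirrors max_slot_wal_keep_size = -1 (no limit),
    in which case no slot is ever reported.
    """
    if limit_mb < 0:
        return []
    limit_bytes = limit_mb * 1024 * 1024
    return [
        name
        for name, _active, nbytes in rows
        if nbytes is not None and nbytes > limit_bytes
    ]
```

If the query returns no rows, no replication slot exists and the setting would likely not change how much WAL accumulates during the restore; something else (such as archiving or checkpoint timing) would be holding it.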

On Sun, Nov 17, 2024 at 9:31 AM Wells Oliver <wells(dot)oliver(at)gmail(dot)com> wrote:

> Actually, in RDS it seems you cannot set archive_mode either.
>
> On Sun, Nov 17, 2024 at 9:23 AM Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
> wrote:
>
>> It does. I think it uses WAL behind the scenes. In RDS you unfortunately
>> cannot set wal_level, but you can set archive_mode.
>>
>> On Sun, Nov 17, 2024 at 9:21 AM Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
>> wrote:
>>
>>> Doesn't RDS have its own replication?
>>>
>>> Anyway, for pg_restore, I'd absolutely set archive_mode=off
>>> and wal_level=minimal, then set them to their production values when it's
>>> finished.
>>>
>>> On Sun, Nov 17, 2024 at 12:12 PM Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
>>> wrote:
>>>
>>>> Interesting. I am migrating a pg_dump archive to a new server, in a
>>>> single go. Does it make sense to disable (or speed up?) WAL archiving
>>>> during the restore, then reenable it after the restore so a future replica
>>>> could work? What would be the steps here? Would disabling or "speeding up"
>>>> be faster?
>>>>
>>>> max_slot_wal_keep_size is -1 at the moment so I think that's why it
>>>> kept a ton of WAL and ran out of space.
>>>>
>>>> On Sun, Nov 17, 2024 at 7:41 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
>>>> wrote:
>>>>
>>>>> On Sat, 2024-11-16 at 16:33 -0800, Wells Oliver wrote:
>>>>> > I provisioned an RDS instance with 2500 GB space and began the
>>>>> > restore of a database I know to be about 1750 GB using 16 jobs.
>>>>> >
>>>>> > Unfortunately, it died very near the end when it ran out of disk
>>>>> > space due to WAL log usage. Lots of:
>>>>> >
>>>>> > 2024-11-17 00:07:09 UTC::@:[19861]:PANIC: could not write to file "pg_wal/xlogtemp.19861": No space left on device
>>>>> >
>>>>> > And then kaboom.
>>>>> >
>>>>> > I'm wondering what my course of action should be. Can I
>>>>> > disable/reduce WAL during a restore? wal_level is set to replica,
>>>>> > can this temporarily be set to minimal? Should I just eat the extra
>>>>> > costs to add headroom for the WAL? Would using fewer jobs during a
>>>>> > restore reduce the amount of WAL created?
>>>>>
>>>>> If you are using minimal WAL logging and you restore the dump in a
>>>>> single transaction, you should see way less WAL generated, because
>>>>> data inserted into the table in the same transaction as the CREATE
>>>>> TABLE statement need not be WAL logged.
>>>>>
>>>>> But you might more easily solve the problem by speeding up or
>>>>> disabling the WAL archiver, so that PostgreSQL removes old WAL
>>>>> after the next checkpoint.
>>>>>
>>>>> Yours,
>>>>> Laurenz Albe
>>>>>
>>>>
>>>>
>>>> --
>>>> Wells Oliver
>>>> wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
>>>>
>>>
>>>
>>> --
>>> Death to <Redacted>, and butter sauce.
>>> Don't boil me, I'm still alive.
>>> <Redacted> lobster!
>>>
>>
>>
>> --
>> Wells Oliver
>> wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
>>
>
>
> --
> Wells Oliver
> wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
>

--
Wells Oliver
wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
