Re: Streaming replication and a disk full in primary

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and a disk full in primary
Date: 2010-04-07 10:02:04
Message-ID: 4BBC581C.5060204@enterprisedb.com
Lists: pgsql-hackers

This task has been languishing for a long time, so I took a shot at it.
I took the approach I suggested before, keeping a variable in shared
memory to track the latest removed WAL segment. After walsender has read
a bunch of WAL records from a WAL file, it checks that what it read is
after the latest removed WAL segment. If it is not, the data might have
come from a file that was already recycled and overwritten with new
data, so an error is thrown.
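
A minimal sketch of that check, not taken from the attached patch: the
struct and function names are made up for illustration, a WAL segment is
simplified to a single 64-bit number, and the locking a real
shared-memory variable would need is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the variable kept in shared memory. */
typedef struct
{
    bool     any_removed;       /* false until the first removal */
    uint64_t last_removed_seg;  /* newest segment removed or recycled */
} WalRemovalState;

/* Called by the process that removes or recycles old segments. */
void
note_segment_removed(WalRemovalState *st, uint64_t segno)
{
    /* Only ever move forward, so the value tracks the newest removal. */
    if (!st->any_removed || segno > st->last_removed_seg)
    {
        st->any_removed = true;
        st->last_removed_seg = segno;
    }
}

/*
 * Called by the sender AFTER it has read from segment read_segno.
 * Returns false if that segment may already have been recycled, in
 * which case the bytes just read cannot be trusted and the caller
 * should raise an error instead of sending them.
 */
bool
read_is_still_valid(const WalRemovalState *st, uint64_t read_segno)
{
    return !st->any_removed || read_segno > st->last_removed_seg;
}
```

The check has to run after the read rather than before it: a check done
beforehand could pass and the file could still be recycled while the
read is in progress.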

This changes the behavior so that if a standby server doing streaming
replication falls behind too much, the primary will remove/recycle a WAL
segment needed by the standby server. The previous behavior was that WAL
segments still needed by any connected standby server were never
removed, at the risk of filling the disk in the primary if a standby
server behaves badly.

In your version of this patch, the default was still the current
behavior where the primary retains WAL files that are still needed by
connected standby servers indefinitely. I think that's a dangerous
default, so I changed it so that if you don't set standby_keep_segments,
the primary doesn't retain any extra segments; the number of WAL
segments available for standby servers is determined only by the
location of the previous checkpoint, and the status of WAL archiving.
That makes the code a bit simpler too, as we never care how far along
the walsenders are. In fact, the GetOldestWALSenderPointer() function is
now dead code.
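
To illustrate the retention rule described above, here is a rough
sketch rather than the code in the patch. The function name is
hypothetical, and it assumes that an unset standby_keep_segments is
represented as 0 and that a non-zero value means "keep that many
segments behind the segment currently being written". The exact
boundary is a guess, and archiving status is not modelled at all.

```c
#include <stdint.h>

/*
 * Returns the oldest segment number that must not be removed.
 * Everything older than the returned value may be removed or recycled.
 */
uint64_t
oldest_segment_to_keep(uint64_t checkpoint_seg,
                       uint64_t current_seg,
                       uint64_t standby_keep_segments)
{
    /* Segments from the previous checkpoint onwards are always kept. */
    uint64_t keep_from = checkpoint_seg;

    if (standby_keep_segments > 0)
    {
        /* Guard against underflow early in the life of the cluster. */
        uint64_t standby_from =
            (current_seg > standby_keep_segments)
            ? current_seg - standby_keep_segments
            : 0;

        /* The setting can only extend retention, never shorten it. */
        if (standby_from < keep_from)
            keep_from = standby_from;
    }

    return keep_from;
}
```

Note that the progress of the walsenders is not an input at all, which
is the simplification mentioned above.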

Fujii Masao wrote:
> Thanks for the review! And, sorry for the delay.
>
> On Thu, Jan 21, 2010 at 11:10 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> I don't think we should do the check in XLogWrite(). There's really no
>> reason to kill the standby connections before the next checkpoint, when
>> the old WAL files are recycled. XLogWrite() is in the critical path of
>> normal operations, too.
>
> OK. I'll remove that check from XLogWrite().
>
>> There's another important reason for that: If archiving is not working
>> for some reason, the standby can't obtain the old segments from the
>> archive either. If we refuse to stream such old segments, and they're
>> not getting archived, the standby has no way to catch up until archiving
>> is fixed. Allowing streaming of such old segments is free wrt. disk
>> space, because we're keeping the files around anyway.
>
> OK. We should terminate the walsender whose currently-opened WAL file
> has been already archived, isn't required for crash recovery AND is
> 'max-lag' older than the currently-written one. I'll change so.
>
>> Walreceiver will get an error if it tries to open a segment that's been
>> deleted or recycled already. The dangerous situation we need to avoid is
>> when walreceiver holds a file open while bgwriter recycles it.
>> Walreceiver will merrily continue streaming data from it, even though
>> it's been overwritten by new data already.
>
> s/walreceiver/walsender ?
>
> Yes, that's the problem that I'll have to fix.
>
>> A straightforward fix is to keep a "newest recycled XLogRecPtr" in
>> shared memory that RemoveOldXlogFiles() updates. Walreceiver checks it
>> right after read()ing from a file, before sending it to the client, and
>> throws an error if the data it read() was already recycled.
>
> I prefer this. But I don't think such an aggressive check of a "newest
> recycled XLogRecPtr" is required if the bgwriter never deletes a WAL
> file that is newer than or equal to the walsenders' oldest WAL file.
> In other words, the WAL files which the walsender is reading (or
> will read) are not removed at the moment.
>
> Regards,
>

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
standby_keep_segments-1.patch text/x-diff 11.5 KB
