Re: Streaming replication and a disk full in primary

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and a disk full in primary
Date: 2010-04-12 12:39:37
Message-ID: g2m3f0b79eb1004120539s59fd98a7q2c3176ae797f2fe1@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 12, 2010 at 7:41 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> We should remove the document "25.2.5.2. Monitoring"?
>
> I updated it to no longer claim that the primary can run out of disk
> space because of a hung WAL sender. The information about calculating
> the lag between primary and standby still seems valuable, so I didn't
> remove the whole section.

Yes.

> ! An important health indicator of streaming replication is the amount
> ! of WAL records generated in the primary, but not yet applied in the
> ! standby.

Since pg_last_xlog_receive_location doesn't tell us which WAL has not yet
been applied, we should use pg_last_xlog_replay_location instead. How
about this?:

----------------
An important health indicator of streaming replication is the amount
of WAL records generated in the primary, but not yet applied in the
standby. You can calculate this lag by comparing the current WAL write
- location on the primary with the last WAL location received by the
+ location on the primary with the last WAL location replayed by the
standby. They can be retrieved using
<function>pg_current_xlog_location</> on the primary and the
- <function>pg_last_xlog_receive_location</> on the standby,
+ <function>pg_last_xlog_replay_location</> on the standby,
respectively (see <xref linkend="functions-admin-backup-table"> and
<xref linkend="functions-recovery-info-table"> for details).
- The last WAL receive location in the standby is also displayed in the
- process status of the WAL receiver process, displayed using the
- <command>ps</> command (see <xref linkend="monitoring-ps"> for details).
</para>
</sect3>
----------------
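
For illustration, the comparison described above could be sketched as
follows. This is only a rough sketch, not part of the patch. It assumes
the two locations have already been fetched with
"SELECT pg_current_xlog_location()" on the primary and
"SELECT pg_last_xlog_replay_location()" on the standby, that they use the
"xlogid/xrecoff" hexadecimal text format, and that each xlogid covers
0xFF000000 bytes (255 segments of the default 16 MB size). The helper
names are hypothetical.

```python
# Assumption: locations look like "xlogid/xrecoff", both parts hexadecimal,
# and each xlogid spans 0xFF000000 bytes (255 segments of 16 MB each).
# A server built with a non-default segment size would need another value.
XLOG_FILE_SIZE = 0xFF000000


def xlog_location_to_bytes(location: str) -> int:
    """Convert a textual WAL location into an absolute byte position."""
    xlogid, xrecoff = location.strip().split("/")
    return int(xlogid, 16) * XLOG_FILE_SIZE + int(xrecoff, 16)


def replication_lag_bytes(primary_location: str, standby_location: str) -> int:
    """Bytes of WAL written on the primary but not yet replayed on the standby."""
    return (xlog_location_to_bytes(primary_location)
            - xlog_location_to_bytes(standby_location))


if __name__ == "__main__":
    # Hypothetical values, as if returned by the two functions named above.
    print(replication_lag_bytes("0/3000000", "0/2000000"))
```

A result that keeps growing over time would suggest the standby is
falling behind in replay, even if it is still receiving WAL.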

>> Why is standby_keep_segments used even if max_wal_senders is zero?
>> In that case, ISTM we don't need to keep any WAL files in pg_xlog
>> for the standby.
>
> True. I don't think we should second guess the admin on that, though.
> Perhaps he only set max_wal_senders=0 temporarily, and will be
> disappointed if the logs are no longer there when he sets it back to
> non-zero and restarts the server.

OK. Since the behavior is not intuitive to me, I'd like to add a note
at the end of the description of "standby_keep_segments". How about this?:

----------------
This setting has effect even if max_wal_senders is zero.
----------------

>> When walreceiver has gotten stuck for some reason, walsender would be
>> unable to pass through the send() system call, and also get stuck.
>> In the patch, such a walsender cannot exit forever because it cannot
>> call XLogRead(). So I think that the bgwriter needs to send the
>> exit-signal to such a too lagged walsender. Thought?
>
> Any backend can get stuck like that.

OK.

> + },
> +
> + {
> + {"standby_keep_segments", PGC_SIGHUP, WAL_CHECKPOINTS,
> + gettext_noop("Sets the number of WAL files held for standby servers"),
> + NULL
> + },
> + &StandbySegments,
> + 0, 0, INT_MAX, NULL, NULL

Shouldn't we s/WAL_CHECKPOINTS/WAL_REPLICATION/ here?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
