Re: warm standby server stops doing checkpoints after awhile

From: Frank Wittig <fw(at)weisshuhn(dot)de>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: warm standby server stops doing checkpoints after awhile
Date: 2007-06-01 11:33:37
Message-ID: 46600411.4030207@weisshuhn.de
Lists: pgsql-general

Simon Riggs wrote:

> This is repeatable, yes?
Yes, it occurs every time I start from a new base backup. It seems to
happen while tsearch2 vectors are being recreated for a large number of
data sets.

> Has anything crashed on your server?
No. No crashes occurred during those times.

> Are you using GIN or GIST indexes?
I'm using GIN indexes on tsearch2 vectors over a very large number of
data sets. (About 3.8 million data sets, of which about 30,000 to 50,000
are recreated and indexed when the described behavior occurs.)

> I'll look at putting some debug information in there that logs whether
> multi-WAL actions remain unresolved for any length of time.
Extra debug info would be great.
I added some debug output myself to the function Tom Lane mentioned and
found that, once the server has stopped checkpointing, every call to the
function exits at this point:

    /*
     * Is it safe to checkpoint? We must ask each of the resource managers
     * whether they have any partial state information that might prevent a
     * correct restart from this point. If so, we skip this opportunity, but
     * return at the next checkpoint record for another try.
     */
    for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
    {
        if (RmgrTable[rmid].rm_safe_restartpoint != NULL)
            if (!(RmgrTable[rmid].rm_safe_restartpoint()))
                return;
    }

It exits every time with the same value for rmid.
The logs look like this (the quoted lines repeat):

<2007-06-01 13:10:28.936 CEST:%> DEBUG: 00000: executing restore
command "/var/lib/pgsql/restore.pl
/mnt/wal_archive/00000001000000C9000000C2 pg_xlog/RECOVERYXLOG"
<2007-06-01 13:10:28.936 CEST:%> LOCATION: RestoreArchivedFile, xlog.c:2474
<2007-06-01 13:11:29.055 CEST:%> LOG: 00000: restored log file
"00000001000000C9000000C2" from archive
<2007-06-01 13:11:29.055 CEST:%> LOCATION: RestoreArchivedFile, xlog.c:2504
<2007-06-01 13:11:29.364 CEST:%> DEBUG: 00000: found Checkpoint in XLOG
<2007-06-01 13:11:29.364 CEST:%> CONTEXT: xlog redo checkpoint: redo
C9/C20DE050; undo 0/0; tli 1; xid 0/36130541; oid 241990328; multi 8;
offset 15; online
<2007-06-01 13:11:29.364 CEST:%> LOCATION: RecoveryRestartPoint,
xlog.c:5739
<2007-06-01 13:11:29.365 CEST:%> DEBUG: 00000: Ressource manager (13)
has partial state information
<2007-06-01 13:11:29.365 CEST:%> CONTEXT: xlog redo checkpoint: redo
C9/C20DE050; undo 0/0; tli 1; xid 0/36130541; oid 241990328; multi 8;
offset 15; online
<2007-06-01 13:11:29.365 CEST:%> LOCATION: RecoveryRestartPoint,
xlog.c:5769
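
A minimal standalone sketch of the kind of instrumentation described above.
It is not the actual change made to xlog.c. MockRmgr, MockRmgrTable,
MOCK_RM_MAX_ID and first_unsafe_rmgr are hypothetical names, and the table
is a mock with made-up entries rather than the server's real RmgrTable. It
only illustrates how the loop can report which resource manager vetoes the
restartpoint before returning.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for one entry of the server's resource manager table. */
typedef struct MockRmgr
{
    const char *rm_name;
    /* NULL when the resource manager never keeps multi-record state. */
    bool      (*rm_safe_restartpoint) (void);
} MockRmgr;

/* Mock callbacks: one reports a clean state, the other a pending action. */
static bool always_safe(void)       { return true; }
static bool has_partial_state(void) { return false; }

#define MOCK_RM_MAX_ID 3

/* Made-up table; the ids and names do not match any real server. */
static const MockRmgr MockRmgrTable[MOCK_RM_MAX_ID + 1] = {
    {"mock_a", NULL},
    {"mock_b", always_safe},
    {"mock_c", has_partial_state},
    {"mock_d", always_safe},
};

/*
 * Same loop shape as the quoted code, but instead of returning silently it
 * logs the id of the first resource manager that reports partial state.
 * Returns that id, or -1 if a restartpoint would be safe.
 */
static int
first_unsafe_rmgr(const MockRmgr *table, int max_id)
{
    for (int rmid = 0; rmid <= max_id; rmid++)
    {
        if (table[rmid].rm_safe_restartpoint != NULL &&
            !table[rmid].rm_safe_restartpoint())
        {
            fprintf(stderr,
                    "DEBUG: resource manager (%d, %s) has partial state information\n",
                    rmid, table[rmid].rm_name);
            return rmid;
        }
    }
    return -1;
}
```

With the mock table above, first_unsafe_rmgr(MockRmgrTable, MOCK_RM_MAX_ID)
returns 2, the only entry whose callback reports pending state.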

best regards,
Frank Wittig
