Re: Can someone help explain what's going on from the attached logs?

From: Chris Redekop <chris(at)replicon(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Samuel Hwang <samuel(at)replicon(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Can someone help explain what's going on from the attached logs?
Date: 2011-10-26 17:41:38
Message-ID: CAC2SuR+gXF+78hszhD2biKW9W6JhW3d5vinFicp8wi5PReUbLw@mail.gmail.com
Lists: pgsql-general

>
> Caveat #2 applies here
>
> http://developer.postgresql.org/pgdocs/postgres/hot-standby.html#HOT-STANDBY-CAVEATS
>
> The consistent state is delayed until your long running transactions
> end, which is workload dependent but transient.
>

I'm not quite sure how this correlates to what I'm seeing in 9.1.1. When I
attempt to start a hot standby while the primary is under load, the standby
seems to get stuck in that 'starting up' mode even when there are no open
transactions. If I get it into this state and then remove all load from the
primary, it still will not finish starting up. Selecting from
pg_stat_activity on the primary shows a couple of connections, but they all
have null 'xact_start' values and <IDLE> 'current_query' values. Even if I
then kill all the connections to the primary (via pg_terminate_backend), the
hot standby still will not finish starting up, and I would assume there
can't be any transactions in progress if there are no connections to the
primary. Restarting the hot standby while it is in this state produces the
same result. If a transaction is in progress for the duration of the backup
(or something like that), can that cause it to get into this state?
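
For reference, this is roughly what I run on the primary to check for open
transactions and then to kill the connections -- a sketch rather than the
exact statements, using the 9.1 column names (procpid, xact_start,
current_query):

-- list sessions and their transaction state on the primary
SELECT procpid, usename, xact_start, current_query
  FROM pg_stat_activity
 ORDER BY xact_start NULLS LAST;

-- terminate every backend except my own session
SELECT pg_terminate_backend(procpid)
  FROM pg_stat_activity
 WHERE procpid <> pg_backend_pid();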

Here is a debug3 log of this startup from 9.1.1:

2011-10-26 11:25:32.931 MDT [22640]: [1-1] LOG: database system was shut
down in recovery at 2011-10-26 11:25:02 MDT
2011-10-26 11:25:32.931 MDT [22640]: [2-1] DEBUG: standby_mode = 'on'
2011-10-26 11:25:32.931 MDT [22640]: [3-1] DEBUG: primary_conninfo =
'host=othernode port=5432 user=postgres application_name=sync_rep_test'
2011-10-26 11:25:32.931 MDT [22640]: [4-1] DEBUG: trigger_file =
'/db/data/trigger_file'
2011-10-26 11:25:32.931 MDT [22640]: [5-1] LOG: entering standby mode
2011-10-26 11:25:32.931 MDT [22640]: [6-1] DEBUG: checkpoint record is at
7/96667628
2011-10-26 11:25:32.931 MDT [22640]: [7-1] DEBUG: redo record is at
7/960000A8; shutdown FALSE
2011-10-26 11:25:32.931 MDT [22640]: [8-1] DEBUG: next transaction ID:
0/11855637; next OID: 3219844
2011-10-26 11:25:32.931 MDT [22640]: [9-1] DEBUG: next MultiXactId: 1895;
next MultiXactOffset: 3810
2011-10-26 11:25:32.931 MDT [22640]: [10-1] DEBUG: oldest unfrozen
transaction ID: 1669, in database 1
2011-10-26 11:25:32.931 MDT [22640]: [11-1] DEBUG: transaction ID wrap
limit is 2147485316, limited by database with OID 1
2011-10-26 11:25:32.932 MDT [22640]: [12-1] DEBUG: resetting unlogged
relations: cleanup 1 init 0
2011-10-26 11:25:32.939 MDT [22640]: [13-1] DEBUG: initializing for hot
standby
2011-10-26 11:25:32.939 MDT [22640]: [14-1] LOG: redo starts at 7/960000A8
2011-10-26 11:25:33.027 MDT [22640]: [15-1] DEBUG: running transaction data
initialized
2011-10-26 11:25:33.027 MDT [22640]: [16-1] CONTEXT: xlog redo running
xacts: nextXid 11855637 latestCompletedXid 11855636 oldestRunningXid
11855250; 195 xacts: 11855359 11855360 ... 11855444 11855445; subxid ovf
2011-10-26 11:25:33.028 MDT [22640]: [17-1] DEBUG: 0 KnownAssignedXids
(num=0 tail=0 head=0)
2011-10-26 11:25:33.028 MDT [22640]: [18-1] CONTEXT: xlog redo running
xacts: nextXid 11855637 latestCompletedXid 11855636 oldestRunningXid
11855250; 195 xacts: 11855359 11855360 ... 11855444 11855445; subxid ovf
2011-10-26 11:25:33.028 MDT [22640]: [19-1] LOG: consistent state delayed
because recovery snapshot incomplete
2011-10-26 11:25:33.028 MDT [22640]: [20-1] CONTEXT: xlog redo running
xacts: nextXid 11855637 latestCompletedXid 11855636 oldestRunningXid
11855250; 195 xacts: 11855359 11855360 ... 11855444 11855445; subxid ovf
2011-10-26 11:25:33.514 MDT [22640]: [21-1] LOG: consistent recovery state
reached at 7/98480E28
2011-10-26 11:25:33.515 MDT [22640]: [22-1] LOG: record with zero length at
7/98480E28
2011-10-26 11:25:33.515 MDT [22640]: [23-1] DEBUG: could not open file
"pg_xlog/000000010000000700000098" (log file 7, segment 152): No such file
or directory
2011-10-26 11:25:33.515 MDT [22642]: [1-1] DEBUG: find_in_dynamic_libpath:
trying "/usr/local/postgresql-9.1.1/lib/libpqwalreceiver"
2011-10-26 11:25:33.515 MDT [22642]: [2-1] DEBUG: find_in_dynamic_libpath:
trying "/usr/local/postgresql-9.1.1/lib/libpqwalreceiver.so"
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION 7/98000000
2011-10-26 11:25:33.519 MDT [22642]: [3-1] LOG: streaming replication
successfully connected to primary
2011-10-26 11:25:33.519 MDT [22642]: [4-1] DEBUG: sending write 0/0 flush
0/0 apply 7/98480E28
2011-10-26 11:25:33.522 MDT [22642]: [5-1] DEBUG: sending write 7/98020000
flush 0/0 apply 7/98480E28
2011-10-26 11:25:33.568 MDT [22642]: [6-1] DEBUG: sending write 7/98020000
flush 7/98020000 apply 7/98480E28
2011-10-26 11:25:33.569 MDT [22642]: [7-1] DEBUG: sending write 7/98040000
flush 7/98020000 apply 7/98480E28
...
2011-10-26 11:25:33.924 MDT [22642]: [49-1] DEBUG: sending write 7/98460000
flush 7/98400000 apply 7/98480E28
2011-10-26 11:25:33.943 MDT [22642]: [50-1] DEBUG: sending write 7/98460000
flush 7/98460000 apply 7/98480E28
DEBUG: standby "sync_rep_test" has now caught up with primary
2011-10-26 11:25:33.943 MDT [22642]: [51-1] DEBUG: sending write 7/98480E28
flush 7/98460000 apply 7/98480E28
2011-10-26 11:25:33.959 MDT [22642]: [52-1] DEBUG: sending write 7/98480E28
flush 7/98480E28 apply 7/98480E28
2011-10-26 11:25:34.192 MDT [22634]: [33-1] DEBUG: forked new backend,
pid=22644 socket=7
2011-10-26 11:25:34.192 MDT [22644]: [1-1] FATAL: the database system is
starting up
2011-10-26 11:25:34.192 MDT [22644]: [2-1] DEBUG: shmem_exit(1): 0
callbacks to make
2011-10-26 11:25:34.192 MDT [22644]: [3-1] DEBUG: proc_exit(1): 1 callbacks
to make
2011-10-26 11:25:34.192 MDT [22644]: [4-1] DEBUG: exit(1)
2011-10-26 11:25:34.192 MDT [22644]: [5-1] DEBUG: shmem_exit(-1): 0
callbacks to make
2011-10-26 11:25:34.192 MDT [22644]: [6-1] DEBUG: proc_exit(-1): 0
callbacks to make
2011-10-26 11:25:34.192 MDT [22634]: [34-1] DEBUG: server process (PID
22644) exited with exit code 1
2011-10-26 11:25:35.171 MDT [22634]: [35-1] DEBUG: forked new backend,
pid=22655 socket=7
2011-10-26 11:25:35.171 MDT [22655]: [1-1] FATAL: the database system is
starting up
2011-10-26 11:25:35.171 MDT [22655]: [2-1] DEBUG: shmem_exit(1): 0
callbacks to make
2011-10-26 11:25:35.171 MDT [22655]: [3-1] DEBUG: proc_exit(1): 1 callbacks
to make
2011-10-26 11:25:35.171 MDT [22655]: [4-1] DEBUG: exit(1)
2011-10-26 11:25:35.171 MDT [22655]: [5-1] DEBUG: shmem_exit(-1): 0
callbacks to make
2011-10-26 11:25:35.171 MDT [22655]: [6-1] DEBUG: proc_exit(-1): 0
callbacks to make
2011-10-26 11:25:35.172 MDT [22634]: [36-1] DEBUG: server process (PID
22655) exited with exit code 1
and so on...
