From: | Doug Fields <dfields-pg-general(at)pexicom(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org, Glenn Stone <gstone(at)pogolinux(dot)com> |
Subject: | Re: WAL recycling, Linux 2.4.18 |
Date: | 2002-07-08 18:04:51 |
Message-ID: | 5.1.0.14.2.20020708134721.02f23578@pop.pexicom.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Tom, all,
>Also, could you do the checkpoint manually and get a stack trace from
>that backend while others are hung up?
Yes, see below.
>directly try to acquire ControlFileLock. In any case it's hard to
>credit that the recycling process could take 90 seconds to rename a
>dozen or so files. If you have a gdb attached to a process doing a
>manual checkpoint, it would be fairly easy to see how long
>MoveOfflineLogs() runs. (Set a breakpoint at its start, when control
>reaches the breakpoint issue "fin" and see how long it takes to come
>back.)
Two things:
1) Remounted my ext3 filesystems as ext2 to rule out an ext3fs related
problem; the problems persist, so it's probably not an ext3 thing
2) Doing as you suggest, attaching to a /usr/lib/postgresql/bin/postgres
process from which I run a manual checkpoint results in the following behavior:
I was able to set a breakpoint at CreateCheckPoint - gdb never found a
MoveOfflineLogs for me to set a breakpoint.
Doing a single step from the breakpoint after the CreateCheckPoint takes
quite a few moments (perhaps 10-30 seconds, but I didn't use a stopwatch).
During this time control-C has no effect, and when it does take effect, it
leaves me at the displayed location below.
Single stepping from mdsync() to smgrsync() also seems to take a few
seconds (perhaps 3).
My GDB output is below. The checkpoint seems to take 42 seconds to complete:
pexicast_lg=# select now(); checkpoint; select now();
now
-------------------------------
2002-07-08 13:57:57.918766-04
(1 row)
CHECKPOINT
now
-------------------------------
2002-07-08 13:58:39.790787-04
(1 row)
I was wondering if it could be the sync function call, but I sit there on
the server and type sync until I'm blue in the face and they seem to run so
fast I don't even notice any delay.
Also, I can't seem to recompile PostgreSQL because Debian can't find a
tclConfig.sh.
Thanks,
Doug
(gdb) c
Continuing.
Breakpoint 1, 0x08087ae5 in CreateCheckPoint ()
(gdb) where
#0 0x08087ae5 in CreateCheckPoint ()
#1 0x08111066 in ProcessUtility ()
#2 0x0810ecc5 in pg_exec_query_string ()
#3 0x0810fd5e in PostgresMain ()
#4 0x080f6d4e in ClosePostmasterPorts ()
#5 0x080f669f in ClosePostmasterPorts ()
#6 0x080f5882 in PostmasterMain ()
#7 0x080f5391 in PostmasterMain ()
#8 0x080d4e18 in main ()
#9 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) n
Single stepping until exit from function CreateCheckPoint,
which has no line number information.
Program received signal SIGINT, Interrupt.
0x40282967 in sync () from /lib/libc.so.6
(gdb) where
#0 0x40282967 in sync () from /lib/libc.so.6
#1 0x0810d167 in mdsync ()
#2 0x0810de5f in smgrsync ()
#3 0x081036d8 in FlushBufferPool ()
#4 0x08087d13 in CreateCheckPoint ()
#5 0x08111066 in ProcessUtility ()
#6 0x0810ecc5 in pg_exec_query_string ()
#7 0x0810fd5e in PostgresMain ()
#8 0x080f6d4e in ClosePostmasterPorts ()
#9 0x080f669f in ClosePostmasterPorts ()
#10 0x080f5882 in PostmasterMain ()
#11 0x080f5391 in PostmasterMain ()
#12 0x080d4e18 in main ()
#13 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) break MoveOfflineLogs
Function "MoveOfflineLogs" not defined.
(gdb) break MoveOfflineLog
Function "MoveOfflineLog" not defined.
(gdb) break MoveOfflineLogs()
Function "MoveOfflineLogs()" not defined.
(gdb) s
Single stepping until exit from function sync,
which has no line number information.
0x0810d167 in mdsync ()
(gdb) s
Single stepping until exit from function mdsync,
which has no line number information.
0x0810de5f in smgrsync ()
(gdb) s
Single stepping until exit from function smgrsync,
which has no line number information.
0x081036d8 in FlushBufferPool ()
(gdb) where
#0 0x081036d8 in FlushBufferPool ()
#1 0x08087d13 in CreateCheckPoint ()
#2 0x08111066 in ProcessUtility ()
#3 0x0810ecc5 in pg_exec_query_string ()
#4 0x0810fd5e in PostgresMain ()
#5 0x080f6d4e in ClosePostmasterPorts ()
#6 0x080f669f in ClosePostmasterPorts ()
#7 0x080f5882 in PostmasterMain ()
#8 0x080f5391 in PostmasterMain ()
#9 0x080d4e18 in main ()
#10 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function FlushBufferPool,
which has no line number information.
0x08087d13 in CreateCheckPoint ()
(gdb) where
#0 0x08087d13 in CreateCheckPoint ()
#1 0x08111066 in ProcessUtility ()
#2 0x0810ecc5 in pg_exec_query_string ()
#3 0x0810fd5e in PostgresMain ()
#4 0x080f6d4e in ClosePostmasterPorts ()
#5 0x080f669f in ClosePostmasterPorts ()
#6 0x080f5882 in PostmasterMain ()
#7 0x080f5391 in PostmasterMain ()
#8 0x080d4e18 in main ()
#9 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function CreateCheckPoint,
which has no line number information.
0x08085ac4 in XLogFlush ()
(gdb) where
#0 0x08085ac4 in XLogFlush ()
#1 0x08111066 in ProcessUtility ()
#2 0x0810ecc5 in pg_exec_query_string ()
#3 0x0810fd5e in PostgresMain ()
#4 0x080f6d4e in ClosePostmasterPorts ()
#5 0x080f669f in ClosePostmasterPorts ()
#6 0x080f5882 in PostmasterMain ()
#7 0x080f5391 in PostmasterMain ()
#8 0x080d4e18 in main ()
#9 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function XLogFlush,
which has no line number information.
Program received signal SIGINT, Interrupt.
0x402887f2 in recv () from /lib/libc.so.6
(gdb) where
#0 0x402887f2 in recv () from /lib/libc.so.6
#1 0x080d42bc in StreamClose ()
#2 0x080d430d in pq_getbyte ()
#3 0x0810e7c8 in HandleFunctionRequest ()
#4 0x0810e837 in HandleFunctionRequest ()
#5 0x0810fc5e in PostgresMain ()
#6 0x080f6d4e in ClosePostmasterPorts ()
#7 0x080f669f in ClosePostmasterPorts ()
#8 0x080f5882 in PostmasterMain ()
#9 0x080f5391 in PostmasterMain ()
#10 0x080d4e18 in main ()
#11 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) c
Continuing.
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2002-07-08 18:08:06 | Re: Redhat/glibc and postgre time "bug" |
Previous Message | Peter Schuller | 2002-07-08 17:57:28 | Frontend/backend protocol: authentication |