Re: WAL recycling, Linux 2.4.18

From: Doug Fields <dfields-pg-general(at)pexicom(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org, Glenn Stone <gstone(at)pogolinux(dot)com>
Subject: Re: WAL recycling, Linux 2.4.18
Date: 2002-07-08 18:04:51
Message-ID: 5.1.0.14.2.20020708134721.02f23578@pop.pexicom.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Tom, all,

>Also, could you do the checkpoint manually and get a stack trace from
>that backend while others are hung up?

Yes, see below.

>directly try to acquire ControlFileLock. In any case it's hard to
>credit that the recycling process could take 90 seconds to rename a
>dozen or so files. If you have a gdb attached to a process doing a
>manual checkpoint, it would be fairly easy to see how long
>MoveOfflineLogs() runs. (Set a breakpoint at its start, when control
>reaches the breakpoint issue "fin" and see how long it takes to come
>back.)

Two things:

1) Remounted my ext3 filesystems as ext2 to rule out an ext3fs related
problem; the problems persist, so it's probably not an ext3 thing

2) Doing as you suggest, attaching to a /usr/lib/postgresql/bin/postgres
process from which I run a manual checkpoint results in the following behavior:

I was able to set a breakpoint at CreateCheckPoint - gdb never found a
MoveOfflineLogs for me to set a breakpoint.

Doing a single step from the breakpoint after the CreateCheckPoint takes
quite a few moments (perhaps 10-30 seconds, but I didn't use a stopwatch).
During this time control-C has no effect, and when it does take effect, it
leaves me at the displayed location below.

Single stepping from mdsync() to smgrsync() also seems to take a few
seconds (perhaps 3).

My GDB output is below. The checkpoint seems to take 42 seconds to complete:

pexicast_lg=# select now(); checkpoint; select now();
now
-------------------------------
2002-07-08 13:57:57.918766-04
(1 row)

CHECKPOINT
now
-------------------------------
2002-07-08 13:58:39.790787-04
(1 row)

I was wondering if it could be the sync function call, but I sit there on
the server and type sync until I'm blue in the face and they seem to run so
fast I don't even notice any delay.

Also, I can't seem to recompile PostgreSQL because Debian can't find a
tclConfig.sh.

Thanks,

Doug

(gdb) c
Continuing.

Breakpoint 1, 0x08087ae5 in CreateCheckPoint ()
(gdb) where
#0 0x08087ae5 in CreateCheckPoint ()
#1 0x08111066 in ProcessUtility ()
#2 0x0810ecc5 in pg_exec_query_string ()
#3 0x0810fd5e in PostgresMain ()
#4 0x080f6d4e in ClosePostmasterPorts ()
#5 0x080f669f in ClosePostmasterPorts ()
#6 0x080f5882 in PostmasterMain ()
#7 0x080f5391 in PostmasterMain ()
#8 0x080d4e18 in main ()
#9 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) n
Single stepping until exit from function CreateCheckPoint,
which has no line number information.

Program received signal SIGINT, Interrupt.
0x40282967 in sync () from /lib/libc.so.6
(gdb) where
#0 0x40282967 in sync () from /lib/libc.so.6
#1 0x0810d167 in mdsync ()
#2 0x0810de5f in smgrsync ()
#3 0x081036d8 in FlushBufferPool ()
#4 0x08087d13 in CreateCheckPoint ()
#5 0x08111066 in ProcessUtility ()
#6 0x0810ecc5 in pg_exec_query_string ()
#7 0x0810fd5e in PostgresMain ()
#8 0x080f6d4e in ClosePostmasterPorts ()
#9 0x080f669f in ClosePostmasterPorts ()
#10 0x080f5882 in PostmasterMain ()
#11 0x080f5391 in PostmasterMain ()
#12 0x080d4e18 in main ()
#13 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) break MoveOfflineLogs
Function "MoveOfflineLogs" not defined.
(gdb) break MoveOfflineLog
Function "MoveOfflineLog" not defined.
(gdb) break MoveOfflineLogs()
Function "MoveOfflineLogs()" not defined.
(gdb) s
Single stepping until exit from function sync,
which has no line number information.
0x0810d167 in mdsync ()
(gdb) s
Single stepping until exit from function mdsync,
which has no line number information.
0x0810de5f in smgrsync ()
(gdb) s
Single stepping until exit from function smgrsync,
which has no line number information.
0x081036d8 in FlushBufferPool ()
(gdb) where
#0 0x081036d8 in FlushBufferPool ()
#1 0x08087d13 in CreateCheckPoint ()
#2 0x08111066 in ProcessUtility ()
#3 0x0810ecc5 in pg_exec_query_string ()
#4 0x0810fd5e in PostgresMain ()
#5 0x080f6d4e in ClosePostmasterPorts ()
#6 0x080f669f in ClosePostmasterPorts ()
#7 0x080f5882 in PostmasterMain ()
#8 0x080f5391 in PostmasterMain ()
#9 0x080d4e18 in main ()
#10 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function FlushBufferPool,
which has no line number information.
0x08087d13 in CreateCheckPoint ()
(gdb) where
#0 0x08087d13 in CreateCheckPoint ()
#1 0x08111066 in ProcessUtility ()
#2 0x0810ecc5 in pg_exec_query_string ()
#3 0x0810fd5e in PostgresMain ()
#4 0x080f6d4e in ClosePostmasterPorts ()
#5 0x080f669f in ClosePostmasterPorts ()
#6 0x080f5882 in PostmasterMain ()
#7 0x080f5391 in PostmasterMain ()
#8 0x080d4e18 in main ()
#9 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function CreateCheckPoint,
which has no line number information.
0x08085ac4 in XLogFlush ()
(gdb) where
#0 0x08085ac4 in XLogFlush ()
#1 0x08111066 in ProcessUtility ()
#2 0x0810ecc5 in pg_exec_query_string ()
#3 0x0810fd5e in PostgresMain ()
#4 0x080f6d4e in ClosePostmasterPorts ()
#5 0x080f669f in ClosePostmasterPorts ()
#6 0x080f5882 in PostmasterMain ()
#7 0x080f5391 in PostmasterMain ()
#8 0x080d4e18 in main ()
#9 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function XLogFlush,
which has no line number information.

Program received signal SIGINT, Interrupt.
0x402887f2 in recv () from /lib/libc.so.6
(gdb) where
#0 0x402887f2 in recv () from /lib/libc.so.6
#1 0x080d42bc in StreamClose ()
#2 0x080d430d in pq_getbyte ()
#3 0x0810e7c8 in HandleFunctionRequest ()
#4 0x0810e837 in HandleFunctionRequest ()
#5 0x0810fc5e in PostgresMain ()
#6 0x080f6d4e in ClosePostmasterPorts ()
#7 0x080f669f in ClosePostmasterPorts ()
#8 0x080f5882 in PostmasterMain ()
#9 0x080f5391 in PostmasterMain ()
#10 0x080d4e18 in main ()
#11 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) c
Continuing.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2002-07-08 18:08:06 Re: Redhat/glibc and postgre time "bug"
Previous Message Peter Schuller 2002-07-08 17:57:28 Frontend/backend protocol: authentication