Re: Maintenance question / DB size anomaly...

From: Kurt Overberg <kurt(at)hotdogrecords(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Maintenance question / DB size anomaly...
Date: 2007-06-21 00:43:57
Message-ID: EC556F3E-E964-441D-A06D-278987F7EB5A@hotdogrecords.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Okay,

select * from _my_cluster.sl_log_1 where ctid = '(1,1)';
select * from _my_cluster.sl_log_1 where ctid = '(1,2)';
select * from _my_cluster.sl_log_1 where ctid = '(1,3)';
select * from _my_cluster.sl_log_1 where ctid = '(1,4)';

all returns zero rows. When I do a dump of that file, I get:

Block 1 ********************************************************
<Header> -----
Block Offset: 0x00002000 Offsets: Lower 408 (0x0198)
Block: Size 8192 Version 2 Upper 7680 (0x1e00)
LSN: logid 955 recoff 0x0daed68c Special 8192 (0x2000)
Items: 97 Free Space: 7272
Length (including item array): 412

<Data> ------
Item 1 -- Length: 121 Offset: 8068 (0x1f84) Flags: USED
XMIN: 1491480520 CMIN: 1 XMAX: 0 CMAX|XVAC: 0
Block Id: 1 linp Index: 1 Attributes: 6 Size: 32
infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)

...the fact that they weren't in the table, but in the file (I did
the filedump first,
then the query), then redid the filedump, the results are the same,
the rows are still
in the file. I have no idea how frequently these files are getting
written to, I assume
frequently. I also looked at the last block listed in the file,
6445, and also looked for
items 1-4, and also did not find them in the table using a similar
select as above. That seems
kinda strange, since there's right this second 11,000 items in that
table, but I'll roll with it for awhile.

Intrigued, I wanted to see what a filedump looked like of a row that
WAS in the table:

ctid | log_origin | log_xid | log_tableid | log_actionseq |
log_cmdtype |
(7,1) | 10 | 1491481037 | 8 | 473490934 |
I | (memberid,answerid,taskinstanceid) values
('144854','148707','0')

Block 7 ********************************************************
<Header> -----
Block Offset: 0x0000e000 Offsets: Lower 424 (0x01a8)
Block: Size 8192 Version 2 Upper 508 (0x01fc)
LSN: logid 955 recoff 0x0dc4bcc0 Special 8192 (0x2000)
Items: 101 Free Space: 84
Length (including item array): 428

<Data> ------
Item 1 -- Length: 129 Offset: 8060 (0x1f7c) Flags: USED
XMIN: 1491481037 CMIN: 7 XMAX: 0 CMAX|XVAC: 0
Block Id: 7 linp Index: 1 Attributes: 6 Size: 32
infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)

...the NextID was (taken about 5 mins after the previous filedump):
Latest checkpoint's NextXID: 1491498183

I don't see any real differences in the file entry for a row that is
in the table, and one that I
don't see in the table. I hope I'm getting this right, its totally
fascinating seeing how
all this works.

About your other questions:

1. I have pg_clog segments all the way back to the start of the
database, all the way back
to March 14th, 2006 (most likely when the database was first brought
up on this machine).
The numeric names start at 0000 and go to 058E. I checked the recent
(within last 8 days)
and saw no errors containing the word 'checkpoint'. In fact, very
few errors at all.
The dang thing seems to be running pretty well, just a little slow.

mydb=# select datname, age(datfrozenxid) from pg_database;
datname | age
-----------+------------
template1 | 1491520270
template0 | 1491520270
postgres | 1491520270
mydb | 1076194904

Oooooooo..... thats not good, is it? Thanks for taking an interest,
Tom. I'm most likely going to
promote one of my subscribers to be master, then nuke this database,
but I have no problems keeping it
around if you think I may have found some obscure bug that could help
someone debug. Again, this
DB gets vacuumed every day, and in the beginning, I think I remember
doing a vacuum full every
day.

Thanks,

/kurt

On Jun 20, 2007, at 5:08 PM, Tom Lane wrote:

> Kurt Overberg <kurt(at)hotdogrecords(dot)com> writes:
>> Drat! I'm wrong again. I thought for sure there wouldn't be a
>> wraparound problem.
>
> Well, I'm not sure what it is now. You showed some invisible tuples
> with XMINs of
> XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0
> XMIN: 1489323590 CMIN: 2 XMAX: 0 CMAX|XVAC: 0
> XMIN: 1489323592 CMIN: 1 XMAX: 0 CMAX|XVAC: 0
> but the nextXID is
> 1490547335
> which is not that far ahead of those --- about 1.2 million
> transactions,
> or less than a day's time according to the clog timestamps, which
> suggest that you're burning several million XIDs a day. Perhaps
> you've
> wrapped past them since your earlier check --- if you try the same
> "select where ctid = " queries now, do they show rows?
>
> The other thing that's strange here is that an 8.0 installation should
> be pretty aggressive about recycling pg_clog segments, and yet you've
> got a bunch there. How far back do the files in pg_clog go --- what's
> the numeric range of the filenames, and the date range of their mod
> times? Have you checked the postmaster log to see if you're
> getting any
> complaints about checkpoint failures or anything like that? It would
> also be useful to look at the output of
> select datname, age(datfrozenxid) from pg_database;
>
> regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Stefan Kaltenbrunner 2007-06-21 06:10:49 Re: PostgreSQL Configuration Tool for Dummies
Previous Message Karl Wright 2007-06-20 23:22:47 Re: Performance query about large tables, lots of concurrent access