From: | Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_multixact not getting truncated |
Date: | 2014-11-05 18:40:41 |
Message-ID: | 545A6F29.7080704@BlueTreble.com |
Lists: | pgsql-hackers |
On 11/3/14, 7:40 PM, Josh Berkus wrote:
> On 11/03/2014 05:24 PM, Josh Berkus wrote:
>> BTW, the reason I started poking into this was a report from a user that
>> they have a pg_multixact directory which is 21GB in size, and is 2X the
>> size of the database.
>>
>> Here's XID data:
>>
>> Latest checkpoint's NextXID: 0/1126461940
>> Latest checkpoint's NextOID: 371135838
>> Latest checkpoint's NextMultiXactId: 162092874
>> Latest checkpoint's NextMultiOffset: 778360290
>> Latest checkpoint's oldestXID: 945761490
>> Latest checkpoint's oldestXID's DB: 370038709
>> Latest checkpoint's oldestActiveXID: 1126461940
>> Latest checkpoint's oldestMultiXid: 123452201
>> Latest checkpoint's oldestMulti's DB: 370038709
>>
>> Oldest mxid file is 29B2, newest is 3A13
>>
>> No tables had a relminmxid of 1 (some of the system tables were 0,
>> though), and the data from pg_class and pg_database is consistent.
>
> More tidbits:
>
> I just did a quick check on customer systems (5 of them). This only
> seems to be happening on customer systems where I specifically know
> there is a high number of FK lock waits (the system above gets around
> 1000 per minute that we know of). Other systems with higher transaction
> rates don't exhibit this issue; I checked a 9.3.5 database which
> normally needs to do XID wraparound once every 10 days, and its
> pg_multixact is only 48K (it has one file, 0000).
>
> Note that pg_clog on the bad machine is only 64K in size.
>
> How many IDs are there per mxid file?
#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
So for 8k blocks there are 2k offsets (really MultiXactIds) per page and 32 pages per SLRU segment, i.e. 65,536 IDs per segment file. Your file names aren't making sense to me. :( If I'm doing the math correctly, 29B2 is MXID 699,531,264 and 3A13 is 974,323,712, both well beyond the NextMultiXactId of 162092874 you posted. You're only looking in pg_multixact/members/, yes?
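For anyone who wants to redo that arithmetic, here is a minimal sketch of the file-name-to-MXID mapping described above. It assumes the default 8k BLCKSZ and a 32-bit MultiXactOffset; `segment_first_mxid` is a hypothetical helper written for this illustration, not a PostgreSQL function. It only mirrors the offsets-per-page calculation from the #define, so it should not be assumed to hold for file names under members/.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed default block size; a real build takes BLCKSZ from its configuration. */
#define BLCKSZ 8192

/* Assumption for this sketch: MultiXactOffset is a 32-bit unsigned integer. */
typedef uint32_t MultiXactOffset;

#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))

/* Pages per SLRU segment file, as stated in the message. */
#define SLRU_PAGES_PER_SEGMENT 32

/*
 * Hypothetical helper: given a segment file name such as "29B2"
 * (a hexadecimal segment number), return the first MultiXactId
 * that segment would cover under the layout described above.
 */
static uint64_t
segment_first_mxid(const char *segment_name)
{
    assert(segment_name != NULL);

    /* Segment file names are the segment number written in hex. */
    unsigned long segno = strtoul(segment_name, NULL, 16);

    return (uint64_t) segno * SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE;
}
```

With these assumptions, `segment_first_mxid("29B2")` gives 699531264 and `segment_first_mxid("3A13")` gives 974323712, matching the figures quoted above.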
Relevant code starts in vacuum.c/vac_update_datfrozenxid().
If there are any rows in pg_class for tables/matviews/toast with either relfrozenxid > next XID or relminmxid > next MXID, then the code *silently* pulls the plug right there. IMO we should at least issue a warning.
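To make the suggestion concrete, here is a rough sketch of that kind of check with a warning added. This is not the code in vacuum.c: `Xid`, `RelRow`, `xid_precedes` and `any_row_from_future` are all hypothetical names for this illustration, and the comparison is a simplified wraparound-aware one that ignores the special-case XID values the real routines handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for TransactionId / MultiXactId (both 32-bit counters). */
typedef uint32_t Xid;

/* Simplified wraparound-aware "a is older than b" (modulo-2^32 comparison). */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical, trimmed-down view of one pg_class row. */
typedef struct
{
    const char *relname;
    Xid         relfrozenxid;
    Xid         relminmxid;
} RelRow;

/*
 * Returns true if any row claims a relfrozenxid or relminmxid that is
 * "in the future" relative to the next XID / next MXID, and emits a
 * warning for each such row instead of giving up silently.
 */
static bool
any_row_from_future(const RelRow *rows, int nrows, Xid next_xid, Xid next_mxid)
{
    bool bogus = false;

    assert(rows != NULL || nrows == 0);

    for (int i = 0; i < nrows; i++)
    {
        if (xid_precedes(next_xid, rows[i].relfrozenxid) ||
            xid_precedes(next_mxid, rows[i].relminmxid))
        {
            fprintf(stderr,
                    "WARNING: relation \"%s\" has relfrozenxid %u / relminmxid %u in the future\n",
                    rows[i].relname,
                    (unsigned) rows[i].relfrozenxid,
                    (unsigned) rows[i].relminmxid);
            bogus = true;
        }
    }
    return bogus;
}
```

A caller would skip advancing datfrozenxid/datminmxid when this returns true, which is the same outcome as today, but with something in the log to explain why.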
That you see relminmxid advancing tells me this isn't the case here.
ForceTransactionIdLimitUpdate() is a bit suspect in that it only looks at xidVacLimit, but if it were breaking then you wouldn't see pg_database.datminmxid advancing.
Looking through TruncateMultiXact, I don't see anything that could prevent truncation, unless the way we're handling MultiXactId wraparound is broken (which I don't see any indication of).
Can you post the contents of pg_multixact/members/?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com