Re: Multixact members limit exceeded

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Peter Hunčár <hunci(at)hunci(dot)sk>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Forums postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: Multixact members limit exceeded
Date: 2017-08-09 22:26:51
Message-ID: CAEepm=3AR1Ne0pjfRRjctSREv16bNc83c29ywQetDfHQ2rHedQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Aug 9, 2017 at 10:06 PM, Peter Hunčár <hunci(at)hunci(dot)sk> wrote:
> SELECT relname, age(relminmxid) as mxid_age,

I'm pretty sure you can't use age(xid) to compute the age of a
multixact ID. Although they have type xid in pg_class and the
age(xid) function will happily subtract your multixact ID from the
next xid, the answer is meaningless. mxid_age is the function you
want, though I can't remember if it's present in 9.4.

On Thu, Aug 10, 2017 at 7:31 AM, Peter Hunčár <hunci(at)hunci(dot)sk> wrote:
> Thank you, yes those are the 'urgent' tables, I'd talk to the developers
> regarding the locks.I too think, there's something 'fishy' going on.
>
> Anyway, could it be that autovacuum blocks manual vacuum? Because I ran
> vacuum (full, verbose) and some tables finished quite fast, with huge amount
> of io recorded in the monitoring, but some of them are kind of stuck?
> Which brings me to the second question, how can I cancel autovacuum?
>
> And the last thing I don't understand:
>
> One particular table before vacuum full:
>
> relname | relminmxid | table_size
> ------------------------------+------------+------------
> delayed_jobs | 1554151198 | 21 GB
>
> And after vacuum full:
>
> relname | relminmxid | table_size
> --------------+------------+------------
> delayed_jobs | 1554155465 | 6899 MB
>
> Shouldn't be the relminmxid changed after vacuum full, or am I not
> understanding something?

How can delayed_jobs have relminmxid = 1,554,155,465? According to
your pg_controldata output, the active range of multixact IDs in your
cluster is 11,604 -> 129,346,573, so there is some confusion here.
Perhaps this is the output of an expression involving
age(multixactid)?

One thing I noticed is that there are ~4 billion members (that's how
many you have when you run out of member space), but only ~128 million
multixacts, so I think the average multixact has ~32 members.
Considering the way that multixacts grow by copying and extending by
one, by the time you've created a multixact with n members you've
eaten a total of n! member space with an average size of n/2 per
multixact... So one way to hit that average would be to repeatedly
build ~64 member multixacts, or if mixed with smaller ones then you'd
need to be intermittently building even larger ones. A thundering
herd of worker processes repeatedly share-locking the same row or
something like that?

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Seong Son (US) 2017-08-09 22:31:23 Re: streaming replication - crash on standby
Previous Message Andres Freund 2017-08-09 22:26:38 Re: streaming replication - crash on standby