Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: postgresql(at)taljaren(dot)se
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
Date: 2022-12-15 18:56:30
Message-ID: 3439037.1671130590@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> We run a specific data model, where each customer has its own schema with
> its own set of tables. Each database server hosts 16 databases, each
> containing around 250 customer schemas. Due to postgres creating a new file
> for each database object, we end up with around 5 million files on each
> database server. This may or may not be related to the issue we're seeing
> (new algorithms with new time complexity?)

> We upgraded from postgresql 9.5 to postgresql 13, and noticed a significant
> slowdown in how vacuumdb performs. Before, we could run a vacuumdb -a -z
> each night, taking around 2 hours to complete. After the upgrade, we see a
> constant 100% CPU utilization during the vacuumdb process (almost no I/O
> activity), and vacuumdb cannot complete within a reasonable time. We're able
> to vacuum about 3-4 databases each night.

I poked into this a little bit. On HEAD, watching things with "perf"
identifies vac_update_datfrozenxid() as the main time sink. It's not
hard to see why: that does a seqscan of pg_class, and it's invoked
at the end of each vacuum() call. So if you try to vacuum each table
in the DB separately, you're going to end up spending O(N^2) time
in often-useless rescans of pg_class. This isn't a huge problem in
ordinary-sized DBs, but with 125000 small tables in the DB it becomes
the dominant cost.

> Concerning speed:
> * Version 9.5, 10, 11 are fast (9.5 slower than 10 and 11)
> * Version 12, 13, 14 are very, very slow
> * Version 15 is faster (a lot faster than 12,13,14) but not nearly as fast
> as 10 or 11.

The reason for the v12 performance change is that up through v11,
"vacuumdb -a -z" would just issue "VACUUM (ANALYZE);" in each DB.
So vac_update_datfrozenxid only ran once. Beginning in v12 (commit
e0c2933a7), vacuumdb issues a separate VACUUM command for each
targeted table, which causes the problem.

I'm not sure why there's a performance delta from 14 to 15.
It doesn't look like vacuumdb itself had any material changes,
so we must have done something different on the backend side.
This may indicate that there's another O(N^2) behavior that
we got rid of in v15. Anyway, that change isn't bad, so I did
not poke into it too much.

Conclusions:

* As a short-term fix, you could try using vacuumdb from v11
with the newer servers. Or just do "psql -c 'vacuum analyze'"
and not bother with vacuumdb at all. (On HEAD, with this
example database, 'vacuum analyze' takes about 7 seconds per DB
for me, versus ~10 minutes using vacuumdb.)

* To fix vacuumdb properly, it might be enough to get it to
batch VACUUMs, say by naming up to 1000 tables per command
instead of just one. I'm not sure how that would interact
with its parallelization logic, though. It's not really
solving the O(N^2) issue either, just pushing it further out.

* A better idea, though sadly not very back-patchable, could
be to expose a VACUUM option to control whether it runs
vac_update_datfrozenxid, so that vacuumdb can do that just
once at the end. Considering that vac_update_datfrozenxid
requires an exclusive lock, the current behavior is poison for
parallel vacuuming quite aside from the O(N^2) issue. This
might tie into some work Peter G. has been pursuing, too.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Geoghegan 2022-12-15 20:06:57 Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
Previous Message Alvaro Herrera 2022-12-15 17:20:08 Re: Crash during backend start when low on memory

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-12-15 19:11:14 Re: New strategies for freezing, advancing relfrozenxid early
Previous Message Peter Geoghegan 2022-12-15 18:53:04 Re: New strategies for freezing, advancing relfrozenxid early