Re: Berserk Autovacuum (let's save next Mandrill)

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-03-20 02:23:51
Message-ID: 20200320022351.wgfrfdmo7jlerbxz@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-03-20 15:05:03 +1300, David Rowley wrote:
> On Fri, 20 Mar 2020 at 11:17, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I think there's too much "reinventing" autovacuum scheduling in a
> > "local" insert-only manner happening in this thread. And as far as I can
> > tell additionally only looking at a somewhat narrow slice of insert only
> > workloads.
>
> I understand your concern and you might be right. However, I think the
> main reason that the default settings for the new threshold and scale
> factor have deviated this far from the existing settings is the
> example of a large insert-only table that receives inserts of 1 row
> per xact. If we were to copy the existing settings, then once that
> table reached 1 billion rows it would become eligible for an
> insert-vacuum only after another 200 million tuples/xacts, which does
> not help the situation, since an anti-wraparound vacuum would be
> triggering by then anyway.
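For reference, the arithmetic in the quoted example can be sketched with the documented autovacuum trigger formula (a sketch only; the function name and the assumption of one insert per xact are illustrative, not PostgreSQL internals):

```python
# Sketch of the documented autovacuum trigger formula:
#   vacuum threshold = autovacuum_vacuum_threshold
#                    + autovacuum_vacuum_scale_factor * reltuples
# Defaults: base threshold = 50, scale factor = 0.2.

def vacuum_threshold(reltuples, base_threshold=50, scale_factor=0.2):
    """Tuples that must change before autovacuum considers the table."""
    return base_threshold + scale_factor * reltuples

# A 1-billion-row table under the existing default settings:
needed = vacuum_threshold(1_000_000_000)
print(f"{needed:,.0f} tuples")  # 200,000,050 tuples

# At one insert per transaction that is ~200 million xacts -- the same
# point at which the anti-wraparound vacuum (autovacuum_freeze_max_age,
# default 200 million) would fire anyway, which is David's point above.
```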

Sure, that'd happen for inserts that happen after that threshold. I'm
just not convinced that this is as huge a problem as presented in this
thread. And I'm fairly convinced the proposed solution is the wrong
direction to go into.

It's not as if this isn't an issue for updates: if you update one row
per transaction, you run into exactly the same problem for a table of
the same size. You could perhaps argue that it's more common to insert
1 billion tuples in individual transactions than it is to update 1
billion tuples in individual transactions, but I don't think it's a
huge difference, if the difference exists at all.

In fact the problem is worse in the update case, because updates tend
to generate much more random-looking IO during vacuum: only parts of
the table are modified, causing small block reads/writes; vacuum will
need [multiple] index scans/vacuums; and the vacuum is a lot more
expensive in CPU time.

Imo this line of reasoning argues for adding autovacuum scheduling
based on xid age, not for special-casing insert-only workloads.
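A minimal sketch of what xid-age-based scheduling could look like (hypothetical: the table list, ages, and the `urgency` heuristic are illustrative, not actual PostgreSQL code):

```python
# Hypothetical sketch: rank tables for autovacuum by relfrozenxid age
# instead of (only) by dead/inserted tuple counts, so freezing work is
# spread out rather than arriving all at once at the wraparound limit.
# Names and numbers are illustrative, not PostgreSQL internals.

FREEZE_MAX_AGE = 200_000_000  # autovacuum_freeze_max_age default

def urgency(age_frozenxid):
    """Fraction of the wraparound budget already consumed."""
    return age_frozenxid / FREEZE_MAX_AGE

tables = [
    ("insert_only_big", 180_000_000),
    ("busy_updates",     60_000_000),
    ("small_static",      5_000_000),
]

# Vacuum the tables closest to forcing an anti-wraparound vacuum first.
for name, age in sorted(tables, key=lambda t: urgency(t[1]), reverse=True):
    print(f"{name}: {urgency(age):.1%} of freeze budget used")
```

The point of such a heuristic would be that an insert-only table starts getting vacuumed well before it hits the anti-wraparound cliff, without needing a separate insert-specific threshold.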

Greetings,

Andres Freund
