Re: Berserk Autovacuum (let's save next Mandrill)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2019-09-13 04:22:05
Message-ID: CAA4eK1JnHr0ppUEmK7gFF-57z2Z5q2mCUWZf4tVWVHGeStM9tg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 13, 2019 at 8:52 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Sep 10, 2019 at 8:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >
> > Generally speaking, having more guc's for autovacuum and that too
> > which are in some way dependent on existing guc's sounds bit scary,
> > but OTOH whatever you wrote makes sense and can help the scenarios
> > which this thread is trying to deal with. Have you given any thought
> > to what Alvaro mentioned up-thread "certain tables would have some
> > sort of partial scan that sets the visibility map. There's no reason
> > to invoke the whole vacuuming machinery. I don't think this is
> > limited to append-only tables, but
> > rather those are just the ones that are affected the most."?
> >
>
> Speaking of partial scan I've considered before that we could use WAL
> to find which pages have garbage much and not all-visible pages. We
> can vacuum only a particular part of table that is most effective of
> garbage collection instead of whole tables. I've shared some results
> of that at PGCon and it's still in PoC state.
>
> Also, to address the issue of updating VM of mostly-append-only tables
> I considered some possible solutions:
>
> 1. Using INDEX_CLEANUP = false and TRUNCATE = false vacuum does hot
> pruning, vacuuming table and updating VM. In addition to updating VM
> we need to do other two operations but since the mostly-insert-only
> tables would have less garbage the hot pruning and vacuuming table
> should be light workload. This is what I proposed on up-thread.
>

Yes, this is an option, but it might be better if we can somehow avoid
triggering the vacuum machinery.

> 2. This may have already been discussed before but we could update
> VM when hot pruning during SELECT operation. Since this affects SELECT
> performance it should be enabled on only particular tables by user
> request.
>

Yeah, doing anything additional in SELECT's can be tricky and think of
a case where actually there is nothing to prune on-page, in that case
also if we run the visibility checks and then mark the visibility map,
then it can be a noticeable overhead. OTOH, I think this will be a
one-time overhead because after the first scan the visibility map will
be updated and future scans don't need to update visibility map unless
someone has updated that page. I was wondering why not do this during
write workloads. For example, when Insert operation finds that there
is no space in the current page and it has to move to next page, it
can check if the page (that doesn't have space to accommodate current
tuple) can be marked all-visible. In this case, we would have already
done the costly part of an operation which is to Read/Lock the buffer.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-09-13 04:29:40 Re: [HACKERS] [PATCH] pageinspect function to decode infomasks
Previous Message Dilip Kumar 2019-09-13 04:04:50 Re: pgbench - allow to create partitioned tables