Re: [HACKERS] More stats about skipped vacuums

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] More stats about skipped vacuums
Date: 2017-11-27 21:05:06
Message-ID: CABUevEyW5hhf_cacw3uRY+a4+2J2tk2mTvfdcCikFj6mf5W-cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 27, 2017 at 7:53 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sun, Nov 26, 2017 at 3:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> On Sat, Nov 25, 2017 at 12:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> Mumble. It's a property I'm pretty hesitant to give up, especially
> >>> since the stats views have worked like that since day one. It's
> >>> inevitable that weakening that guarantee would break peoples' queries,
> >>> probably subtly.
> >
> >> You mean, queries against the stats views, or queries in general? If
> >> the latter, by what mechanism would the breakage happen?
> >
> > Queries against the stats views, of course.
>
> Hmm. Those are probably rare. If we only took a snapshot of the
> statistics for the backends that explicitly access those views, that
> probably wouldn't be too crazy.
>
> Sorry if this is a stupid question, but how often and for what purpose
> to regular backends need the stats collector data for purposes other
> than querying the stats views? I thought that the data was only used
> to decide whether to VACUUM/ANALYZE, and therefore would be accessed
> mostly by autovacuum, and for that you'd actually want the most
> up-to-date view of the stats for a particular table that is available,
> not any older snapshot.
>
>
Autovacuum resets the stats to make sure. Autovacuum in particular can
probably be made a lot more efficient, because it only ever looks at one
relation at a time, I think.

What I've been thinking about for that one before is if we could just
invent a protocol (shmq based maybe) whereby autovacuum can ask the stats
collector for a single table or index stat. If autovacuum never needs to
see a consistent view between multiple tables, I would think that's going
to be a win in a lot of cases.

I don't think regular backends use them at all. But anybody looking at the
stats do, and it is pretty important there.

However, when it comes to the stats system, I'd say that on any busy system
(which would be the ones to care about), the stats structures are still
going to be *written* a lot more than they are read. We certainly don't
read them at the rate of once per transaction. A lot of the reads are also
limited to one database of course.

I wonder if we want to implement some sort of copy-on-read-snapshot in the
stats collector itself. So instead of unconditionally publishing
everything, have the backends ask for it. When a backend asks for it it
gets a "snapshot counter" or something from the stats collector, and on the
next write after that we do a copy-write if the snapshot it still
available. (no, i have not thought in detail)

Or -- if we keep a per-database hashtable in dynamic shared memory (which
we can now). Can we copy it into local memory in the backend fast enough
that we can hold a lock and just queue up the stats updates during the
copy? If we can copy the complete structure, that would fix one of the
bigger bottlenecks with it today which is that we dump and rebuild the
hashtables as we go through the tempfiles.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jakub Glapa 2017-11-27 21:05:39 Re: ERROR: too many dynamic shared memory segments
Previous Message Robert Haas 2017-11-27 21:01:07 Re: default range partition and constraint exclusion