Re: shared-memory based stats collector

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2018-07-05 03:04:23
Message-ID: 20180705.120423.49626073.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Wed, 04 Jul 2018 17:23:51 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <67470(dot)1530739431(at)sss(dot)pgh(dot)pa(dot)us>
> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> > At Mon, 2 Jul 2018 14:25:58 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoYQhr30eAcgJCi1v0FhA+3RP1FZVnXqSTLe=6fHy9e5oA(at)mail(dot)gmail(dot)com>
> >> Copying the whole hash table kinds of sucks, partly because of the
> >> time it will take to copy it, but also because it means that memory
> >> usage is still O(nbackends * ntables). Without looking at the patch,
> >> I'm guessing that you're doing that because we need a way to show each
> >> transaction a consistent snapshot of the data, and I admit that I
> >> don't see another obvious way to tackle that problem. Still, it would
> >> be nice if we had a better idea.
>
> > The consistency here means "repeatable read" of an object's stats
> > entry, not a snapshot covering all objects. We don't need to copy
> > all the entries at once following this definition. The attached
> > version makes a cache entry only for requested objects.
>
> Uh, what? That's basically destroying the long-standing semantics of
> statistics snapshots. I do not think we can consider that acceptable.
> As an example, it would mean that scan counts for indexes would not
> match up with scan counts for their tables.

The current stats collector mechanism sends at most 8 table stats
in a single message. Split messages from multiple transactions
can reach to collector in shuffled order. The resulting snapshot
can be "inconsistent" if INQUIRY message comes between such split
messages. Of course a single meesage would be enough for common
transactions but not for all.

Even though the inconsistency happens more frequently with this
patch, I don't think users expect such strict consistency of
table stats, especially on a busy system. And I believe it's a
good thing if users saw more "useful" information for the relaxed
consistency. (The actual meaning of "useful" is out of the
current focus:p)

Meanwhile, if we should keep the practical-consistency, a giant
lock is out of the question. So we need a transactional stats of
any shape. It can be a whole-image snapshot or a regular MMVC
table, or maybe the current dshash with UNDO logs. Since there
are actually many states, it is inevitable to require storage to
reproduce each state.

I think the consensus is that the whole-image snapshot takes
too-much memory. MMVC is apparently too-much for the purpose.

UNDO logs seems a bit promising. If we looking stats in a long
transaction, the required memory for UNDO information easily
reaches to the same amount with the whole-image snapshot. But I
expect that it is not so common.

I'll consider that apart from the current patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2018-07-05 03:08:44 Re: Old small commitfest items
Previous Message Michael Paquier 2018-07-05 02:53:52 Re: Old small commitfest items