Re: shared-memory based stats collector

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tomas(dot)vondra(at)2ndquadrant(dot)com
Cc: alvherre(at)2ndquadrant(dot)com, andres(at)anarazel(dot)de, ah(at)cybertec(dot)at, magnus(at)hagander(dot)net, robertmhaas(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2018-11-27 08:59:49
Message-ID: 20181127.175949.06807946.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you very much for the testing.

At Mon, 26 Nov 2018 02:52:30 +0100, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote in <6c079a69-feba-e47c-7b85-8a9ff31adef3(at)2ndquadrant(dot)com>
> Hi,
>
> Unfortunately, the patch does not apply anymore - it seems it got broken
> by the changes to signal handling and/or removal of magic OIDs :-(

Big hit but was simple.

> I've done a review and testing when applied on top of 10074651e335.
>
> Firstly, the testing - I was wondering if the patch has some performance
> impact, so I've done some testing with a read-only workload on large
> number of tables (1k, 10k and 100k) while concurrently selecting data
> from pg_stat_* catalogs at the same time.
>
> In one case both workloads were running against the same database, in
> another there were two separate databases (and the selects from stat
> catalogs were running against an "empty" database with no use tables).
>
> In both cases there were 8 clients doing selects from the user tables,
> and 4 clients accessing the pg_stat_* catalogs.
>
> For the "single database" case the results looks like this (this is just
> patched / master throughput):
>
> # of tables xact stats
> ------------------------------
> 1000 97.71% 98.76%
> 10000 100.38% 97.97%
> 100000 100.10% 98.50%
>
> xact is throughput of the user workload (select from the large number of
> tables) and stats is throughput for selects from system catalogs.
>
> So pretty much no difference - 2% is within noise on this machine.

No increase by the number of tables seems suggesting that.

> On two separate databases the results are a bit more interesting:
>
> # of tables xact stats
> -------------------------------
> 1000 100.49% 80.38%
> 10000 103.18% 80.28%
> 100000 100.85% 81.95%
>
> For the main workload there's pretty much no difference, but for selects
> from the stats catalogs there's ~20% drop in throughput. In absolute
> numbers this means drop from ~670tps to ~550tps. I haven't investigated
> this, but I suppose this is due to dshash seqscan being more expensive
> than reading the data from file.

Thanks for finding that. The three seqscan loops in
pgstat_vacuum_stat cannot take such a long time, I think. I'll
investigate it.

> I don't think any of this is an issue in practice, though. The important
> thing is that there's no measurable impact on the regular workload.
>
> Now, a couple of comments regarding individual parts of the patch.
>
>
> 0001-0003
> ---------
>
> I do think 0001 - 0003 are ready, with some minor cosmetic issues:
>
> 1) I'd rephrase the last part of dshash_seq_init comment more like this:
>
> * If consistent is set for dshash_seq_init, the all hash table
> * partitions are locked in the requested mode (as determined by the
> * exclusive flag), and the locks are held until the end of the scan.
> * Otherwise the partition locks are acquired and released as needed
> * during the scan (up to two partitions may be locked at the same time).

Replaced with this.

> Maybe it should briefly explain what the consistency guarantees are (and
> aren't), but considering we're not materially changing the existing
> behavior probably is not really necessary.

Mmm. actually sequential scan is a new thing altogether, but..

> 2) I think the dshash_find_extended() signature should be more like
> dshash_find(), i.e. just appending parameters instead of moving them
> around unnecessarily. Perhaps we should add

Sure. It seems to have done by my off-lined finger;p Fixed.

> Assert(nowait || !lock_acquired);
>
> Using nowait=false with lock_acquired!=NULL does not seem sensible.

Agreed. Added.

> 3) I suppose this comment in postmaster.c is just copy-paste:
>
> -#define BACKEND_TYPE_ARCHIVER 0x0010 /* bgworker process */
> +#define BACKEND_TYPE_ARCHIVER 0x0010 /* archiver process */

Ugh! Fixed.

> I wonder why wasn't archiver a regular auxiliary process already? It
> seems like a fairly natural thing, so why not?

Perhaps it's just because it didn't need access to shared memory.

> 0004 (+0005 and 0007)
> ---------------------
>
> This seems fine, but I have my doubts about two changes - removing of
> stats_temp_directory and the IDLE_STATS_UPDATE_TIMEOUT thingy.
>
> There's a couple of issues with the stats_temp_directory. Firstly, I
> don't understand why it's spread over multiple parts of the patch. The
> GUC is removed in 0004, the underlying variable is removed in 0005 and
> then the docs are updated in 0007. If we really want to do this, it
> should happen in a single patch.

Sure.

> But the main question is - do we really want to do that? I understand
> this directory was meant for the stats data we're moving to shared
> memory, so removing it seems natural. But clearly it's used by
> pg_stat_statements - 0005 fixes that, of course, but I wonder if there
> are other extensions using it to store files?
> It's not just about how intensive I/O to those files is, but this also
> means the files will now be included in backups / pg_rewind, and maybe
> that's not really desirable?
>
> Maybe it's fine but I'm not quite convinced about it ...

It was also in my mind. Anyway sorry for the strange separation.

I was confused about pgstat_stat_directory (the names are
actually very confusing..). Addition to that pg_stat_statements
does *not* use the variable stats_temp_directory, but using
PG_STAT_TMP_DIR. pgstat_stat_directory was used only by
basebackup.c.

The GUC base variable pgstat_temp_directory is not extern'ed so
we can just remove it along with the GUC
definition. pgstat_stat_directory (it actually stores *temporary*
stats directory) was extern'ed in pgstat.h and PG_STAT_TMP_DIR is
defined in pgstat.h. They are not removed in the new version.
Finally 0005 no longer breaks any other bins, contribs and
externalextensions.

> I'm not sure I understand what IDLE_STATS_UPDATE_TIMEOUT does. You've
> described it as
>
> This adds a new timeout IDLE_STATS_UPDATE_TIMEOUT. This works
> similarly to IDLE_IN_TRANSACTIION_SESSION_TIMEOUT. It fires in
> at most PGSTAT_STAT_MIN_INTERVAL(500)ms to clean up pending
> statistics update.
>
> but I'm not sure what pending updates do you mean? Aren't we updating
> the stats at the end of each transaction? At least that's what we've
> been doing before, so maybe this patch changes that?

Without the timeout, updates on shared memory happens at the same
rate with transaction traffic and it easily causes congestion. So
the update frequency is limited to the timtout in this patch and
the local statistics made by trasactions committed within the
timeout interval will be merged into one shared stats update. It
is the "pending statistics".

With the socket-based stats collector, it doesn't update the
temporary stats file with the interval not shorter than the
timeout. The update timeout seemingly behaves the same way with
the socket-based stats collector in the view of readers.

If local statistics is not fully processed at the end of the last
transaction. We don't have a chance to flush them before the next
transaction ends. So timeout is loaded if any "panding stats"
remains. (around postgres.c:4175) The pending stats are processed
forcibly in ProcessInterrupts().

postgres.c:4175
> stats_timeout = pgstat_update_stat(false);
> if (stats_timeout > 0)
> {
> disable_idle_stats_update_timeout = true;
> enable_timeout_after(IDLE_STATS_UPDATE_TIMEOUT,
> stats_timeout);

The attached are the new version addressing the all comments here
(except the 20% regression), and rebased.

v10-0001-sequential-scan-for-dshash.patch
v10-0002-Add-conditional-lock-feature-to-dshash.patch
fixed.
v10-0003-Make-archiver-process-an-auxiliary-process.patch
fixed.
v10-0004-Shared-memory-based-stats-collector.patch
updated not to touch guc.
v10-0005-Remove-the-GUC-stats_temp_directory.patch
collected all guc-related changes.
updated not to break other programs.
v10-0006-Split-out-backend-status-monitor-part-from-pgstat.patch
basebackup.c requires both bestats.h and pgstat.h
v10-0007-Documentation-update.patch
small change related to 0005.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v10-0001-sequential-scan-for-dshash.patch text/x-patch 10.6 KB
v10-0002-Add-conditional-lock-feature-to-dshash.patch text/x-patch 5.0 KB
v10-0003-Make-archiver-process-an-auxiliary-process.patch text/x-patch 12.0 KB
v10-0004-Shared-memory-based-stats-collector.patch text/x-patch 205.7 KB
v10-0005-Remove-the-GUC-stats_temp_directory.patch text/x-patch 6.5 KB
v10-0006-Split-out-backend-status-monitor-part-from-pgstat.patch text/x-patch 174.5 KB
v10-0007-Documentation-update.patch text/x-patch 3.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-11-27 09:02:52 Re: pgsql: Integrate recovery.conf into postgresql.conf
Previous Message Peter Eisentraut 2018-11-27 08:59:26 Re: Continue work on changes to recovery.conf API