Re: shared-memory based stats collector

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: andres(at)anarazel(dot)de, magnus(at)hagander(dot)net, robertmhaas(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2018-09-20 07:55:27
Message-ID: 11936.1537430127@linux-at0a
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've spent some time reviewing this version.

Design
------

1. Even with your patch the stats collector still uses an UDP socket to
receive data. Now that the shared memory API is there, shouldn't the
messages be sent via shared memory queue? [1] That would increase the
reliability of message delivery.

I can actually imagine backends inserting data into the shared hash tables
themselves, but that might make them wait if the same entries are accessed
by another backend. It should be much cheaper just to insert message into
the queue and let the collector process it. In future version the collector
can launch parallel workers so that writes by backends do not get blocked
due to full queue.

2. I think the access to the shared hash tables introduces more contention
than necessary. For example, pgstat_recv_tabstat() retrieves "dbentry" and
leaves the containing hash table partition locked *exclusively* even if it
changes only the containing table entries, while changes of the containing
dbentry are done.

It appears that the shared hash tables are only modified by the stats
collector. The unnecessary use of the exclusive lock might be a bigger
issue in the future if the stats collector will use parallel
workers. Monitoring functions and autovacuum are affected by the locking
now.

(I see that the it's not trivial to get just-created entry locked in shared
mode: it may need a loop in which we release the exclusive lock and acquire
the shared lock unless the entry was already removed.)

3. Data in both shared_archiverStats and shared_globalStats is mostly accessed
w/o locking. Is that ok? I'd expect the StatsLock to be used for these.

Coding
------

* git apply v4-0003-dshash-based-stats-collector.patch needed manual
resolution of one conflict.

* pgstat_quickdie_handler() appears to be the only "quickdie handler" that
calls on_exit_reset(), although the comments are almost copy & pasted from
such a handler of other processes. Can you please explain what's specific
about pgstat.c?

* the variable name "area" would be sufficient if it was local to some
function, otherwise I think the name is too generic.

* likewise db_stats is too generic for a global variable. How about
"snapshot_db_stats_local"?

* backend_get_db_entry() passes 0 for handle to snapshot_statentry(). How
about DSM_HANDLE_INVALID ?

* I only see one call of snapshot_statentry_all() and it receives 0 for
handle. Thus the argument can be removed and the function does not have to
attach / detach to / from the shared hash table.

* backend_snapshot_global_stats() switches to TopMemoryContext before it calls
pgstat_attach_shared_stats(), but the latter takes care of the context
itself.

* pgstat_attach_shared_stats() - header comment should explain what the return
value means.

* reset_dbentry_counters() does more than just resetting the counters. Name
like initialize_dbentry() would be more descriptive.

* typos:

** backend_snapshot_global_stats(): "liftime" -> "lifetime"

** snapshot_statentry(): "entriy" -> "entry"

** backend_get_func_etnry(): "onshot" -> "oneshot"

** snapshot_statentry_all(): "Returns a local hash contains ..." -> "Returns a local hash containing ..."

[1] https://www.postgresql.org/message-id/20180711000605.sqjik3vqe5opqz33@alap3.anarazel.de

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Berg 2018-09-20 08:10:44 Re: [patch] Support LLVM 7
Previous Message ROS Didier 2018-09-20 07:23:31 impact of SPECTRE and MELTDOWN patches