Re: shared-memory based stats collector

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: andres(at)anarazel(dot)de
Cc: sfrost(at)snowman(dot)net, alvherre(at)2ndquadrant(dot)com, michael(at)paquier(dot)xyz, thomas(dot)munro(at)gmail(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, ah(at)cybertec(dot)at, magnus(at)hagander(dot)net, robertmhaas(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2020-09-25 00:27:26
Message-ID: 20200925.092726.985174538786385650.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for reviewing!

At Mon, 21 Sep 2020 19:47:04 -0700, Andres Freund <andres(at)anarazel(dot)de> wrote in
> Hi,
>
> On 2020-09-08 17:55:57 +0900, Kyotaro Horiguchi wrote:
> > Locks on the shared statistics is acquired by the units of such like
> > tables, functions so the expected chance of collision are not so high.
>
> I can't really parse that...

Mmm... Is the following readable?

Shared statistics locks are acquired by units such as tables,
functions, etc., so the chances of an expected collision are not so
high.

Anyway, this is found to be wrong, so I removed it.

> > Furthermore, until 1 second has elapsed since the last flushing to
> > shared stats, lock failure postpones stats flushing so that lock
> > contention doesn't slow down transactions.
>
> I think I commented on that before, but to me 1s seems way too low to
> switch to blocking lock acquisition. What's the reason for such a low
> limit?

It was 0.5 seconds previously. I don't have a clear idea of a
reasonable value for it. One possible rationale might be to have 1000
clients each have a writing time slot of 10ms.. So, 10s as the minimum
interval. I set maximum interval to 60, and retry interval to
1s. (Fixed?)

> > /*
> > - * Clean up any dead statistics collector entries for this DB. We always
> > + * Clean up any dead activity statistics entries for this DB. We always
> > * want to do this exactly once per DB-processing cycle, even if we find
> > * nothing worth vacuuming in the database.
> > */
>
> What is "activity statistics"?

I don't get your point. It is formally the replacement word for
"statistics collector". The "statistics collector (process)" no longer
exists, so I had to invent a name for the successor mechanism that is
distinguishable with data/column statistics. If it is not the proper
wording, I'd appreciate it if you could suggest the appropriate one.

> > @@ -2816,8 +2774,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
> > }
> >
> > /* fetch the pgstat table entry */
> > - tabentry = get_pgstat_tabentry_relid(relid, classForm->relisshared,
> > - shared, dbentry);
> > + tabentry = pgstat_fetch_stat_tabentry_snapshot(classForm->relisshared,
> > + relid);
>
> Why do all of these places deal with a snapshot? For most it seems to
> make much more sense to just look up the entry and then copy that into
> local memory? There may be some place that need some sort of snapshot
> behaviour that's stable until commit / pgstat_clear_snapshot(). But I
> can't reallly see many?

Ok, I reread this thread and agree that there's a (vague) consensus to
remove the snapshot stuff. Backend-statistics (bestats) still are
stable during a transaction.

> > +#define PGSTAT_MIN_INTERVAL 1000 /* Minimum interval of stats data
> >
> > +#define PGSTAT_MAX_INTERVAL 10000 /* Longest interval of stats data
> > + * updates */
>
> These don't really seem to be in line with the commit message...

Oops! Sorry. Fixed both of this value and the commit message (and the
file comment).

> > + * dshash pgStatSharedHash
> > + * -> PgStatHashEntry (dshash entry)
> > + * (dsa_pointer)-> PgStatEnvelope (dsa memory block)
>
> I don't like 'Envelope' that much. If I understand you correctly that's
> a common prefix that's used for all types of stat objects, correct? If
> so, how about just naming it PgStatEntryBase or such? I think it'd also
> be useful to indicate in the "are stored as" part that PgStatEnvelope is
> just the common prefix for an allocation.

The name makes sense. Thanks! (But the struct is now gone..)

> > -typedef struct TabStatHashEntry
> > +static size_t pgstat_entsize[] =
>
> > +/* Ditto for local statistics entries */
> > +static size_t pgstat_localentsize[] =
> > +{
> > + 0, /* PGSTAT_TYPE_ALL: not an entry */
> > + sizeof(PgStat_StatDBEntry), /* PGSTAT_TYPE_DB */
> > + sizeof(PgStat_TableStatus), /* PGSTAT_TYPE_TABLE */
> > + sizeof(PgStat_BackendFunctionEntry) /* PGSTAT_TYPE_FUNCTION */
> > +};
>
> These probably should be const as well.

Right. Fixed.

>
> > /*
> > - * Backends store per-function info that's waiting to be sent to the collector
> > - * in this hash table (indexed by function OID).
> > + * Stats numbers that are waiting for flushing out to shared stats are held in
> > + * pgStatLocalHash,
> > */
> > -static HTAB *pgStatFunctions = NULL;
> > +typedef struct PgStatHashEntry
> > +{
> > + PgStatHashEntryKey key; /* hash key */
> > + dsa_pointer env; /* pointer to shared stats envelope in DSA */
> > +} PgStatHashEntry;
> > +
> > +/* struct for shared statistics entry pointed from shared hash entry. */
> > +typedef struct PgStatEnvelope
> > +{
> > + PgStatTypes type; /* statistics entry type */
> > + Oid databaseid; /* databaseid */
> > + Oid objectid; /* objectid */
>
> Do we need this information both here and in PgStatHashEntry? It's
> possible that it's worthwhile, but I am not sure it is.

Same key values were stored in PgStatEnvelope, PgStat(Local)HashEntry,
and PgStat_Stats*Entry. And I thought the same while developing. After
some thoughts, I managed to remove the duplicate values other than
PgStat(Local)HashEntry. Fixed.

> > + size_t len; /* length of body, fixed per type. */
>
> Why do we need this? Isn't that something that can easily be looked up
> using the type?

Not only they are virtually fixed values, but they were found to be
write-only variables. Removed.

> > + LWLock lock; /* lightweight lock to protect body */
> > + int body[FLEXIBLE_ARRAY_MEMBER]; /* statistics body */
> > +} PgStatEnvelope;
>
> What you're doing here with 'body' doesn't provide enough guarantees
> about proper alignment. E.g. if one of the entry types wants to store a
> double, this won't portably work, because there's platforms that have 4
> byte alignment for ints, but 8 byte alignment for doubles.
>
>
> Wouldn't it be better to instead embed PgStatEnvelope into the struct
> that's actually stored? E.g. something like
>
> struct PgStat_TableStatus
> {
> PgStatEnvelope header; /* I'd rename the type */
> TimestampTz vacuum_timestamp; /* user initiated vacuum */
> ...
> }
>
> or if you don't want to do that because it'd require declaring
> PgStatEnvelope in the header (not sure that'd really be worth avoiding),
> you could just get rid of the body field and just do the calculation
> using something like MAXALIGN((char *) envelope + sizeof(PgStatEnvelope))

As the result of the modification so far, there is only one member,
lock, left in the PgStatEnvelope (or PgStatEntryBase) struct. I chose
to embed it to each PgStat_Stat*Entry structs as
PgStat_StatEntryHeader.

> > + * Snapshot is stats entry that is locally copied to offset stable values for a
> > + * transaction.
...
> The amount of code needed for this snapshot stuff seems unreasonable to
> me, especially because I don't see why we really need it. Is this just
> so that there's no skew between all the columns of pg_stat_all_tables()
> etc?
>
> I think this needs a lot more comments explaining what it's trying to
> achieve.

I don't insist on keeping the behavior. Removed snapshot stuff only
of pgstat stuff. (beentry snapshot is left alone.)

> > +/*
> > + * Newly created shared stats entries needs to be initialized before the other
> > + * processes get access it. get_stat_entry() calls it for the purpose.
> > + */
> > +typedef void (*entry_initializer) (PgStatEnvelope * env);
>
> I think we should try to not need it, instead declaring that all fields
> are zero initialized. That fits well together with my suggestion to
> avoid duplicating the database / object ids.

Now that entries don't have type-specific fields that need a special
care, I removed that stuff altogether.

> > +static void
> > +attach_shared_stats(void)
> > +{
...
> > + shared_globalStats = (PgStat_GlobalStats *)
> > + dsa_get_address(area, StatsShmem->global_stats);
> > + shared_archiverStats = (PgStat_ArchiverStats *)
> > + dsa_get_address(area, StatsShmem->archiver_stats);
> > +
> > + shared_SLRUStats = (PgStatSharedSLRUStats *)
> > + dsa_get_address(area, StatsShmem->slru_stats);
> > + LWLockInitialize(&shared_SLRUStats->lock, LWTRANCHE_STATS);
>
> I don't think it makes sense to use dsa allocations for any of the fixed
> size stats (global_stats, archiver_stats, ...). They should just be
> direct members of StatsShmem? Then we also don't need the shared_*
> helper variables

I intended to reduce the amount of fixed-allocated shared memory, or
make maximum use of DSA. However, you're right. Now they are members
of StatsShmem.

>
> > + /* Load saved data if any. */
> > + pgstat_read_statsfiles();
>
> Hm. Is it a good idea to do this as part of the shmem init function?
> That's a lot more work than we normally do in these.
>
> > +/* ----------
> > + * detach_shared_stats() -
> > + *
> > + * Detach shared stats. Write out to file if we're the last process and told
> > + * to do so.
> > + * ----------
> > */
> > static void
> > -pgstat_reset_remove_files(const char *directory)
> > +detach_shared_stats(bool write_stats)
>
> I think it'd be better to have an explicit call in the shutdown sequence
> somewhere to write out the data, instead of munging detach and writing
> stats out together.

It is actually strange that attach_shared_stats reads file in a
StatsLock section while it attaches existing shared memory area
deliberately outside the same lock section. So I moved the call to
pg_stat_read/write_statsfiles() out of StatsLock section as the first
step. But I couldn't move pgstat_write_stats_files() out of (or,
before or after) detach_shared_stats(), because I didn't find a way to
reliably check if the exiting process is the last detacher by a
separate function from detach_shared_stats().

(continued)
=====

The attached is the updated version taking in the comments above. I
continue to address the rest of the comments. Only 0004 is revised.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v37-0001-sequential-scan-for-dshash.patch text/x-patch 8.4 KB
v37-0002-Add-conditional-lock-feature-to-dshash.patch text/x-patch 6.2 KB
v37-0003-Make-archiver-process-an-auxiliary-process.patch text/x-patch 17.7 KB
v37-0004-Shared-memory-based-stats-collector.patch text/x-patch 256.4 KB
v37-0005-Doc-part-of-shared-memory-based-stats-collector.patch text/x-patch 20.7 KB
v37-0006-Remove-the-GUC-stats_temp_directory.patch text/x-patch 9.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-09-25 00:33:48 Re: "cert" + clientcert=verify-ca in pg_hba.conf?
Previous Message Tom Lane 2020-09-25 00:05:10 Re: Handing off SLRU fsyncs to the checkpointer