Re: shared-memory based stats collector

From: Andres Freund <andres(at)anarazel(dot)de>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, gkokolatos(at)protonmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2021-03-22 19:33:26
Message-ID: 20210322193326.e64v5fbfpp3lhvke@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-03-19 14:27:38 -0700, Andres Freund wrote:
> Yep, it's not even particularly hard to hit:
>
> S0: CREATE TABLE a_table();
> S0: INSERT INTO a_table();
> S0: disconnect
> S1: set a breakpoint to just after the dshash_release_lock(), with an
> if objid == a_table_oid
> S1: SELECT pg_stat_get_live_tuples('a_table'::regclass);
> (will break at above breakpoint, without having incremented the
> refcount yet)
> S2: DROP TABLE a_table;
> S2: VACUUM pg_class;
>
> At that point S2's call to pgstat_vacuum_stat() will find the shared
> stats entry for a_table, delete the entry from the shared hash table,
> see that the stats data has a zero refcount, and free it. Once S1 wakes
> up it'll use already freed (and potentially since reused) memory.

I fixed this by initializing / increment the refcount while holding
dshash partition lock. To avoid the potential refcount leak in case the
lookup cache insertion fails due to OOM I changed things so that the
lookup cache is inserted, not just looked up, earlier. That also avoids
needing two hashtable ops in the cache miss case. The price of an empty
hashtable entry in the !create case doesn't seem high.

Related issue: delete_current_stats_entry() there's the following
comment:

/*
* Let the referrers drop the entry if any. Refcount won't be decremented
* until "dropped" is set true and StatsShmem->gc_count is incremented
* later. So we can check refcount to set dropped without holding a lock.
* If no one is referring this entry, free it immediately.
*/

I don't think this explanations is correct. gc_count might have been
incremented by another backend, or cleanup_dropped_stats_entries() might
run. So the whole bit about refcounts seems wrong.

I don't see what prevents a double-free here. Consider what happens if
S1: cleanup_dropped_stats_entries() does pg_atomic_sub_fetch_u32(&ent->shared->refcount, 1)
S2: delete_current_stats_entry() pg_atomic_read_u32(&header->refcount), reading 0
S1: dsa_free(area, ent->dsapointer);
S2: dsa_free(area, pdsa);
World: boom

I think the appropriate fix might be to not have ->dropped (or rather have it
just as a crosscheck), and have every non-dropped entry have an extra
refcount. When dropping the entry the refcount is dropped, and we can safely
free the entry. That way normal paths don't need to check ->dropped at all.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Oksman 2021-03-22 19:40:09 [PATCH] rename column if exists
Previous Message Jacob Champion 2021-03-22 19:17:26 Re: Proposal: Save user's original authenticated identity for logging