Re: shm_toc_lookup API

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: shm_toc_lookup API
Date: 2017-06-05 16:19:01
Message-ID: 16984.1496679541@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Jun 5, 2017 at 11:19 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I think that shm_toc_lookup() ought to be made to throw elog(ERROR) on an
>> unexpected failure. To satisfy the one caller that doesn't want an error,
>> we could either add a "bool noError" parameter to it, or split it into two
>> functions shm_toc_lookup() and shm_toc_lookup_noerror(). The latter would
>> require touching less code, but the former is probably more like what we'd
>> have had if this were designed more carefully to begin with.

> Either is OK with me.

Done with the extra parameter. If we discover a reason to back-patch
this, we'd likely need to do it the other way in back branches, but
for now I'll refrain.

While I'm looking at this ... seems like there's a pretty basic coding-
rule violation here, namely that shm_toc_lookup() thinks it can read
toc->toc_nentry without any sort of locking. Since that field is declared
Size, this amounts to an assumption that 64-bit reads are atomic, which
is not per project practice.

In practice it probably can't fail even if 64-bit reads aren't atomic,
simply because we'll never have enough entries in a shm_toc to make the
high-order half ever change. But that just begs the question why the
field is declared Size rather than int. I think we should make it the
latter.

I am also thinking that most of the shm_toc functions need to have the
toc pointers declared as "volatile *", but particularly shm_toc_lookup.
That read_barrier call might prevent the hardware from reordering
accesses, but I don't think it stops the compiler from doing so.

These problems are probably just latent, because it looks to me like
in our current usage no process would ever be doing lookups on shm_tocs
that are being changed concurrently. But they'll bite us someday.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-06-05 17:00:51 Re: Make ANALYZE more selective about what is a "most common value"?
Previous Message Robert Haas 2017-06-05 15:42:39 Re: make check false success