Re: POC: Sharing record typmods between backends

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Sharing record typmods between backends
Date: 2017-08-20 23:02:52
Message-ID: CAEepm=04LM87Ya_Avgw40934Wh3G4Oyy+mmthYHuMb9m5WZwaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 21, 2017 at 6:17 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Pushing 0001, 0002 now.
>
> - rebased after conflicts
> - fixed a significant number of too long lines
> - removed a number of now superflous linebreaks

Thanks! Please find attached a rebased version of the rest of the patch set.

> Thomas, prepare yourself for some hate from extension and fork authors /
> maintainers ;)

/me hides

The attached version also fixes a couple of small details you
complained about last week:

On Wed, Aug 16, 2017 at 10:06 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > + size_t key_size; /* Size of the key (initial bytes of entry) */
>> > + size_t entry_size; /* Total size of entry */
>> >
>> > Wonder if it'd make sense to say that key/entry sizes to be only
>> > minimums? That means we could increase them to be the proper aligned
>> > size?
>>
>> I don't understand. You mean explicitly saying that there are
>> overheads? Doesn't that go without saying?
>
> I was thinking that we could do the MAXALIGN style calculations once
> instead of repeatedly, by including them in the key and entry sizes.

I must be missing something -- where do we do it repeatedly? The only
place we use MAXALIGN is a compile-type constant expression (see
expansion of macros ENTRY_FROM_ITEM and ITEM_FROM_ENTRY, and also in
one place AXALIGN(sizeof(dshash_table_item))).

> shared-record-typmods-v5.patchset/0004-Refactor-typcache.c-s-record-typmod-hash-table.patch
>
> + * hashTupleDesc
> + * Compute a hash value for a tuple descriptor.
> + *
> + * If two tuple descriptors would be considered equal by equalTupleDescs()
> + * then their hash value will be equal according to this function.
> + */
> +uint32
> +hashTupleDesc(TupleDesc desc)
> +{
> + uint32 s = 0;
> + int i;
> +
> + for (i = 0; i < desc->natts; ++i)
> + s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid));
> +
> + return s;
> +}
>
> Hm, is it right not to include tdtypeid, tdtypmod, tdhasoid here?
> equalTupleDescs() does compare them...

OK, now adding natts (just for consistency), tdtypeid and tdhasoid to
be exactly like equalTupleDescs(). Note that tdtypmod is deliberately
*not* included.

> + return hashTupleDesc(((RecordCacheEntry *) data)->tupdesc);
> ...
> + return equalTupleDescs(((RecordCacheEntry *) a)->tupdesc,
> + ((RecordCacheEntry *) b)-
>
> I'd rather have local vars for the casted params, but it's not
> important.

Done.

> MemSet(&ctl, 0, sizeof(ctl));
> - ctl.keysize = REC_HASH_KEYS * sizeof(Oid);
> + ctl.keysize = 0; /* unused */
> ctl.entrysize = sizeof(RecordCacheEntry);
>
> Hm, keysize 0? Is that right? Wouldn't it be more correct to have both
> of the same size, given dynahash includes the key size in the entry, and
> the pointer really is the key?

Done.

> shared-record-typmods-v5.patchset/0006-Introduce-a-shared-memory-record-typmod-registry.patch
>
> Hm, name & comment don't quite describe this accurately anymore.

Updated commit message.

> +extern void EnsureCurrentSession(void);
> +extern void EnsureCurrentSession(void);
>
> duplicated.

Fixed.

> +/*
> + * We want to create a DSA area to store shared state that has the same extent
> + * as a session. So far, it's only used to hold the shared record type
> + * registry. We don't want it to have to create any DSM segments just yet in
> + * common cases, so we'll give it enough space to hold a very small
> + * SharedRecordTypmodRegistry.
> + */
> +#define SESSION_DSA_SIZE 0x30000
>
> Same "extent"? Maybe lifetime?

Done.

> +
> +/*
> + * Make sure that there is a CurrentSession.
> + */
> +void EnsureCurrentSession(void)
> +{
>
> linebreak.

Fixed.

> +{
> + if (CurrentSession == NULL)
> + {
> + MemoryContext old_context = MemoryContextSwitchTo(TopMemoryContext);
> +
> + CurrentSession = palloc0(sizeof(Session));
> + MemoryContextSwitchTo(old_context);
> + }
> +}
>
> Isn't MemoryContextAllocZero easier?

Done.

I also stopped saying "const TupleDesc" in a few places, which was a
thinko (I wanted pointer to const tupldeDesc, not const pointer to
tupleDesc...), and made sure that the shmem TupleDescs always have
tdtypmod actually set.

So as I understand it the remaining issues (aside from any
undiscovered bugs...) are:

1. Do we like "Session", "CurrentSession" etc? Robert seems to be
suggesting that this is likely to get in the way when we try to tackle
this area more thoroughly. Andres is suggesting that this is a good
time to take steps in this direction.

2. Andres didn't like what I did to DecrTupleDescRefCount, namely
allowing to run when there is no ResourceOwner. I now see that this
is probably an indication of a different problem; even if there were a
worker ResourceOwner as he suggested (or perhaps a session-scoped one,
which a worker would reset before being reused), it wouldn't be the
one that was active when the TupleDesc was created. I think I have
failed to understand the contracts here and will think/read about it
some more.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
shared-record-typmods-v6.patchset.tgz application/x-gzip 31.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-08-21 00:18:55 Re: Stats for triggers on partitioned tables not shown in EXPLAIN ANALYZE
Previous Message Daniel Gustafsson 2017-08-20 21:21:47 Re: Support for Secure Transport SSL library on macOS as OpenSSL alternative