Quick Links

Re: POC: Sharing record typmods between backends

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: POC: Sharing record typmods between backends
Date:	2017-08-15 22:06:53
Message-ID:	20170815220653.bhz3s2zo7jo5dtjj@alap3.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2017-08-15 17:44:55 +1200, Thomas Munro wrote:
> > @@ -99,12 +72,9 @@ CreateTemplateTupleDesc(int natts, bool hasoid)
> >
> > /*
> > * CreateTupleDesc
> > - * This function allocates a new TupleDesc pointing to a given
> > + * This function allocates a new TupleDesc by copying a given
> > * Form_pg_attribute array.
> > *
> > - * Note: if the TupleDesc is ever freed, the Form_pg_attribute array
> > - * will not be freed thereby.
> > - *
> >
> > I'm leaning towards no, but you could argue that we should just change
> > that remark to be about constr?
>
> I don't see why.

Because for that the freeing bit is still true, ie. it's still
separately allocated.

> > Review of 0003:
> >
> > I'm not doing a too detailed review, given I think there's some changes
> > in the pipeline.
>
> Yep. In the new patch set the hash table formerly known as DHT is now
> in patch 0004 and I made the following changes based on your feedback:
>
> 1. Renamed it to "dshash". The files are named dshash.{c,h}, and the
> prefix on identifiers is dshash_. You suggested dsmhash, but the "m"
> didn't seem to make much sense. I considered dsahash, but dshash
> seemed better. Thoughts?

WFM. Just curious, why didn't m make sense? I was referring to dynamic
shared memory hash - seems right. Whether there's an intermediary dsa
layer or not...

> 2. Ripped out the incremental resizing and iterator support for now,
> as discussed. I want to post patches to add those features when we
> have a use case but I can see that it's no slam dunk so I want to keep
> that stuff out of the dependency graph for parallel hash.

Cool.

> 3. Added support for hash and compare functions with an extra
> argument for user data, a bit like qsort_arg_comparator. This is
> necessary for functions that need to be able to dereference a
> dsa_pointer stored in the entry, since they need the dsa_area. (I
> would normally call such an argument 'user_data' or 'context' or
> something but 'arg' seemed to be establish by qsort_arg.)

Good.

> > +/*
> > + * The set of parameters needed to create or attach to a hash table. The
> > + * members tranche_id and tranche_name do not need to be initialized when
> > + * attaching to an existing hash table. The functions do need to be supplied
> > + * even when attaching because we can't safely share function pointers between
> > + * backends in general.
> > + */
> > +typedef struct
> > +{
> > + size_t key_size; /* Size of the key (initial bytes of entry) */
> > + size_t entry_size; /* Total size of entry */
> > + dht_compare_function compare_function; /* Compare function */
> > + dht_hash_function hash_function; /* Hash function */
> > + int tranche_id; /* The tranche ID to use for locks. */
> > +} dht_parameters;
> >
> > Wonder if it'd make sense to say that key/entry sizes to be only
> > minimums? That means we could increase them to be the proper aligned
> > size?
>
> I don't understand. You mean explicitly saying that there are
> overheads? Doesn't that go without saying?

I was thinking that we could do the MAXALIGN style calculations once
instead of repeatedly, by including them in the key and entry sizes.

> > Ignoring aspects related to REC_HASH_KEYS and related discussion, since
> > we're already discussing that in another email.
>
> This version includes new refactoring patches 0003, 0004 to get rid of
> REC_HASH_KEYS by teaching the hash table how to use a TupleDesc as a
> key directly. Then the shared version does approximately the same
> thing, with a couple of extra hoops to jump thought. Thoughts?

Will look.

> > +static int32
> > +find_or_allocate_shared_record_typmod(TupleDesc tupdesc)
> > +{
> >
> > + /*
> > + * While we still hold the atts_index entry locked, add this to
> > + * typmod_index. That's important because we don't want anyone to be able
> > + * to find a typmod via the former that can't yet be looked up in the
> > + * latter.
> > + */
> > + PG_TRY();
> > + {
> > + typmod_index_entry =
> > + dht_find_or_insert(CurrentSharedRecordTypmodRegistry.typmod_index,
> > + &typmod, &found);
> > + if (found)
> > + elog(ERROR, "cannot create duplicate shared record typmod");
> > + }
> > + PG_CATCH();
> > + {
> > + /*
> > + * If we failed to allocate or elog()ed, we have to be careful not to
> > + * leak the shared memory. Note that we might have created a new
> > + * atts_index entry above, but we haven't put anything in it yet.
> > + */
> > + dsa_free(CurrentSharedRecordTypmodRegistry.area, shared_dp);
> > + PG_RE_THROW();
> > + }
> >
> > Not entirely related, but I do wonder if we don't need abetter solution
> > to this. Something like dsa pointers that register appropriate memory
> > context callbacks to get deleted in case of errors?
>
> Huh, scope guards. I have had some ideas about some kind of
> destructor mechanism that might replace what we're doing with DSM
> detach hooks in various places and also work in containers like hash
> tables (ie entries could have destructors), but doing it with the
> stack is another level...

Not sure what you mean with 'stack'?

shared-record-typmods-v5.patchset/0004-Refactor-typcache.c-s-record-typmod-hash-table.patch

+ * hashTupleDesc
+ * Compute a hash value for a tuple descriptor.
+ *
+ * If two tuple descriptors would be considered equal by equalTupleDescs()
+ * then their hash value will be equal according to this function.
+ */
+uint32
+hashTupleDesc(TupleDesc desc)
+{
+ uint32 s = 0;
+ int i;
+
+ for (i = 0; i < desc->natts; ++i)
+ s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid));
+
+ return s;
+}

Hm, is it right not to include tdtypeid, tdtypmod, tdhasoid here?
equalTupleDescs() does compare them...

+/*
+ * Hash function for the hash table of RecordCacheEntry.
+ */
+static uint32
+record_type_typmod_hash(const void *data, size_t size)
+{
+ return hashTupleDesc(((RecordCacheEntry *) data)->tupdesc);
+}
+
+/*
+ * Match function for the hash table of RecordCacheEntry.
+ */
+static int
+record_type_typmod_compare(const void *a, const void *b, size_t size)
+{
+ return equalTupleDescs(((RecordCacheEntry *) a)->tupdesc,
+ ((RecordCacheEntry *) b)->tupdesc) ? 0 : 1;
+}

I'd rather have local vars for the casted params, but it's not
important.

MemSet(&ctl, 0, sizeof(ctl));
- ctl.keysize = REC_HASH_KEYS * sizeof(Oid);
+ ctl.keysize = 0; /* unused */
ctl.entrysize = sizeof(RecordCacheEntry);

Hm, keysize 0? Is that right? Wouldn't it be more correct to have both
of the same size, given dynahash includes the key size in the entry, and
the pointer really is the key?

Otherwise looks pretty good.

shared-record-typmods-v5.patchset/0006-Introduce-a-shared-memory-record-typmod-registry.patch

Hm, name & comment don't quite describe this accurately anymore.

+/*
+ * A struct encapsulating some elements of a user's session. For now this
+ * manages state that applies to parallel query, but it principle it could
+ * include other things that are currently global variables.
+ */
+typedef struct Session
+{
+ dsm_segment *segment; /* The session-scoped DSM segment. */
+ dsa_area *area; /* The session-scoped DSA area. */
+
+ /* State managed by typcache.c. */
+ SharedRecordTypmodRegistry *typmod_registry;
+ dshash_table *record_table; /* Typmods indexed by tuple descriptor */
+ dshash_table *typmod_table; /* Tuple descriptors indexed by typmod */
+} Session;

Interesting. I was apparently thinking slightly differently. I'd have
thought we'd have Session struct in statically allocated shared
memory. Which'd then have dsa_handle, dshash_table_handle, ... members.

+extern void EnsureCurrentSession(void);
+extern void EnsureCurrentSession(void);

duplicated.

+/*
+ * We want to create a DSA area to store shared state that has the same extent
+ * as a session. So far, it's only used to hold the shared record type
+ * registry. We don't want it to have to create any DSM segments just yet in
+ * common cases, so we'll give it enough space to hold a very small
+ * SharedRecordTypmodRegistry.
+ */
+#define SESSION_DSA_SIZE 0x30000

Same "extent"? Maybe lifetime?

+
+/*
+ * Make sure that there is a CurrentSession.
+ */
+void EnsureCurrentSession(void)
+{

linebreak.

+{
+ if (CurrentSession == NULL)
+ {
+ MemoryContext old_context = MemoryContextSwitchTo(TopMemoryContext);
+
+ CurrentSession = palloc0(sizeof(Session));
+ MemoryContextSwitchTo(old_context);
+ }
+}

Isn't MemoryContextAllocZero easier?

Greetings,

Andres Freund

In response to

Re: POC: Sharing record typmods between backends at 2017-08-15 05:44:55 from Thomas Munro

Responses

Re: POC: Sharing record typmods between backends at 2017-08-16 00:30:16 from Robert Haas
Re: POC: Sharing record typmods between backends at 2017-08-16 00:42:20 from Thomas Munro

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2017-08-15 22:08:47	Re: [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.
Previous Message	Thomas Munro	2017-08-15 22:01:04	Re: Thoughts on unit testing?