Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
Date: 2019-11-16 23:01:08
Message-ID: 93E25EC9-D54F-4D16-A836-3608BBE179F0@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 12 Nov 2019, at 19:33, Andres Freund <andres(at)anarazel(dot)de> wrote:

Thanks for reviewing!

> On 2019-11-12 16:25:06 +0100, Daniel Gustafsson wrote:
>>> On 11 Nov 2019, at 09:32, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>
>>> This part has resulted in 75c1921, and we could just change
>>> DecodeMultiInsert() so as if there is no tuple data then we'd just
>>> leave. However, I don't feel completely comfortable with that either
>>> as it would be nice to still check that for normal relations we
>>> *always* have a FPW available.
>>
>> XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations
>> IIUC (that is, non logically decoded relations), so it seems to me that we can
>> have a fastpath out of DecodeMultiInsert() which inspects that flag without
>> problems. Is this proposal along the lines of what you were thinking?
>
> Maybe I'm missing something, but it's the opposite, no?
> ...
> or am I misunderstanding what you mean?

Correct, as has been discussed in this thread already, I managed to write it
backwards.

>> @@ -1600,10 +1598,16 @@ recordDependencyOnExpr(const ObjectAddress *depender,
>> /* Remove any duplicates */
>> eliminate_duplicate_dependencies(context.addrs);
>>
>> - /* And record 'em */
>> - recordMultipleDependencies(depender,
>> - context.addrs->refs, context.addrs->numrefs,
>> - behavior);
>> + /*
>> + * And record 'em. If we know we only have a single dependency then
>> + * avoid the extra cost of setting up a multi insert.
>> + */
>> + if (context.addrs->numrefs == 1)
>> + recordDependencyOn(depender, &context.addrs->refs[0], behavior);
>> + else
>> + recordMultipleDependencies(depender,
>> + context.addrs->refs, context.addrs->numrefs,
>> + behavior);
>
> I'm not sure this is actually a worthwhile complexity to keep. Hard to
> believe that setting up a multi-insert is goign to have a significant
> enough overhead to matter here?
>
> And if it does, is there a chance we can hide this repeated block
> somewhere within recordMultipleDependencies() or such? I don't like the
> repetitiveness here. Seems recordMultipleDependencies() could easily
> optimize the case of there being exactly one dependency to insert?

Agreed, I've simplified by just calling recordMultipleDepencies() until that's
found to be too expensive.

>> + slot = palloc(sizeof(TupleTableSlot *) * natts);
>
> Hm. Looking at
>
> SELECT avg(pg_column_size(pa)) FROM pg_attribute pa;
>
> yielding ~144 bytes, we can probably cap this at 128 or such, without
> loosing efficiency. Or just use
> #define MAX_BUFFERED_BYTES 65535
> from copy.c or such (MAX_BUFFERED_BYTES / sizeof(FormData_pg_attribute)).

Added a limit using MAX_BUFFERED_BYTES as above, or the number of tuples,
whichever is smallest.

>> + /* This is a tad tedious, but way cleaner than what we used to do... */
>
> I don't like comments that refer to "what we used to" because there's no
> way for anybody to make sense of that, because it's basically a dangling
> reference :)

This is a copy/paste from InsertPgAttributeTuple added in 03e5248d0f0, which in
turn quite likely was a copy/paste from InsertPgClassTuple added in b7b78d24f7f
back in 2006. Looking at that commit, it's clear what the comment refers to
but it's quite useless in isolation. Since the current coding is its teens by
now, perhaps we should just remove the two existing occurrences?

I can submit a rewrite of the comments into something less gazing into the past
if you feel like removing these.

> This seems likely to waste some effort - during insertion the slot will
> be materialized, which'll copy the tuple. I think it'd be better to
> construct the tuple directly inside the slot's tts_values/isnull, and
> then store it with ExecStoreVirtualTuple().

I'm not sure why it looked that way, but it's clearly rubbish. Rewrote by
taking the TupleDesc as input (which addresses your other comment below too),
and create the tuples directly by using ExecStoreVirtualTuple.

>> + }
>> +
>> + /* finally insert the new tuples, update the indexes, and clean up */
>> + CatalogMultiInsertWithInfo(pg_attribute_rel, slot, natts, indstate);
>
> Previous comment:
>
> I think it might be worthwhile to clear and delete the slots
> after inserting. That's potentially a good bit of memory, no?
>
> Current comment:
>
> I think I quite dislike the API of CatalogMultiInsertWithInfo freeing
> the slots. It'd e.g. make it impossible to reuse them to insert more
> data. It's also really hard to understand

I don't have strong feelings, I see merit in both approches but the reuse
aspect is clearly the winner. I've changed it such that the caller is
responsible for freeing.

>> + attrs = palloc(sizeof(Form_pg_attribute) * natts);
>
> Hm. Why we we need this separate allocation? Isn't this exactly the same
> as what's in the tupledesc?

Fixed.

>> +/*
>> + * CatalogMultiInsertWithInfo
>
> Hm. The current function is called CatalogTupleInsert(), wouldn't
> CatalogTuplesInsertWithInfo() or such be more accurate? Or
> CatalogTuplesMultiInsertWithInfo()?

Fixed by opting for the latter, mostly since it seems best convey what the
function does.

> Same concern as in the equivalent pg_attribute code.

> Too much copying again.

Both of them fixed.

The attached patch addresses all of the comments, thanks again for reviewing!

cheers ./daniel

Attachment Content-Type Size
catalog_multi_insert-v6.patch application/octet-stream 51.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Steve Singer 2019-11-17 00:53:27 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Previous Message vignesh C 2019-11-16 17:36:16 Copyright information in source files