Re: Switch to multi-inserts for pg_depend

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: Switch to multi-inserts for pg_depend
Date: 2020-08-11 00:32:21
Message-ID: 20200811003221.iuawregryk6z2wjd@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-08-07 15:16:19 +0900, Michael Paquier wrote:
> From cd117fa88938c89ac953a5e3c036828337150b07 Mon Sep 17 00:00:00 2001
> From: Michael Paquier <michael(at)paquier(dot)xyz>
> Date: Fri, 7 Aug 2020 10:57:40 +0900
> Subject: [PATCH 1/2] Use multi-inserts for pg_depend
>
> This is a follow-up of the work done in e3931d01. This case is a bit
> different than pg_attribute and pg_shdepend: the maximum number of items
> to insert is known in advance, but there is no need to handle pinned
> dependencies. Hence, the base allocation for slots is done based on the
> number of items and the maximum allowed with a cap at 64kB, and items
> are initialized once used to minimize the overhead of the operation.
>
> Author: Daniel Gustafsson, Michael Paquier
> Discussion: https://postgr.es/m/XXX
> ---
> src/backend/catalog/pg_depend.c | 95 ++++++++++++++++++++++++---------
> 1 file changed, 69 insertions(+), 26 deletions(-)
>
> diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
> index 70baf03178..596f0c5e29 100644
> --- a/src/backend/catalog/pg_depend.c
> +++ b/src/backend/catalog/pg_depend.c
> @@ -47,6 +47,12 @@ recordDependencyOn(const ObjectAddress *depender,
> recordMultipleDependencies(depender, referenced, 1, behavior);
> }
>
> +/*
> + * Cap the maximum amount of bytes allocated for recordMultipleDependencies()
> + * slots.
> + */
> +#define MAX_PGDEPEND_INSERT_BYTES 65535
> +

Do we really want to end up with several separate defines for different
type of catalog batch inserts? That doesn't seem like a good
thing. Think there should be a single define for all catalog bulk
inserts.

> /*
> * Record multiple dependencies (of the same kind) for a single dependent
> * object. This has a little less overhead than recording each separately.
> @@ -59,10 +65,10 @@ recordMultipleDependencies(const ObjectAddress *depender,
> {
> Relation dependDesc;
> CatalogIndexState indstate;
> - HeapTuple tup;
> - int i;
> - bool nulls[Natts_pg_depend];
> - Datum values[Natts_pg_depend];
> + int slotCount, i;
> + TupleTableSlot **slot;
> + int nslots, max_slots;
> + bool slot_init = true;
>
> if (nreferenced <= 0)
> return; /* nothing to do */
> @@ -76,11 +82,18 @@ recordMultipleDependencies(const ObjectAddress *depender,
>
> dependDesc = table_open(DependRelationId, RowExclusiveLock);
>
> + /*
> + * Allocate the slots to use, but delay initialization until we know that
> + * they will be used.
> + */
> + max_slots = Min(nreferenced,
> + MAX_PGDEPEND_INSERT_BYTES / sizeof(FormData_pg_depend));
> + slot = palloc(sizeof(TupleTableSlot *) * max_slots);
> +
> /* Don't open indexes unless we need to make an update */
> indstate = NULL;
>
> - memset(nulls, false, sizeof(nulls));
> -
> + slotCount = 0;
> for (i = 0; i < nreferenced; i++, referenced++)
> {
> /*
> @@ -88,38 +101,68 @@ recordMultipleDependencies(const ObjectAddress *depender,
> * need to record dependencies on it. This saves lots of space in
> * pg_depend, so it's worth the time taken to check.
> */
> - if (!isObjectPinned(referenced, dependDesc))
> + if (isObjectPinned(referenced, dependDesc))
> + continue;
> +

Hm, would it be better to first iterate over the dependencies, compute
the number of dependencies to be inserted, and then go ahead and create
the right number of slots?

> From fcc0a11e9fc94d2fedc71dd10ba2a23713225963 Mon Sep 17 00:00:00 2001
> From: Michael Paquier <michael(at)paquier(dot)xyz>
> Date: Fri, 7 Aug 2020 15:14:51 +0900
> Subject: [PATCH 2/2] Switch to multi-insert dependencies for many code paths
>
> This makes use of the new APIs to insert dependencies in groups, instead
> of doing the operation one-by-one.

Seems several places have been modified to new APIs despite only
covering a single dependency. Perhaps worth mentioning?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-08-11 00:33:22 Re: walsender waiting_for_ping spuriously set
Previous Message Andres Freund 2020-08-11 00:21:34 Re: Hybrid Hash/Nested Loop joins and caching results from subplans