Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2021-03-01 21:57:16
Message-ID: AFC09B65-A354-4A43-B1D4-2CC313BF3CCD@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Mar 1, 2021, at 1:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Feb 24, 2021 at 1:55 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> [ new patches ]
>
> Regarding 0001:
>
> There seem to be whitespace-only changes to the comment for select_loop().
>
> I wonder if the ParallelSlotsSetupOneDB()/ParallelSlotsSetupMinimal()
> changes could be simpler. First idea: Suppose you had
> ParallelSlotsSetup(numslots) that just creates the slot array with 0
> connections, and then ParallelSlotsAdopt(slots, conn, cparams) if you
> want to make it own an existing connection. That seems like it might
> be cleaner. Second idea: Why not get rid of ParallelSlotsSetupOneDB()
> altogether, and just let ParallelSlotsGetIdle() connect the other
> slots as required? Preconnecting all slots before we do anything is
> good because ... of what?

Mostly because, if --jobs is set too high, you get an error before launching any work. I don't know that it's really a big deal if vacuumdb or reindexdb have a bunch of tasks kicked off prior to exit(1) due to not being able to open connections for all the slots, but it is a behavioral change.

> I also wonder if things might be simplified by introducing a wrapper
> object, e.g. ParallelSlotArray. Suppose a ParallelSlotArray stores the
> number of slots (num_slots), the array of actual PGconn objects, and
> the ConnParams to be used for new connections, and the initcmd to be
> used for new connections. Maybe also the progname. This seems like it
> would avoid a bunch of repeated parameter passing: you could just
> create the ParallelSlotArray with the right contents, and then pass it
> around everywhere, instead of having to keep passing the same stuff
> in. If you want to switch to connecting to a different DB, you tweak
> the ConnParams - maybe using an accessor function - and the system
> figures the rest out.

The existing version of parallel slots (before any of my changes) could already have been written that way, but the author chose not to. I thought about making the sort of change you suggest, and decided against, mostly on the principle of stare decisis. But the idea is very appealing, and since you're on board, I think I'll go make that change.

> I wonder if it's really useful to generalize this to a point of caring
> about all the ConnParams fields, too. Like, if you only provide
> ParallelSlotUpdateDB(slotarray, dbname), then that's the only field
> that can change so you don't need to care about the others. And maybe
> you also don't really need to keep the ConnParams fields in every
> slot, either. Like, couldn't you just say something like: if
> (strcmp(PQdb(conn) , slotarray->cparams->dbname) != 0) { wrong DB,
> can't reuse without a reconnect }? I know sometimes a dbname is really
> a whole connection string, but perhaps we could try to fix that by
> using PQconninfoParse() in the right place, so that what ends up in
> the cparams is just the db name, not a whole connection string.

I went a little out of my way to avoid that, as I didn't want the next application that uses parallel slots to have to refactor it again, if for example they want to process in parallel databases listening on different ports, or to process commands issued under different roles.

> This is just based on a relatively short amount of time spent studying
> the patch, so I might well be off-base here. What do you think?

I like the ParallelSlotArray idea, and will go do that now. I'm happy to defer to your judgement on the other stuff, too, but will wait to hear back from you.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-03-01 22:23:12 Re: enable_incremental_sort changes query behavior
Previous Message Tom Lane 2021-03-01 21:50:01 Re: Bug in error reporting for multi-line JSON