Re: WIP: [[Parallel] Shared] Hash

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: [[Parallel] Shared] Hash
Date: 2017-02-16 13:57:21
Message-ID: CA+TgmoYuPOc=+xrG1v0fCsoLbKAab9F1ddOeaaiLMzKOiBar1Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 15, 2017 at 9:36 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> 0002-hj-add-dtrace-probes-v5.patch
>
> Hm. I'm personally very unenthusiastic about adding more of these, and
> would rather rip all of them out. I tend to believe that static
> problems simply aren't a good approach for anything requiring a lot of
> detail. But whatever.

I'm not a big fan of either static problems or static probes, myself.

> Isn't it kinda weird to do this from within dense_alloc()? I mean that
> dumps a lot of data to disk, frees a bunch of memory and so on - not
> exactly what "dense_alloc" implies. Isn't the free()ing part also
> dangerous, because the caller might actually use some of that memory,
> like e.g. in ExecHashRemoveNextSkewBucket() or such. I haven't looked
> deeply enough to check whether that's an active bug, but it seems like
> inviting one if not.

I haven't looked at this, but one idea might be to just rename
dense_alloc() to ExecHashBlahBlahSomething(). If there's a real
abstraction layer problem here then we should definitely fix it, but
maybe it's just the angle at which you hold your head.
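
For what it's worth, the dangling-pointer hazard you're describing
would look something like this (a hypothetical sketch, loosely
paraphrasing the pattern in ExecHashRemoveNextSkewBucket(), not a
claim about what the code does today):

    /*
     * Hypothetical: if dense_alloc() can spill batches and pfree()
     * existing chunks as a side effect, a caller still holding a
     * pointer into those chunks is left dangling.
     */
    copyTuple = (HashJoinTuple) dense_alloc(hashtable, tupleSize);
    /* If that call just freed the chunk containing hashTuple, this
     * copies from released memory. */
    memcpy(copyTuple, hashTuple, tupleSize);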

> To me that is a weakness in the ExecShutdownNode() API - imo child nodes
> should get the chance to shutdown before the upper-level node.
> ExecInitNode/ExecEndNode etc give individual nodes the freedom to do
> things in the right order, but ExecShutdownNode() doesn't. I don't
> quite see why we'd want to invent a separate ExecDetachNode() that'd be
> called immediately before ExecShutdownNode().

Interestingly, the same point came up on the Parallel Bitmap Heap Scan thread.

> An easy way to change that would be to return in the
> ExecShutdownNode()'s T_GatherState case, and delegate the responsibility
> of calling it on Gather's children to ExecShutdownGather().
> Alternatively we could make it a full-blown thing like ExecInitNode()
> that every node needs to implement, but that seems a bit painful.

I was thinking we should just switch things so that ExecShutdownNode()
recurses first, and then does the current node. There's no real
excuse for a node terminating the shutdown scan early, I think.
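
Concretely, I'm imagining something like the following untested
sketch, reusing the existing planstate_tree_walker():

bool
ExecShutdownNode(PlanState *node)
{
    if (node == NULL)
        return false;

    /* Recurse first, so children shut down before their parents. */
    planstate_tree_walker(node, ExecShutdownNode, NULL);

    switch (nodeTag(node))
    {
        case T_GatherState:
            ExecShutdownGather((GatherState *) node);
            break;
        default:
            break;
    }

    return false;
}

That keeps the walk in one place, and Gather gets to flush its
children before tearing down the parallel context, without inventing
a separate ExecDetachNode().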

> Or have I missed something here?
>
> Random aside: Wondered before if having to provide all executor
> callbacks is a weakness of our executor integration, and whether it
> shouldn't be a struct of callbacks instead...

I honestly have no idea whether that would be better or worse from the
CPU's point of view.
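
If someone wanted to experiment, I suppose the struct-of-callbacks
idea might look roughly like this; the names are invented, and this
is purely a sketch of the shape:

/* Purely hypothetical: one method table per node type, so the
 * executor entry points dispatch through pointers instead of a big
 * switch on nodeTag() in each of them. */
typedef struct ExecutorNodeMethods
{
    PlanState  *(*init) (Plan *plan, EState *estate, int eflags);
    TupleTableSlot *(*exec) (PlanState *node);
    void        (*end) (PlanState *node);
    void        (*rescan) (PlanState *node);
    void        (*shutdown) (PlanState *node);  /* optional, may be NULL */
} ExecutorNodeMethods;

Whether the indirect calls end up cheaper or dearer than the current
switches is exactly the part I can't predict.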

> I think it's a bit weird that the parallel path now has an exit path
> that the non-parallel path doesn't have.

I'm not sure about this particular one, but in general those are
pretty common. For example, look at the changes
569174f1be92be93f5366212cc46960d28a5c5cd made to _bt_first(). When
you get there, you can discover that you aren't actually the first,
and that in fact all the work is already complete, and there's nothing
left for you to do but give up.
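
From memory, the shape of that exit path is roughly this (treat it
as a sketch rather than a faithful quote of _bt_first()):

    if (scan->parallel_scan != NULL)
    {
        BlockNumber blkno;

        /* Try to seize the scan; if it's already complete, some
         * other participant finished the work and we just stop. */
        if (!_bt_parallel_seize(scan, &blkno))
            return false;
    }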

> Hm. That seems a bit on the detailed side - if we're going that way it
> seems likely that we'll end up with hundreds of wait events. I don't
> think gradually evolving wait events into something like a query
> progress framework is a good idea.

I'm pretty strongly of the opinion that we should not reuse multiple
wait events for the same purpose. The whole point of the wait event
system is to identify what caused the wait. Having relatively
recently done a ton of work to separate all of the waits in the system
and identify them individually, I'm loath to see us start melding
things back together again.
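
To make that concrete, I'd much rather see one event per wait point
than a single catch-all; e.g. something like this, with names
entirely made up for illustration:

/* Made-up names: one wait event per barrier a worker can actually
 * block on, rather than one generic "Hash" event for all of them. */
typedef enum HashJoinWaitEvent
{
    WAIT_EVENT_HASH_BUILD_BARRIER,      /* waiting for build to finish */
    WAIT_EVENT_HASH_LOADING_BARRIER,    /* waiting for batch load */
    WAIT_EVENT_HASH_UNMATCHED_BARRIER   /* waiting to scan unmatched */
} HashJoinWaitEvent;

With distinct values, pg_stat_activity can tell you which phase is
stuck, instead of just "somewhere in the hash join".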

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
