Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2021-01-29 05:41:52
Message-ID: CAApHDvpeUqAsaBAnvfekXkVhGhW97aargELwJ3uBBztuWDffTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for having a look at this.

I've taken most of your suggestions. The things quoted below are just
the ones I didn't agree with or didn't understand.

On Thu, 28 Jan 2021 at 18:43, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Tue, Dec 08, 2020 at 08:15:52PM +1300, David Rowley wrote:
> > +typedef struct EstimationInfo
> > +{
> > + int flags; /* Flags, as defined above to mark special
> > + * properties of the estimation. */
>
> Maybe it should be a bits32 ?

I've changed this to uint32. There are a few examples in the code
base of bit flags using int. e.g PlannedStmt.jitFlags and
_mdfd_getseg()'s "behavior" parameter, there are also quite a few
using unsigned types.

> (Also, according to Michael, some people preferred 0x01 to 1<<0)

I'd rather keep the (1 << 0). I think that it gets much easier to
read when we start using more significant bits. Granted the codebase
has lots of examples of each. I just picked the one I prefer. If
there's some consensus that we switch the bit-shifting to hex
constants for other bitflag defines then I'll change it.

> I think "result cache nodes" should be added here:
>
> doc/src/sgml/config.sgml- <para>
> doc/src/sgml/config.sgml- Hash-based operations are generally more sensitive to memory
> doc/src/sgml/config.sgml- availability than equivalent sort-based operations. The
> doc/src/sgml/config.sgml- memory available for hash tables is computed by multiplying
> doc/src/sgml/config.sgml- <varname>work_mem</varname> by
> doc/src/sgml/config.sgml: <varname>hash_mem_multiplier</varname>. This makes it
> doc/src/sgml/config.sgml- possible for hash-based operations to use an amount of memory
> doc/src/sgml/config.sgml- that exceeds the usual <varname>work_mem</varname> base
> doc/src/sgml/config.sgml- amount.
> doc/src/sgml/config.sgml- </para>

I'd say it would be better to mention it in the previous paragraph.
I've done that. It now looks like:

Hash tables are used in hash joins, hash-based aggregation, result
cache nodes and hash-based processing of <literal>IN</literal>
subqueries.
</para>

Likely setops should be added to that list too, but not by this patch.

> Language fixen follow:
>
> > + * Initialize the hash table to empty.
>
> as empty

Perhaps, but I've kept the "to empty" as it's used in
nodeRecursiveunion.c and nodeSetOp.c to do the same thing. If you
propose a patch that gets transaction to change those instances then
I'll switch this one too.

I'm just in the middle of considering some other changes to the patch
and will post an updated version once I'm done with that.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-01-29 05:46:54 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Previous Message Bharath Rupireddy 2021-01-29 05:38:23 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit