Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-12-07 01:25:49
Message-ID: CALNJ-vRbftTop7SMjjWc1OJMoz1UwOSc10wskhCN3W4Y49bJFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> + /* Make a guess at a good size when we're not given a valid size. */
> + if (size == 0)
> + size = 1024;
>
> Should the default size be logged ?

> I'm not too sure if I know what you mean here. Should it be a power of
> 2? It is. Or do you mean I shouldn't use a magic number?

Using 1024 seems to be fine. I meant logging the default value of 1024 so
that user / dev can have better idea the actual value used (without looking
at the code).

>> Or do you think 98% is not a good number?

Since you have played with Result Cache, I would trust your judgment in
choosing the percentage.
It is fine not to expose this constant until the need arises.

Cheers

On Sun, Dec 6, 2020 at 5:15 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Sat, 5 Dec 2020 at 16:51, Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> >
> > There are two blocks with almost identical code (second occurrence in
> cache_store_tuple):
> >
> > + if (rcstate->mem_used > rcstate->mem_upperlimit)
> > + {
> > It would be nice if the code can be extracted to a method and shared.
>
> It's true, they're *almost* identical. I quite like the fact that one
> of the cases can have an unlikely() macro in there. It's pretty
> unlikely that we'd go into cache overflow just when adding a new cache
> entry. work_mem would likely have to be set to a couple of dozen bytes
> for that to happen. 64k is the lowest it can be set. However, I
> didn't really check to see if having that unlikely() macro increases
> performance. I've changed things locally here to add a new function
> named cache_check_mem(). I'll keep that for now, but I'm not sure if
> there's enough code there to warrant a function. The majority of the
> additional lines are from the comment being duplicated.
>
> > node->rc_status = RC_END_OF_SCAN;
> > return NULL;
> > }
> > else
> >
> > There are several places where the else keyword for else block can be
> omitted because the if block ends with return.
> > This would allow the code in else block to move leftward (for easier
> reading).
>
> OK, I've removed the "else" in places where it can be removed.
>
> > if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
> >
> > I noticed that right_hashfn isn't used. Would this cause some warning
> from the compiler (for some compiler the warning would be treated as error)
> ?
> > Maybe NULL can be passed as the last parameter. The return value of
> get_op_hash_functions would keep the current meaning (find both hash fn's).
>
> It's fine not to use output parameters. It's not the only one in the
> code base ignoring one from that very function. See
> execTuplesHashPrepare().
>
> > rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
> >
> > Maybe (in subsequent patch) GUC variable can be introduced for tuning
> the constant 0.98.
>
> I don't think exposing such knobs for users to adjust is a good idea.
> Can you think of a case where you'd want to change it? Or do you think
> 98% is not a good number?
>
> >
> > For +paraminfo_get_equal_hashops :
> >
> > + else
> > + Assert(false);
>
> I'm keen to leave it like it is. I don't see any need to bloat the
> compiled code with an elog(ERROR).
>
> There's a comment in RelOptInfo.lateral_vars that mentions:
>
> /* LATERAL Vars and PHVs referenced by rel */
>
> So, if anyone, in the future, wants to add some other node type to
> that list then they'll have a bit more work to do. Plus, I'm only
> doing the same as what's done in create_lateral_join_info().
>
> I'll run the updated patch which includes the cache_check_mem()
> function for a bit and post an updated patch to the main thread a bit
> later.
>
> Thanks for having a look at this patch.
>
> David
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-12-07 02:03:30 Re: Proposed patch for key managment
Previous Message David Rowley 2020-12-07 01:15:08 Re: Hybrid Hash/Nested Loop joins and caching results from subplans