Re: A performance issue with Memoize

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A performance issue with Memoize
Date: 2024-01-25 18:32:42
Message-ID: 422277.1706207562@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> I think fixing it your way makes sense. I don't really see any reason
> why we should have two. However...

> Another way it *could* be fixed would be to get rid of pull_paramids()
> and change create_memoize_plan() to set keyparamids to all the param
> IDs that match are equal() to each param_exprs. That way nodeMemoize
> wouldn't purge the cache as we'd know the changing param is accounted
> for in the cache. For the record, I don't think that's better, but it
> scares me a bit less as I don't know what other repercussions there
> are of applying your patch to get rid of the duplicate
> NestLoopParam.paramval.

> I'd feel better about doing it your way if Tom could comment on if
> there was a reason he put the function calls that way around in
> 5ebaaa494.

I'm fairly sure I thought it wouldn't matter because of the Param
de-duplication done in paramassign.c. However, Richard's example
shows that's not so, because process_subquery_nestloop_params is
picky about the param ID assigned to a particular Var while
replace_nestloop_params is not. So flipping the order makes sense.
I'd change the comment though, maybe to

/*
* Replace any outer-relation variables with nestloop params.
*
* We must provide nestloop params for both lateral references of
* the subquery and outer vars in the scan_clauses. It's better
* to assign the former first, because that code path requires
* specific param IDs, while replace_nestloop_params can adapt
* to the IDs assigned by process_subquery_nestloop_params.
* This avoids possibly duplicating nestloop params when the
* same Var is needed for both reasons.
*/

However ... it seems like we're not out of the woods yet. Why
is Richard's proposed test case still showing

+ -> Memoize (actual rows=5000 loops=N)
+ Cache Key: t1.two, t1.two

Seems like there is missing de-duplication logic, or something.

> I also feel like we might be getting a bit close to the minor version
> releases to be adjusting this stuff in back branches.

Yeah, I'm not sure I would change this in the back branches.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Cramer 2024-01-25 19:07:11 Re: [PATCH] Add native windows on arm64 support
Previous Message Andrew Dunstan 2024-01-25 17:30:55 Re: [PATCH] Add native windows on arm64 support