| From: | Jacob Jackson <jej(dot)jackson(dot)08(at)gmail(dot)com> |
|---|---|
| To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
| Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique? |
| Date: | 2025-10-30 00:09:14 |
| Message-ID: | CAAiQw3w-uEZkz13Y2iU76CucRXBoR=kw6wOfq6gocT63dZLfYg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
Thanks! This is very helpful.
On Wed, Oct 29, 2025 at 8:01 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Thu, 30 Oct 2025 at 07:29, Jacob Jackson <jej(dot)jackson(dot)08(at)gmail(dot)com>
> wrote:
> > Hello. I was looking at some query plans recently and noticed something
> that didn't make sense. I have a query that joins a table of questions with
> results for each question (using a table with a composite primary key of
> question id and a user id), filtered by user id. The question IDs and the
> combined question-userIds are guaranteed unique due to being primary keys,
> and yet Postgres still memoizes the inner loop results. Any ideas why?
>
> I agree that when the outer side of the join has unique values that
> Memoize does not make any sense. The planner currenrly puts quite a
> bit of faith in the row estimates for this and if you're getting this
> plan, then the estimates came back indicating there'd be fewer unique
> values of "QuestionUserStatus".question in the input than there are
> input rows to the Memoize node. If you delve into
> cost_memoize_rescan(), you'll see the code for this (look for where
> "hit_ratio" is calculated).
>
> There are also a few prechecks in get_memoize_path() to try to avoid
> this sort of thing, but unfortunately, the information to avoid
> Memoize when the outer side of the join is unique isn't available. We
> do have an "inner_unique" in JoinPathExtraData, but what we'd need for
> this and don't have is "outer_unique". If we had that, we could just
> exit early in get_memoize_path() if that's set to true. Whether or not
> going to the trouble of calculating "outer_unique" is worth the
> trouble, I'm not sure. There was some work on UniqueKeys a few years
> ago, which could have helped in this scenario as we could have more
> easily identified uniqueness at different join levels. That's no
> longer being worked on, as I understand it.
>
> On the other hand, it may be better to somehow enhance
> estimate_num_groups() so it can be given more details about the
> context of the request, i.e the set of Relids that are joined already
> for the input_rows. That way the code could do more analysis into the
> RelOptInfo base quals for the relevant relations. Extended statistics
> for n_distinct could also be applied in some cases too by looking for
> baserestrictinfo with equality quals or EquivalenceClasses with
> ec_has_const = true and a member for other Vars/Exprs in the extended
> statistics.
>
> Unfortunately, neither of these is a trivial fix.
>
> David
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Colin 't Hart | 2025-10-30 08:55:12 | Re: Two sequences associated with one identity column |
| Previous Message | Jacob Jackson | 2025-10-30 00:06:15 | Re: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique? |