Re: disfavoring unparameterized nested loops

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: disfavoring unparameterized nested loops
Date: 2021-06-18 10:20:22
Message-ID: CAExHW5vB+B_0zf1MDKFV_kL-FYkNQD=Rzx6Q2nUrEERC_mXg1Q@mail.gmail.com
Lists: pgsql-hackers

>
> The problem I have with this idea is that I really don't know how to
> properly calculate what the risk_factor should be set to. It seems
> easy at first to set it to something that has the planner avoid these
> silly 1-row estimate nested loop mistakes, but I think what we'd set
> the risk_factor to would become much more important when more and more
> Path types start using it. So if we did this and just guessed the
> risk_factor, that might be fine when only 1 of the paths being
> compared had a non-zero risk_factor, but as soon as both paths have
> one set, unless they're set to something sensible, then we just end up
> comparing garbage costs to garbage costs.

Risk factor is the inverse of the confidence in an estimate: the lower
the confidence, the higher the risk. If we associate a confidence with
the selectivity estimate, or compute a confidence interval for the
estimate instead of a single number, we can associate a risk factor
with each estimate. When we combine estimates to calculate new
estimates, we also combine their confidences/confidence intervals. If
my memory serves me well, confidence intervals/confidences are
calculated based on the sample size and the method used for
estimation, so we should be able to compute those during ANALYZE.
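
To make the idea concrete, here is a rough sketch in Python, not
anything taken from the planner or from a patch. It assumes the
selectivity of a clause is estimated as the fraction of matching rows
in a random sample, uses the Wilson score interval as one possible
confidence interval, combines two intervals for an AND of clauses by
assuming the clauses are independent, and defines a hypothetical risk
factor as the ratio of the upper to the lower bound. All function names
and the choice of risk measure are assumptions made for illustration.

```python
import math


def wilson_interval(matches, sample_size, z=1.96):
    """Wilson score interval for a selectivity estimated from a sample.

    matches / sample_size is the point estimate; z=1.96 corresponds to
    roughly 95% confidence. Returns (lower, upper), clamped to [0, 1].
    """
    if sample_size <= 0:
        # No sample at all: we know nothing about the selectivity.
        return (0.0, 1.0)
    p = matches / sample_size
    denom = 1.0 + z * z / sample_size
    centre = (p + z * z / (2 * sample_size)) / denom
    half = (z / denom) * math.sqrt(
        p * (1.0 - p) / sample_size + z * z / (4 * sample_size ** 2)
    )
    return (max(0.0, centre - half), min(1.0, centre + half))


def combine_and(a, b):
    """Interval for sel(A AND B), ASSUMING the clauses are independent.

    Both intervals are non-negative, so multiplying the endpoints gives
    the bounds of the product.
    """
    return (a[0] * b[0], a[1] * b[1])


def risk_factor(interval):
    """Hypothetical risk measure: how far apart the bounds are, as a ratio.

    1.0 means the estimate is exact; larger means less confidence.
    """
    lower, upper = interval
    if lower <= 0.0:
        return math.inf
    return upper / lower


# Same point estimate (0.001), different sample sizes.
small_sample = wilson_interval(3, 3000)
large_sample = wilson_interval(30, 30000)
combined = combine_and(small_sample, large_sample)
```

In this sketch the smaller sample yields a wider interval and hence a
larger risk factor, and the risk of a combined estimate is never lower
than that of either input, which is the behaviour one would want when
comparing two paths that both carry a risk factor.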

I have not come across many papers which leverage this idea. Googling
"selectivity estimation confidence interval" does not yield many
papers, although I found [1] to be using a similar idea. So maybe
there's no merit in this idea, though theoretically it sounds fine
to me.

[1] https://pi3.informatik.uni-mannheim.de/~moer/Publications/vldb18_smpl_synop.pdf
--
Best Wishes,
Ashutosh Bapat
