From: | Emre Hasegeli <emre(at)hasegeli(dot)com> |
---|---|

To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |

Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se> |

Subject: | Re: Selectivity estimation for inet operators |

Date: | 2014-08-31 17:59:18 |

Message-ID: | 20140831175918.GB8990@hasegeli-2.local |

Views: | Raw Message | Whole Thread | Download mbox | Resend email |

Thread: | |

Lists: | pgsql-hackers |

> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:

> > * inet_mcv_join_selec() is O(n^2) where n is the number of entries in

> > the MCV lists. With the max statistics target of 10000, a worst case

> > query on my laptop took about 15 seconds to plan. Maybe that's

> > acceptable, but you went through some trouble to make planning of MCV vs

> > histogram faster, by the log2 method to compare only some values, so I

> > wonder why you didn't do the same for the MCV vs MCV case?

>

> Actually, what I think needs to be asked is the opposite question: why is

> the other code ignoring some of the statistical data? If the user asked

> us to collect a lot of stats detail it seems reasonable that he's

> expecting us to use it to get more accurate estimates. It's for sure

> not obvious why these estimators should take shortcuts that are not being

> taken in the much-longer-established code for scalar comparison estimates.

It will still use more statistical data, when statistics_target is

higher. It was not sure that the user wants to spent O(n^2) amount

of time based on statistics_target. Attached version is without

this optimization. Estimates are better without it, but planning

takes more time.

> I'm not exactly convinced that the math adds up in this logic, either.

> The way in which it combines results from looking at the MCV lists and

> at the histograms seems pretty arbitrary.

I taught the product of the join will be

(left_mcv + left_histogram) * (right_mcv + right_histogram) * selectivity

and tried to calculate it as in the following:

(left_mcv * right_mcv * selectivity) +

(right_mcv * left_histogram * selectivity) +

(left_mcv * right_histogram * selectivity) +

(left_histogram * right_histogram * selectivity)

where left_histogram is

1.0 - left_nullfrac - left_mcv

I fixed calculation for the MCV vs histogram part. The estimates of

inner join are very close to the actual rows with statistics_target = 1000.

I think the calculation should be right.

Attachment | Content-Type | Size |
---|---|---|

inet-selfuncs-v10.patch | text/plain | 27.0 KB |

- Re: Selectivity estimation for inet operators at 2014-08-30 19:31:11 from Tom Lane

From | Date | Subject | |
---|---|---|---|

Next Message | Emre Hasegeli | 2014-08-31 18:00:30 | Re: Selectivity estimation for inet operators |

Previous Message | Emre Hasegeli | 2014-08-31 17:57:10 | Re: Selectivity estimation for inet operators |