Re: Selectivity estimation for inet operators

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Emre Hasegeli <emre(at)hasegeli(dot)com>, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: Selectivity estimation for inet operators
Date: 2014-08-30 19:31:11
Message-ID: 9199.1409427071@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> * inet_mcv_join_selec() is O(n^2) where n is the number of entries in
> the MCV lists. With the max statistics target of 10000, a worst case
> query on my laptop took about 15 seconds to plan. Maybe that's
> acceptable, but you went through some trouble to make planning of MCV vs
> histogram faster, by the log2 method to compare only some values, so I
> wonder why you didn't do the same for the MCV vs MCV case?
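
[Editorial illustration, not part of the original message.] The quadratic cost described in the quoted paragraph comes from comparing every entry of one MCV list against every entry of the other. The sketch below is a hedged illustration of that pattern only. It is not the code from the patch, and the names `mcv_join_selectivity`, `is_subnet_or_equal`, and `pair_comparisons` are hypothetical. It assumes each MCV list is a sequence of (value, frequency) pairs and that matching pairs contribute the product of their frequencies.

```python
from ipaddress import ip_network


def is_subnet_or_equal(a, b):
    """Stand-in for an inet containment operator: True if a is within b."""
    return a.subnet_of(b)


def mcv_join_selectivity(mcv_left, mcv_right, matches):
    """Sum freq_left * freq_right over every matching pair of MCV entries.

    Every entry on the left is compared with every entry on the right,
    so the number of comparisons is len(mcv_left) * len(mcv_right).
    """
    selectivity = 0.0
    comparisons = 0
    for left_value, left_freq in mcv_left:
        for right_value, right_freq in mcv_right:
            comparisons += 1
            if matches(left_value, right_value):
                selectivity += left_freq * right_freq
    return selectivity, comparisons


def pair_comparisons(n):
    """Comparisons needed when both MCV lists hold n entries."""
    return n * n


# Made-up sample statistics, for illustration only.
mcv_left = [
    (ip_network("10.0.0.0/24"), 0.2),
    (ip_network("10.0.1.0/24"), 0.1),
]
mcv_right = [
    (ip_network("10.0.0.0/16"), 0.5),
    (ip_network("192.168.0.0/16"), 0.3),
]

sel, n = mcv_join_selectivity(mcv_left, mcv_right, is_subnet_or_equal)
print(n)                        # 4 pair comparisons for two lists of 2 entries
print(pair_comparisons(10000))  # 100000000 at a statistics target of 10000
```

With two lists of 10000 entries each, the inner comparison runs 10^8 times, which gives a sense of why planning time grows so sharply at the maximum statistics target.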

Actually, what I think needs to be asked is the opposite question: why is
the other code ignoring some of the statistical data? If the user asked
us to collect a lot of stats detail it seems reasonable that he's
expecting us to use it to get more accurate estimates. It's for sure
not obvious why these estimators should take shortcuts that are not being
taken in the much-longer-established code for scalar comparison estimates.

I'm not exactly convinced that the math adds up in this logic, either.
The way in which it combines results from looking at the MCV lists and
at the histograms seems pretty arbitrary.

regards, tom lane
