Re: Selectivity estimation for inet operators

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: Selectivity estimation for inet operators
Date: 2014-12-08 00:32:40
Message-ID: CAB7nPqR6NhWhkvknomigmzLwMj8kYPy5JZMaYN8tksqn7AJoTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 3, 2014 at 6:14 AM, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:
>> Actually, there's a second large problem with this patch: blindly
>> iterating through all combinations of MCV and histogram entries makes the
>> runtime O(N^2) in the statistics target. I made up some test data (by
>> scanning my mail logs) and observed the following planning times, as
>> reported by EXPLAIN ANALYZE:
>>
>> explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip;
>> explain analyze select * from relays r1, relays r2 where r1.ip && r2.ip;
>>
>> stats target eqjoinsel networkjoinsel
>>
>> 100 0.27 ms 1.85 ms
>> 1000 2.56 ms 167.2 ms
>> 10000 56.6 ms 13987.1 ms
>>
>> I don't think it's necessary for network selectivity to be quite as
>> fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning
>> time for a query that might need just milliseconds to execute :-(
>>
>> It seemed to me that it might be possible to reduce the runtime by
>> exploiting knowledge about the ordering of the histograms, but
>> I don't have time to pursue that right now.
>
> I make it break the loop when we passed the last possible match. Patch
> attached. It reduces the runtime almost 50% with large histograms.
>
> We can also make it use only some elements of the MCV and histogram
> for join estimation. I have experimented with reducing the right and
> the left hand side of the lists on the previous versions. I remember
> it was better to reduce only the left hand side. I think it would be
> enough to use log(n) elements of the right hand side MCV and histogram.
> I can make the change, if you think selectivity estimation function
> is the right place for this optimization.
Marking as "Returned with feedback" as more work needs to be done.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-12-08 00:36:10 Re: Support UPDATE table SET(*)=...
Previous Message Michael Paquier 2014-12-08 00:26:38 Re: Add CREATE support to event triggers