Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-03-11 02:59:33
Message-ID: d9bd905d-8b67-8eb7-9b38-11ba8345206b@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/10/19 11:27 PM, David Rowley wrote:
> On Mon, 11 Mar 2019 at 06:36, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On 3/9/19 7:33 PM, Dean Rasheed wrote:
>>> I wonder if it's possible to write smaller, more targeted tests.
>>> Currently "stats_ext" is by far the slowest test in its group, and I'm
>>> not sure that some of those tests add much. It ought to be possible to
>>> write a function that calls EXPLAIN and returns a query's row
>>> estimate, and then you could write tests to confirm the effect of the
>>> new stats by verifying the row estimates change as expected.
>>
>> Sure, if we can write more targeted tests, that would be good. But it's
>> not quite clear to me how wrapping EXPLAIN in a function makes those
>> tests any faster?
>
> I've not looked at the tests in question, but if they're executing an
> inferior plan is used when no extended stats exists, then maybe that's
> why they're slow.
>

I don't think the tests are executing any queries - the tests merely
generate execution plans, without executing them.

> I think Dean might mean to create a function similar to
> explain_parallel_append() in partition_prune.sql then write tests that
> check the row estimate with EXPLAIN (COSTS ON) but strip out the other
> costing stuff instead of validating that the poor plan was chosen.
>

I'm not opposed to doing that, of course. I'm just not sure it's a way
to make the tests faster. Will investigate.

>> On 3/10/19 2:09 PM, Dean Rasheed wrote:
>>> 12). bms_member_index() should surely be in bitmapset.c. It could be
>>> more efficient by just traversing the bitmap words and making use of
>>> bmw_popcount(). Also, its second argument should be of type 'int' for
>>> consistency with other bms_* functions.
>>
>> Yes, moving to bitmapset.c definitely makes sense. I don't see how it
>> could use bms_popcount() though.
>
> I think it could be done by first checking if the parameter is a
> member of the set, and then if so, count all the bits that come on and
> before that member. You can use bmw_popcount() for whole words before
> the specific member's word then just bitwise-and a bit mask of a
> bitmapword that has all bits set for all bits on and before your
> parameter's BITNUM(), and add the bmw_popcount of the final word
> bitwise-anding the mask. bms_add_range() has some masking code you
> could copy.
>

Ah, right - that would work.

cheers

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-11 03:15:40 Portability of strtod (was Re: pgsql: Include GUC's unit, if it has one, in out-of-range error message)
Previous Message Ashutosh Bapat 2019-03-11 02:58:54 Re: [HACKERS] advanced partition matching algorithm for partition-wise join