Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-03-10 22:27:54
Message-ID: CAKJS1f_2yGfR2Q3vEYwQ2aoHhs_QDYT5mq_YUyaobnEHCCJmSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 11 Mar 2019 at 06:36, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On 3/9/19 7:33 PM, Dean Rasheed wrote:
> > I wonder if it's possible to write smaller, more targeted tests.
> > Currently "stats_ext" is by far the slowest test in its group, and I'm
> > not sure that some of those tests add much. It ought to be possible to
> > write a function that calls EXPLAIN and returns a query's row
> > estimate, and then you could write tests to confirm the effect of the
> > new stats by verifying the row estimates change as expected.
>
> Sure, if we can write more targeted tests, that would be good. But it's
> not quite clear to me how wrapping EXPLAIN in a function makes those
> tests any faster?

I've not looked at the tests in question, but if they're executing an
inferior plan is used when no extended stats exists, then maybe that's
why they're slow.

I think Dean might mean to create a function similar to
explain_parallel_append() in partition_prune.sql then write tests that
check the row estimate with EXPLAIN (COSTS ON) but strip out the other
costing stuff instead of validating that the poor plan was chosen.

> On 3/10/19 2:09 PM, Dean Rasheed wrote:
> > 12). bms_member_index() should surely be in bitmapset.c. It could be
> > more efficient by just traversing the bitmap words and making use of
> > bmw_popcount(). Also, its second argument should be of type 'int' for
> > consistency with other bms_* functions.
>
> Yes, moving to bitmapset.c definitely makes sense. I don't see how it
> could use bms_popcount() though.

I think it could be done by first checking if the parameter is a
member of the set, and then if so, count all the bits that come on and
before that member. You can use bmw_popcount() for whole words before
the specific member's word then just bitwise-and a bit mask of a
bitmapword that has all bits set for all bits on and before your
parameter's BITNUM(), and add the bmw_popcount of the final word
bitwise-anding the mask. bms_add_range() has some masking code you
could copy.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-03-10 22:42:20 Re: Add missing CREATE TABLE IF NOT EXISTS table_name AS EXECUTE query;
Previous Message Dmitry Dolgov 2019-03-10 21:38:10 Re: Segfault when restoring -Fd dump on current HEAD