Re: Bitmap scan is undercosted? - boolean correlation

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bitmap scan is undercosted? - boolean correlation
Date: 2017-12-04 00:27:23
Message-ID: CAMkU=1yCX_WK0KYUOhRSKG2a771kiK+QWU5YXxt3EPhEXCLDQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Dec 3, 2017 15:31, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <pryzby(at)telsasoft(dot)com>
wrote:
>> It thinks there's somewhat-high correlation since it gets a list of x
>> and y values (integer positions by logical and physical sort order) and
>> 90% of the x list (logical value) are the same value ('t'), and the
>> CTIDs are in order on the new index, so 90% of the values are 100%
>> correlated.

> But there is no index involved (except in the case of the functional
> index). The correlation of table columns to physical order of the table
> doesn't depend on the existence of an index, or the physical order within
> an index.

> But I do see that ties within the logical order of the column values are
> broken to agree with the physical order. That is wrong, right? Is there
> any argument that this is desirable?

Uh ... what do you propose doing instead? We'd have to do something with
ties, and it's not so obvious this way is wrong.

Let them be tied. If there are 10 distinct values, number the values 0 to
9, and all rows of a given distinct values get the same number for the
logical order axis.

Calling the correlation 0.8 when it is really 0.0 seems obviously wrong to
me. Although if we switched btree to store duplicate values with tid as a
tie breaker, then maybe it wouldn't be as obviously wrong.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jing Wang 2017-12-04 00:34:10 Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE
Previous Message Dmitry Dolgov 2017-12-04 00:26:22 Re: [HACKERS] [PATCH] Generic type subscripting

Browse pgsql-performance by date

  From Date Subject
Next Message Mariel Cherkassky 2017-12-05 14:03:11 vacuum after truncate
Previous Message Tom Lane 2017-12-03 23:31:40 Re: Bitmap scan is undercosted? - boolean correlation