Re: Bitmap scan is undercosted? - boolean correlation

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bitmap scan is undercosted? - boolean correlation
Date: 2017-12-03 04:04:30
Message-ID: 20171203040430.GC14008@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote:
> I think the non-extended stats code also has trouble with booleans.
> pg_stats gives me a correlation of 0.8 or higher for the flag column.

It's not due to the boolean though; you see the same thing if you do:
CREATE INDEX aaa_f ON aaa((flag::text));
ANALYZE aaa;
correlation | 0.81193

or:
ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int
correlation | 0.81193

I think it's caused by having so few (2) values to correlate.

most_common_vals | {f,t}
most_common_freqs | {0.9014,0.0986}
correlation | 0.822792

It thinks there's somewhat-high correlation since it gets a list of x and y
values (integer positions by logical and physical sort order) and 90% of the x
list (logical value) are the same value ('t'), and the CTIDs are in order on
the new index, so 90% of the values are 100% correlated.

It improves (by which I mean here that it spits out a lower number) if it's not
a 90/10 split:

CREATE TABLE aaa5 AS SELECT (id%100)::int num, (id%10>5)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX ON aaa5 (flag);

tablename | aaa5
attname | flag
correlation | 0.522184

Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-12-03 06:21:56 pg_dumpall -r -c try to drop user postgres
Previous Message chenhj 2017-12-03 03:27:57 Re:Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

Browse pgsql-performance by date

  From Date Subject
Next Message Vitaliy Garnashevich 2017-12-03 21:15:01 Re: Bitmap scan is undercosted?
Previous Message Jeff Janes 2017-12-03 01:27:51 Re: Bitmap scan is undercosted?