Re: Use of index for 50% column restriction

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Use of index for 50% column restriction
Date: 2016-06-08 21:36:08
Message-ID: 20160608213608.GA24608@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 8, 2016 at 05:07:34PM -0400, Bruce Momjian wrote:
> > For randomly-ordered data I believe the cutover is actually well below 10%.
>
> Ah, I had not considered the correlation order of the rows in the table.
> This test returns the sequential scan I expected by using floor(random()
> * 2):
>
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (c1 INT, c2 INT, c3 INT);
> INSERT INTO test SELECT c1, floor(random() * 2), 0 FROM generate_series(1, 10000) AS a(c1);
> INSERT INTO test SELECT c1, floor(random() * 2), 1 FROM generate_series(10001, 20000) AS a(c1);
> CREATE INDEX i_test_c2 ON test (c2);
> ANALYZE test;
> EXPLAIN SELECT * FROM test WHERE c2 = 0;
>
> Thanks.

Just a follow-up, but even with a randomized correlation order, it seems
25% restrictivity generates a Bitmap Index Scan:

DROP TABLE IF EXISTS test;
CREATE TABLE test (c1 INT, c2 INT, c3 INT);
INSERT INTO test SELECT c1, abs(floor(random() * 4)-1), abs(floor(random() * 4)-1) FROM generate_series(1, 10000) AS a(c1);
INSERT INTO test SELECT c1, abs(floor(random() * 4)-1), abs(floor(random() * 4)-1) FROM generate_series(10001, 15000) AS a(c1);
INSERT INTO test SELECT c1, abs(floor(random() * 4)-1), abs(floor(random() * 4)-1) FROM generate_series(15001, 20000) AS a(c1);
CREATE INDEX i_test_c2 ON test (c2);
ANALYZE test;

SELECT c2, COUNT(*) FROM test GROUP BY c2 ORDER BY 1;
c2 | count
----+-------
0 | 5020 25%
1 | 10006 50%
2 | 4974 25%

EXPLAIN SELECT * FROM TEST WHERE c2 = 1;
QUERY PLAN
-----------------------------------------------------------
Seq Scan on test (cost=0.00..359.00 rows=10006 width=12)
Filter: (c2 = 1)

EXPLAIN SELECT * FROM TEST WHERE c2 = 0;
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=99.19..270.94 rows=5020 width=12)
Recheck Cond: (c2 = 0)
-> Bitmap Index Scan on i_test_c2 (cost=0.00..97.94 rows=5020 width=0)
Index Cond: (c2 = 0)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-06-08 21:44:11 Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
Previous Message Bruce Momjian 2016-06-08 21:07:34 Re: Use of index for 50% column restriction