Re: help understanding explain output

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: Luca Ferrari <fluca1978(at)infinito(dot)it>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: help understanding explain output
Date: 2011-02-16 07:04:05
Message-ID: 4D5B76E5.9010205@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Luca Ferrari wrote:
> Hello,
> I've got a doubt about partial indexes and the path chosen by the optimizer.
> Consider this simple scenario:
>
> CREATE TABLE p( pk serial NOT NULL , val2 text, val1 text, b boolean, PRIMARY
> KEY (pk) );
> INSERT INTO p(pk, val1, val2, b) VALUES( generate_series(1,1000000), 'val1b',
> 'val2b', true );
> INSERT INTO p(pk, val1, val2, b) VALUES( generate_series(1000001,2000000),
> 'val1Notb', 'val2Notb', false );
> CREATE INDEX i_p_b ON p (b) WHERE b = true;
> ANALYZE p;
>
> So I create a table with 2-million rows, the first million with b = true and
> the second one with b = false.
> Now doing an explain for a query that selects only on the b attribute I got:
>
> EXPLAIN SELECT * FROM p WHERE b = false;
> QUERY PLAN
> ------------------------------------------------------------
> Seq Scan on p (cost=0.00..34706.00 rows=1000133 width=28)
> Filter: (NOT b)
>
>
> So a sequential scan. I know that the optimizer will not consider an index if
> it is not filtering, but I don't understand exactly why in this case. In fact,
> considering that the above query could remove the first half data pages (where
> b = true), and considering that:
>
> SELECT reltype, relval1, relpages, reltuples
> FROM pg_class WHERE relval1 IN ('p', 'i_p_b');
> reltype | relval1 | relpages | reltuples
> ---------+----------+----------+-----------
> 615079 | p | 14706 | 2e+06
> 0 | i_p_b | 2745 | 999867
>
> The sequential access requires 14706 pages, while using the index for filtering
> almost the half of those, we've got 2745 + 7353 = around 10000 pages.
> I've tried to change the index type to an hash, but the situation did not
> change. Even with enable_seqscan = off the above query is executed
> sequentially, but with a different initial cost:
>
>
> EXPLAIN SELECT * FROM p WHERE b = false;
> QUERY PLAN
> ----------------------------------------------------------------------------
> Seq Scan on p (cost=10000000000.00..10000034706.00 rows=1000133 width=28)
> Filter: (NOT b)
>
>
> And here comes the second doubt: since in both cases the planner is doing a
> sequential access, why the first case has an initial cost = 0 and this one has
> a cost of 1 million?
> I'm getting lost here, I need some hint to understand what is happening.
>
> I'm running
> PostgreSQL 9.0.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
> 20080704 (Red Hat 4.1.2-46), 64-bit
>
>
> Thanks,
> Luca
>
>
I don't see where you've told the engine anything about the "b" column?
Naturally a boolean can only have two values, but you haven't indexed on
column b, nor partitioned the table on the values in the "b" column. I
don't think it can just _know_ that the first n records have b=true.
What behaviour would you expect if you had alternated the inserts (or
just alternated the primary keys by assigning them even and odd values
in the inserts)?

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2011-02-16 07:46:44 Re: PostgresSQL 9.0 64 bit on windows 2008 64bit
Previous Message AI Rumman 2011-02-16 06:36:36 question regarding full_page_write