help understanding create statistic

From: Luca Ferrari <fluca1978(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: help understanding create statistic
Date: 2018-06-28 09:38:51
Message-ID: CAKoxK+6C8CKdbYbbyNeYnc5aiDk=G-k-iDyDZMcmjJATqkLM9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,
in order to better understand this feature of 10, I've created a table
like this:

CREATE TABLE expenses(
...
day date,
year int,
CHECK( year = EXTRACT( year FROM day ) )
);

so that I can ensure 'year' and 'day' are tied together:

SELECT
count(*) FILTER( WHERE year = 2016 ) AS by_year,
count(*) FILTER( WHERE EXTRACT( year FROM day ) = 2016 ) AS by_day
FROM expenses;
-[ RECORD 1 ]-
by_year | 8784
by_day | 8784

Then I created a statistic:

CREATE STATISTICS stat_day_year ( dependencies )
ON day, year
FROM expenses;

select * from pg_statistic_ext ;
-[ RECORD 1 ]---+---------------------
stxrelid | 42833
stxname | stat_day_year
stxnamespace | 2200
stxowner | 16384
stxkeys | 3 5
stxkind | {f}
stxndistinct |
stxdependencies | {"3 => 5": 1.000000}

Now, having an index on the extract year of day as follows:

CREATE INDEX idx_year
ON expenses ( EXTRACT( year FROM day ) );

why is the planner not choosing to use such index on a 'year' raw query?

EXPLAIN SELECT * FROM expenses
WHERE year = 2016;
QUERY PLAN
-------------------------------------------------------------------------------
Gather (cost=1000.00..92782.34 rows=8465 width=32)
Workers Planned: 2
-> Parallel Seq Scan on expenses (cost=0.00..90935.84 rows=3527 width=32)
Filter: (year = 2016)

The number of rows are correct, but I was expecting it to use the same
index as a query like "WHERE EXTRACT( year FROM day) = 2016" triggers.

Even altering the year column to not null does show any change, and
this is the plan obtained turning off seq_scan (to see the costs):

EXPLAIN ANALYZE SELECT * FROM expenses
WHERE year = 2016;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on expenses (cost=10000000000.00..10000127402.44 rows=8451
width=32) (actual time=972.734..2189.300 rows=8784 loops=1)
Filter: (year = 2016)
Rows Removed by Filter: 4991216

Am I misunderstaing this functional dependency?

Thanks,
Luca

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mathieu PUJOL 2018-06-28 09:45:26 Analyze plan of foreign data wrapper
Previous Message Pablo Hendrickx 2018-06-28 07:45:30 Re: Example setup for Odyssey connection pooler?