Re: BRIN index which is much faster never chosen by planner

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Michael Lewis <mlewis(at)entrata(dot)com>
Subject: Re: BRIN index which is much faster never chosen by planner
Date: 2019-10-11 14:08:05
Message-ID: CAMa1XUhdoKchMmbeSef7HhRM44-_00eMt8LPeBuwPA4UrSNc-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 10, 2019 at 7:22 PM David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> The planner might be able to get a better estimate on the number of
> matching rows if the now() - interval '10 days' expression was
> replaced with 'now'::timestamptz - interval '10 days'. However, care
> would need to be taken to ensure the plan is never prepared since
> 'now' is evaluated during parse. The same care must be taken when
> creating views, functions, stored procedures and the like.
>

You are on to something here I think with the now() function, even if above
suggestion is not exactly right as you said further down. I am finding a
hard-coded timestamp gives the right query plan. I also tested same with
even bigger window (last 16 days) and it yet still chooses the brin index.

foo_prod=# EXPLAIN
foo_prod-# SELECT
foo_prod-# category, source, MIN(rec_insert_time) OVER (partition by
source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time)
OVER (partition by source order by rec_insert_time) AS last_source_time
foo_prod-# FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
foo_prod(# category, source(field1) AS source, rec_insert_time
foo_prod(# FROM log_table l
foo_prod(# INNER JOIN public.small_join_table filter ON filter.category =
l.category
foo_prod(# WHERE field1 IS NOT NULL AND l.category = 'music'
foo_prod(# AND l.rec_insert_time >= now() - interval '10 days'
foo_prod(# ORDER BY brand_id, last_change, log_id, rec_insert_time DESC)
unique_cases;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=24436329.10..24436343.56 rows=643 width=120)
-> Sort (cost=24436329.10..24436330.70 rows=643 width=104)
Sort Key: unique_cases.source, unique_cases.rec_insert_time
-> Subquery Scan on unique_cases (cost=24436286.24..24436299.10
rows=643 width=104)
-> Unique (cost=24436286.24..24436292.67 rows=643
width=124)
-> Sort (cost=24436286.24..24436287.85 rows=643
width=124)
Sort Key: l.brand_id, l.last_change, l.log_id,
l.rec_insert_time DESC
-> Nested Loop (cost=0.00..24436256.25
rows=643 width=124)
Join Filter: ((l.category)::text =
filter.category)
-> Seq Scan on small_join_table filter
(cost=0.00..26.99 rows=1399 width=8)
-> Materialize (cost=0.00..24420487.02
rows=643 width=99)
-> Seq Scan on log_table l
(cost=0.00..24420483.80 rows=643 width=99)
Filter: ((field1 IS NOT NULL)
AND (category = 'music'::name) AND (rec_insert_time >= (now() - '10
days'::interval)))
(13 rows)

foo_prod=# SELECT now() - interval '10 days';
?column?
-------------------------------
2019-10-01 08:20:38.115471-05
(1 row)

foo_prod=# EXPLAIN
SELECT
category, source, MIN(rec_insert_time) OVER (partition by source order by
rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition
by source order by rec_insert_time) AS last_source_time
FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
category, source(field1) AS source, rec_insert_time
FROM log_table l
INNER JOIN public.small_join_table filter ON filter.category = l.category
WHERE field1 IS NOT NULL AND l.category = 'music'
AND l.rec_insert_time >= '2019-10-01 08:20:38.115471-05'
ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=19664576.17..19664590.63 rows=643 width=120)
-> Sort (cost=19664576.17..19664577.77 rows=643 width=104)
Sort Key: unique_cases.source, unique_cases.rec_insert_time
-> Subquery Scan on unique_cases (cost=19664533.31..19664546.17
rows=643 width=104)
-> Unique (cost=19664533.31..19664539.74 rows=643
width=124)
-> Sort (cost=19664533.31..19664534.92 rows=643
width=124)
Sort Key: l.brand_id, l.last_change, l.log_id,
l.rec_insert_time DESC
-> Nested Loop (cost=3181.19..19664503.32
rows=643 width=124)
-> Gather (cost=3180.91..19662574.92
rows=643 width=99)
Workers Planned: 3
-> Parallel Bitmap Heap Scan on
log_table l (cost=2180.91..19661510.62 rows=207 width=99)
Recheck Cond: (rec_insert_time
>= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)
Filter: ((field1 IS NOT NULL)
AND (category = 'music'::name))
-> Bitmap Index Scan on
rec_insert_time_brin_1000 (cost=0.00..2180.75 rows=142602171 width=0)
Index Cond:
(rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamp with time
zone)

Let me know if this rings any bells! I will respond to other comments with
other replies.

Thanks,
Jeremy

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-10-11 14:08:37 Re: Issues with PAM : log that it failed, whether it actually failed or not
Previous Message Konstantin Knizhnik 2019-10-11 13:50:07 Re: [Proposal] Global temporary tables