Re: BRIN index which is much faster never chosen by planner

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jeremy Finzel <finzelj(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Michael Lewis <mlewis(at)entrata(dot)com>
Subject: Re: BRIN index which is much faster never chosen by planner
Date: 2019-10-11 18:46:20
Message-ID: 20191011184620.fopsssbxwmcouitl@development
Lists: pgsql-hackers

On Fri, Oct 11, 2019 at 09:08:05AM -0500, Jeremy Finzel wrote:
>On Thu, Oct 10, 2019 at 7:22 PM David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
>wrote:
>
>> The planner might be able to get a better estimate on the number of
>> matching rows if the now() - interval '10 days' expression was
>> replaced with 'now'::timestamptz - interval '10 days'. However, care
>> would need to be taken to ensure the plan is never prepared since
>> 'now' is evaluated during parse. The same care must be taken when
>> creating views, functions, stored procedures and the like.
>>
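To illustrate the caveat about 'now' and prepared statements (a sketch only; the table t and column ts are made up):

```sql
-- 'now' is a special input string, coerced to a timestamp while the
-- statement is parsed -- so a prepared statement freezes that value:
PREPARE stale AS
  SELECT * FROM t WHERE ts >= 'now'::timestamptz - interval '10 days';

-- now() is a function evaluated at execution time, so every EXECUTE
-- sees the current transaction's timestamp:
PREPARE fresh AS
  SELECT * FROM t WHERE ts >= now() - interval '10 days';
```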
>
>You are on to something here, I think, with the now() function, even if
>the above suggestion is not exactly right, as you said further down. I am
>finding that a hard-coded timestamp gives the right query plan. I also
>tested the same with an even bigger window (the last 16 days) and it still
>chooses the BRIN index.
>
>foo_prod=# EXPLAIN
>foo_prod-# SELECT
>foo_prod-# category, source, MIN(rec_insert_time) OVER (partition by source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition by source order by rec_insert_time) AS last_source_time
>foo_prod-# FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
>foo_prod(# category, source(field1) AS source, rec_insert_time
>foo_prod(# FROM log_table l
>foo_prod(# INNER JOIN public.small_join_table filter ON filter.category = l.category
>foo_prod(# WHERE field1 IS NOT NULL AND l.category = 'music'
>foo_prod(# AND l.rec_insert_time >= now() - interval '10 days'
>foo_prod(# ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;
>
>                                                                               QUERY PLAN
>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> WindowAgg  (cost=24436329.10..24436343.56 rows=643 width=120)
>   ->  Sort  (cost=24436329.10..24436330.70 rows=643 width=104)
>         Sort Key: unique_cases.source, unique_cases.rec_insert_time
>         ->  Subquery Scan on unique_cases  (cost=24436286.24..24436299.10 rows=643 width=104)
>               ->  Unique  (cost=24436286.24..24436292.67 rows=643 width=124)
>                     ->  Sort  (cost=24436286.24..24436287.85 rows=643 width=124)
>                           Sort Key: l.brand_id, l.last_change, l.log_id, l.rec_insert_time DESC
>                           ->  Nested Loop  (cost=0.00..24436256.25 rows=643 width=124)
>                                 Join Filter: ((l.category)::text = filter.category)
>                                 ->  Seq Scan on small_join_table filter  (cost=0.00..26.99 rows=1399 width=8)
>                                 ->  Materialize  (cost=0.00..24420487.02 rows=643 width=99)
>                                       ->  Seq Scan on log_table l  (cost=0.00..24420483.80 rows=643 width=99)
>                                             Filter: ((field1 IS NOT NULL) AND (category = 'music'::name) AND (rec_insert_time >= (now() - '10 days'::interval)))
>(13 rows)
>
>foo_prod=# SELECT now() - interval '10 days';
> ?column?
>-------------------------------
> 2019-10-01 08:20:38.115471-05
>(1 row)
>
>foo_prod=# EXPLAIN
>SELECT
> category, source, MIN(rec_insert_time) OVER (partition by source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition by source order by rec_insert_time) AS last_source_time
>FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
>category, source(field1) AS source, rec_insert_time
>FROM log_table l
>INNER JOIN public.small_join_table filter ON filter.category = l.category
>WHERE field1 IS NOT NULL AND l.category = 'music'
>AND l.rec_insert_time >= '2019-10-01 08:20:38.115471-05'
>ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;
>
>                                                                              QUERY PLAN
>-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> WindowAgg  (cost=19664576.17..19664590.63 rows=643 width=120)
>   ->  Sort  (cost=19664576.17..19664577.77 rows=643 width=104)
>         Sort Key: unique_cases.source, unique_cases.rec_insert_time
>         ->  Subquery Scan on unique_cases  (cost=19664533.31..19664546.17 rows=643 width=104)
>               ->  Unique  (cost=19664533.31..19664539.74 rows=643 width=124)
>                     ->  Sort  (cost=19664533.31..19664534.92 rows=643 width=124)
>                           Sort Key: l.brand_id, l.last_change, l.log_id, l.rec_insert_time DESC
>                           ->  Nested Loop  (cost=3181.19..19664503.32 rows=643 width=124)
>                                 ->  Gather  (cost=3180.91..19662574.92 rows=643 width=99)
>                                       Workers Planned: 3
>                                       ->  Parallel Bitmap Heap Scan on log_table l  (cost=2180.91..19661510.62 rows=207 width=99)
>                                             Recheck Cond: (rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)
>                                             Filter: ((field1 IS NOT NULL) AND (category = 'music'::name))
>                                             ->  Bitmap Index Scan on rec_insert_time_brin_1000  (cost=0.00..2180.75 rows=142602171 width=0)
>                                                   Index Cond: (rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)
>
>
>Let me know if this rings any bells! I will respond to other comments with
>other replies.
>

My guess: it's (at least partially) due to the cpu_operator_cost associated
with the now() call. When it's replaced with a literal, that cost
disappears and so the total query cost decreases.
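For illustration, the difference boils down to the two forms of the
predicate (simplified here to just the WHERE clause on log_table):

```sql
-- Expression form: the estimate includes the cost of evaluating
-- now() - '10 days'::interval as an operator expression per row:
EXPLAIN SELECT * FROM log_table
WHERE rec_insert_time >= now() - interval '10 days';

-- Literal form: the comparison value is a plain constant, so that
-- operator cost is gone from the estimate:
EXPLAIN SELECT * FROM log_table
WHERE rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamptz;
```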

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
