Re: Postgres undeterministically uses a bad plan, how to convince it otherwise?

From: Kirk Wolak <wolakk(at)gmail(dot)com>
To: cen <cen(dot)is(dot)imba(at)gmail(dot)com>
Cc: Ron <ronljohnsonjr(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Postgres undeterministically uses a bad plan, how to convince it otherwise?
Date: 2023-03-02 23:07:45
Message-ID: CACLU5mQ674KO5zdKOrh_r28d7-zMAr3ehgCSNn1SkhjvH-8=Ag@mail.gmail.com
Lists: pgsql-general

On Thu, Mar 2, 2023 at 8:20 AM cen <cen(dot)is(dot)imba(at)gmail(dot)com> wrote:

> On 16/02/2023 17:15, Ron wrote:
> > On 2/16/23 09:47, cen wrote:
> >> Hi,
> >>
> >> I am running the same application (identical codebase) as two
> >> separate instances to index (save) different sets of data. Both run
> >> PostgreSQL 13.
> >>
> >> The queries are the same but the content in actual databases is
> >> different. One database is around 1TB and the other around 300GB.
> >>
> >>
> >> There is a problem with a paginated select query with a join and an
> >> order. Depending on which column you order by (primary or FK) the
> >> query is either instant or takes minutes.
> >>
> >> So on one database, ordering by primary is instant but on the other
> >> it is slow and vice-versa. Switching the columns around on the slow
> >> case fixes the issue.
> >>
> >> All relevant columns are indexed.
> >
> > What does EXPLAIN ANALYZE say?
>
> I finally managed to get back to this and have a similar sample query.
> In this case, changing the WHERE clause to use the joined table column
> slows the query down.
>
> The initial case was on the ORDER BY column, but the symptoms are the
> same, I think.
>
> I understand that even though both columns are indexed, the indexes are
> completely different but the point is, how would one know in advance
> which one will be faster when designing the query?
>
> And as I mentioned in my initial text, the fast case of columns can
> switch around as the database grows.
>
>
> Fast case:
>
> SELECT
> t0."status",
> b1."timestamp"
> FROM
> "transactions" AS t0
> INNER JOIN
> "blocks" AS b1
> ON
> b1."number" = t0."block_number"
> WHERE
> (((t0."to_address_hash" = '\x3012c'))
> OR (t0."from_address_hash" = '\x3012c')
> OR (t0."created_contract_address_hash" = '\x3012c'))
> AND (t0."block_number" >= 30926000)
> AND (t0."block_number" <= 31957494)
> ORDER BY
> t0."block_number" DESC
> LIMIT
> 150
> OFFSET
> 300;
>
> Plan:
>
> Limit (cost=15911.73..23367.09 rows=150 width=16) (actual
> time=205.093..305.423 rows=150 loops=1)
> -> Gather Merge (cost=1001.03..812143.50 rows=16320 width=16)
> (actual time=36.140..305.333 rows=450 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Nested Loop (cost=1.00..809259.75 rows=6800 width=16)
> (actual time=2.662..155.779 rows=153 loops=3)
> -> Parallel Index Scan Backward using
> transactions_block_number_index on transactions t0 (cost=0.56..753566.08
> rows=6800 width=8) (actual time=0.224..145.998 rows=153 loops=3)
> Index Cond: ((block_number >= 30926000) AND
> (block_number <= 31957494))
> Filter: ((to_address_hash = '\x3012c'::bytea) OR
> (from_address_hash = '\x3012c'::bytea) OR (created_contract_address_hash
> = '\x3012c'::bytea))
> Rows Removed by Filter: 22471
> -> Index Scan using blocks_number_index on blocks b1
> (cost=0.44..8.18 rows=1 width=16) (actual time=0.059..0.060 rows=1
> loops=460)
> Index Cond: (number = t0.block_number)
> Planning Time: 0.513 ms
> Execution Time: 305.541 ms
>
> ------------------------------
>
> Slow case:
>
> SELECT
> t0."status",
> b1."timestamp"
> FROM
> "transactions" AS t0
> INNER JOIN
> "blocks" AS b1
> ON
> b1."number" = t0."block_number"
> WHERE
> (((t0."to_address_hash" = '\x3012c'))
> OR (t0."from_address_hash" = '\x3012c')
> OR (t0."created_contract_address_hash" = '\x3012c'))
> AND (b1."number" >= 30926000) -- using col from joined table instead
> AND (b1."number" <= 31957494) -- using col from joined table instead
> ORDER BY
> t0."block_number" DESC
> LIMIT
> 150
> OFFSET
> 300;
>
> Plan:
>
> Limit (cost=1867319.63..1877754.02 rows=150 width=16) (actual
> time=95830.704..95962.116 rows=150 loops=1)
> -> Gather Merge (cost=1846450.83..2015348.94 rows=2428 width=16)
> (actual time=95805.872..95962.075 rows=450 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Merge Join (cost=1845450.81..2014068.67 rows=1012
> width=16) (actual time=95791.362..95824.633 rows=159 loops=3)
> Merge Cond: (t0.block_number = b1.number)
> -> Sort (cost=1845402.63..1845823.81 rows=168474
> width=8) (actual time=95790.194..95790.270 rows=186 loops=3)
> Sort Key: t0.block_number DESC
> Sort Method: external merge Disk: 2496kB
> Worker 0: Sort Method: external merge Disk: 2408kB
> Worker 1: Sort Method: external merge Disk: 2424kB
> -> Parallel Bitmap Heap Scan on transactions t0
> (cost=39601.64..1828470.76 rows=168474 width=8) (actual
> time=7274.149..95431.494 rows=137658 loops=3)
> Recheck Cond: ((to_address_hash =
> '\x3012c'::bytea) OR (from_address_hash = '\x3012c'::bytea) OR
> (created_contract_address_hash = '\x3012c'::bytea))
> Rows Removed by Index Recheck: 716205
> Heap Blocks: exact=7043 lossy=134340
> -> BitmapOr (cost=39601.64..39601.64
> rows=404359 width=0) (actual time=7264.127..7264.130 rows=0 loops=1)
> -> Bitmap Index Scan on
> transactions_to_address_hash_recent_collated_index (cost=0.00..326.32
> rows=3434 width=0) (actual time=2.314..2.314 rows=5 loops=1)
> Index Cond: (to_address_hash =
> '\x3012c'::bytea)
> -> Bitmap Index Scan on
> transactions_from_address_hash_recent_collated_index
> (cost=0.00..38967.50 rows=400924 width=0) (actual
> time=7261.750..7261.750 rows=419509 loops=1)
> Index Cond: (from_address_hash =
> '\x3012c'::bytea)
> -> Bitmap Index Scan on
> transactions_created_contract_address_hash_recent_collated_inde
> (cost=0.00..4.57 rows=1 width=0) (actual time=0.059..0.060 rows=0 loops=1)
> Index Cond:
> (created_contract_address_hash = '\x3012c'::bytea)
> -> Index Scan Backward using blocks_number_index on
> blocks b1 (cost=0.44..167340.18 rows=189631 width=16) (actual
> time=0.139..26.964 rows=27612 loops=3)
> Index Cond: ((number >= 30926000) AND (number <=
> 31957494))
> Planning Time: 0.736 ms
> Execution Time: 95963.436 ms
>
>
> > Since you've run ANALYZE, when were the tables last vacuumed?
>
> Last autovacuum for transactions: 2023-02-23
> Last autovacuum for blocks: 2023-01-23
>
I shoved your plans into depesz (https://explain.depesz.com/s/hRaZ);
I made the slow one the primary, and the fast one the optimized.

This makes them much easier to compare side by side than in the email.

And it makes the problem clear in one line:

Sort (cost=1845402.63..1845823.81 rows=*168474* width=8) (actual
time=95,790.194..95,790.270 rows=*186* loops=3)

The estimate was for 168,474 rows, but the actual was 186. You spun
over a TON of data that you did not use. And notice the TIME: that
sort is essentially your entire query time...

Had you included BUFFERS, I am sure it would have shown a massive
difference in data read.
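For reference, that just means adding the BUFFERS option to EXPLAIN (a
sketch using your fast query; it prints shared-buffer hit/read counts
per plan node):

```sql
-- Sketch: same query, with BUFFERS so block I/O counts appear in the plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT t0."status", b1."timestamp"
FROM "transactions" AS t0
INNER JOIN "blocks" AS b1 ON b1."number" = t0."block_number"
WHERE (t0."to_address_hash" = '\x3012c'
       OR t0."from_address_hash" = '\x3012c'
       OR t0."created_contract_address_hash" = '\x3012c')
  AND t0."block_number" >= 30926000
  AND t0."block_number" <= 31957494
ORDER BY t0."block_number" DESC
LIMIT 150 OFFSET 300;
```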

You're wondering: if the two columns are equated by the join, what
difference should it make? But the planner still has to choose, and
that extra information might have tricked the optimizer into thinking
"Oh, I can just scan," or "Oh, I will use the other index as a driver,
since I have a filter to apply."
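One thing worth trying (a sketch of the idea, not a guarantee): as far
as I know, PostgreSQL propagates equalities across a join, but not
range conditions, so you can spell the range out on both columns and
let the planner use whichever side it likes:

```sql
-- Sketch: state the range on both join columns so either plan shape
-- remains available to the planner
WHERE (t0."to_address_hash" = '\x3012c'
       OR t0."from_address_hash" = '\x3012c'
       OR t0."created_contract_address_hash" = '\x3012c')
  AND t0."block_number" BETWEEN 30926000 AND 31957494
  AND b1."number"       BETWEEN 30926000 AND 31957494
```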

Furthermore, I RECENTLY LEARNED this: be very careful optimizing queries
with LIMIT in them.
This can cause the optimizer to think brute force is fine, since it only
needs X rows...
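If you can change the pagination scheme, keyset (a.k.a. seek)
pagination sidesteps both OFFSET and that trap. A sketch, where
:last_block_number is a hypothetical bind parameter holding the lowest
block_number from the previous page:

```sql
-- Sketch: keyset pagination instead of LIMIT/OFFSET
SELECT t0."status", b1."timestamp"
FROM "transactions" AS t0
INNER JOIN "blocks" AS b1 ON b1."number" = t0."block_number"
WHERE (t0."to_address_hash" = '\x3012c'
       OR t0."from_address_hash" = '\x3012c'
       OR t0."created_contract_address_hash" = '\x3012c')
  AND t0."block_number" < :last_block_number  -- hypothetical parameter
ORDER BY t0."block_number" DESC
LIMIT 150;
```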

Kirk
