Re: merge semi join cost calculation error

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: merge semi join cost calculation error
Date: 2018-10-08 15:00:36
Message-ID: 27470.1539010836@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> The user sent a plan:

> QUERY PLAN
> Merge Semi Join (cost=82.97..580.24 rows=580 width=8) (actual
> time=0.503..9557.396 rows=721 loops=1)
> Merge Cond: (tips.users_id = follows.users_id_to)
> -> Index Scan using tips_idx_users_id01 on tips (cost=0.43..8378397.19
> rows=2491358 width=16) (actual time=0.009..9231.585 rows=2353914 loops=1)
> -> Sort (cost=1.77..1.82 rows=22 width=8) (actual time=0.052..0.089
> rows=28 loops=1)
> Sort Key: follows.users_id_to
> Sort Method: quicksort Memory: 26kB
> -> Seq Scan on follows (cost=0.00..1.27 rows=22 width=8) (actual
> time=0.013..0.020 rows=28 loops=1)
> Filter: (users_id_from = 1)

> He has PostgreSQL 10.5. I cannot to understand to too low total cost of Merge
> Semi Join because subnode has very high cost 8378397.

The planner seems to be supposing that the merge will stop far short of
scanning the entire LHS table, presumably as a result of thinking that
the maximum value of follows.users_id_to is much less than the maximum
value of tips.users_id. Given the actual rowcounts, that's seemingly
not true, which suggests out-of-date stats for one table or the other.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-10-08 15:03:56 Re: PostgreSQL 12, JIT defaults
Previous Message Andres Freund 2018-10-08 14:58:12 Re: PostgreSQL 12, JIT defaults