Re: Can JoinFilter condition be pushed down into IndexScan?

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Bəxtiyar Neyman <bakhtiyarneyman(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Can JoinFilter condition be pushed down into IndexScan?
Date: 2023-06-21 13:28:11
Message-ID: ae563c18-30cd-0eee-85b5-87d0aed1d312@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 6/21/23 05:37, Bəxtiyar Neyman wrote:
> I define a table user_ranks as such:
>
> CREATE TABLE user_ranks (
>   id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
>   rank INTEGER NOT NULL,
>   CONSTRAINT "by (rank, id)" UNIQUE (rank, id)
> );
>
> INSERT INTO user_ranks (user_id, rank) SELECT generate_series(1, 10000),
> generate_series(1, 10000);
>

This doesn't work, the INSERT needs to only insert into (rank).

> Here's a query I'd like to optimize:
>
> explain (analyze,verbose)
> SELECT
>   t3_0."id" AS "id",
>   t3_0."rank" AS "rank"
> FROM
>   LATERAL (
>     SELECT
>       t4_0."rank" AS "rank"
>     FROM
>       user_ranks AS t4_0
>     WHERE
>       (t4_0."id" = 4732455)
>   ) AS t3_1
>   INNER JOIN user_ranks AS t3_0 ON true
> WHERE
>   (
>     ((t3_0."rank", t3_0."id") <= (t3_1."rank", 4732455))
>     AND true
>   )
> ORDER BY
>   t3_0."rank" DESC,
>   t3_0."id" DESC
> LIMIT
>   10
>

Not sure why you make the query unnecessarily complicated - the LATERAL
is pointless I believe, the "AND true" just make it harder to read.
Let's rewrite it it like this to make discussion easier:

explain (analyze,verbose)
SELECT
t3_0."id" AS "id",
t3_0."rank" AS "rank"
FROM
user_ranks AS t3_1
INNER JOIN user_ranks AS t3_0
ON ((t3_0."rank", t3_0."id") <= (t3_1."rank", t3_1."id"))
WHERE
t3_1."id" = 4732455
ORDER BY
t3_0."rank" DESC,
t3_0."id" DESC
LIMIT
10

Same query, but perhaps easier to read.

> It compiles to the following plan:
>
>  Limit  (cost=0.56..250.94 rows=10 width=12) (actual time=8.078..8.078
> rows=1 loops=1)
>    Output: t3_0.id <http://t3_0.id>, t3_0.rank
>    ->  Nested Loop  (cost=0.56..41763.27 rows=1668 width=12) (actual
> time=8.075..8.076 rows=1 loops=1)
>          Output: t3_0.id <http://t3_0.id>, t3_0.rank
>          Inner Unique: true
>          Join Filter: (ROW(t3_0.rank, t3_0.id <http://t3_0.id>) <=
> ROW(t4_0.rank, 4732455))
>          Rows Removed by Join Filter: 5002
>          ->  Index Only Scan Backward using "by (rank,id)" on
> public.user_ranks t3_0  (cost=0.28..163.33 rows=5003 width=12) (actual
> time=0.023..0.638 rows=5003 loops=1)
>                Output: t3_0.rank, t3_0.id <http://t3_0.id>
>                Heap Fetches: 0
>          ->  Index Scan using "by id" on public.user_ranks t4_0
>  (cost=0.28..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1
> loops=5003)
>                Output: t4_0.id <http://t4_0.id>, t4_0.rating, t4_0.rank
>                Index Cond: (t4_0.id <http://t4_0.id> = 4732455)
>
> As you can see, there are a lot of rows returned by t3_0, which are then
> filtered by Join Filter. But it would have been better if instead of the
> filter, the  t3_0 table would have an Index Cond. Similar to how it
> happens when a correlated subquery is used (or a CTE)
>
> explain (analyze,verbose)
> SELECT
>   t3_0."id" AS "id",
>   t3_0."rank" AS "rank"
> FROM
>   user_ranks AS t3_0
> WHERE
>   (
>     ((t3_0."rank", t3_0."id") <= (
>     SELECT
>       t4_0."rank" AS "rank",
>       t4_0."id" AS "id"
>     FROM
>       user_ranks AS t4_0
>     WHERE
>       (t4_0."id" = 4732455)
>     ))
>     AND true
>   )
> ORDER BY
>   t3_0."rank" DESC,
>   t3_0."id" DESC
> LIMIT
>   10
>
>  Limit  (cost=8.58..8.95 rows=10 width=12) (actual time=0.062..0.064
> rows=1 loops=1)
>    Output: t3_0.id <http://t3_0.id>, t3_0.rank
>    InitPlan 1 (returns $0,$1)
>      ->  Index Scan using "by id" on public.user_ranks t4_0
>  (cost=0.28..8.30 rows=1 width=12) (actual time=0.024..0.025 rows=1 loops=1)
>            Output: t4_0.rank, t4_0.id <http://t4_0.id>
>            Index Cond: (t4_0.id <http://t4_0.id> = 4732455)
>    ->  Index Only Scan Backward using "by (rank,id)" on
> public.user_ranks t3_0  (cost=0.28..61.47 rows=1668 width=12) (actual
> time=0.061..0.062 rows=1 loops=1)
>          Output: t3_0.id <http://t3_0.id>, t3_0.rank
>          Index Cond: (ROW(t3_0.rank, t3_0.id <http://t3_0.id>) <=
> ROW($0, $1))
>          Heap Fetches: 0
>
>
> I'm an opposite of a PostgreSQL expert, but it was surprising to me to
> see that a correlated subquery behaves better than a join. Is this
> normal? Is it something worth fixing/easy to fix?
>

Because those queries are not doing the same thing. In the first query
you sort by t3_0 columns, while the "id = 4732455" condition is on the
other table. And so it can't use the index scan for sorting.

While in the second query it can do that, and it doesn't need to do the
explicit sort (which needs to fetch all the rows etc.). If you alter the
first query to do

ORDER BY
t3_1."rank" DESC,
t3_1."id" DESC

it'll use the same plan as the second query. Well, not exactly the same,
but much closer to it.

Nevertheless, these example queries have other estimation issues, which
might result in poor plan choices. There's no row for (id = 4732455),
and the cross-table inequality estimate is just some default estimate
(33%). In reality, this produces no rows.

Secondly, for LIMIT, the cost is assumed to be "proportional" fraction
of the input costs. In other words, we expect the limit to terminate
after only seeing a fraction of rows - if we expect to see 10000 rows
and the query has LIMIT 10, we expect to only do 1/1000 of the work. But
if the subtree does not produce 10000 rows, that goes out of the window
and we may need to do much more work.

I'm not sure why the two queries actually use different plans even after
the ORDER BY change. I would have expected the second query (with
correlated subquery) to be transformed to a join, but perhaps that
transformation would be invalid, or maybe the planner does that based on
cost (and then it's not surprising due to the estimation issues).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2023-06-21 13:28:38 EBCDIC sorting as a use case for ICU rules
Previous Message Jelte Fennema 2023-06-21 13:08:19 Re: Support logical replication of DDLs