Can JoinFilter condition be pushed down into IndexScan?

From: Bəxtiyar Neyman <bakhtiyarneyman(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Can JoinFilter condition be pushed down into IndexScan?
Date: 2023-06-21 03:37:00
Message-ID: CAObnsGr0UX2zWrBeY6hsV2eP2c6drUkTL4uAUzj8+mHwVu0w3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I define a table user_ranks as such:

CREATE TABLE user_ranks (
id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
rank INTEGER NOT NULL,
CONSTRAINT "by (rank, id)" UNIQUE (rank, id)
);

INSERT INTO user_ranks (user_id, rank) SELECT generate_series(1, 10000),
generate_series(1, 10000);

Here's a query I'd like to optimize:

explain (analyze,verbose)
SELECT
t3_0."id" AS "id",
t3_0."rank" AS "rank"
FROM
LATERAL (
SELECT
t4_0."rank" AS "rank"
FROM
user_ranks AS t4_0
WHERE
(t4_0."id" = 4732455)
) AS t3_1
INNER JOIN user_ranks AS t3_0 ON true
WHERE
(
((t3_0."rank", t3_0."id") <= (t3_1."rank", 4732455))
AND true
)
ORDER BY
t3_0."rank" DESC,
t3_0."id" DESC
LIMIT
10

It compiles to the following plan:

Limit (cost=0.56..250.94 rows=10 width=12) (actual time=8.078..8.078
rows=1 loops=1)
Output: t3_0.id, t3_0.rank
-> Nested Loop (cost=0.56..41763.27 rows=1668 width=12) (actual
time=8.075..8.076 rows=1 loops=1)
Output: t3_0.id, t3_0.rank
Inner Unique: true
Join Filter: (ROW(t3_0.rank, t3_0.id) <= ROW(t4_0.rank, 4732455))
Rows Removed by Join Filter: 5002
-> Index Only Scan Backward using "by (rank,id)" on
public.user_ranks t3_0 (cost=0.28..163.33 rows=5003 width=12) (actual
time=0.023..0.638 rows=5003 loops=1)
Output: t3_0.rank, t3_0.id
Heap Fetches: 0
-> Index Scan using "by id" on public.user_ranks t4_0
(cost=0.28..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1
loops=5003)
Output: t4_0.id, t4_0.rating, t4_0.rank
Index Cond: (t4_0.id = 4732455)

As you can see, there are a lot of rows returned by t3_0, which are then
filtered by Join Filter. But it would have been better if instead of the
filter, the t3_0 table would have an Index Cond. Similar to how it happens
when a correlated subquery is used (or a CTE)

explain (analyze,verbose)
SELECT
t3_0."id" AS "id",
t3_0."rank" AS "rank"
FROM
user_ranks AS t3_0
WHERE
(
((t3_0."rank", t3_0."id") <= (
SELECT
t4_0."rank" AS "rank",
t4_0."id" AS "id"
FROM
user_ranks AS t4_0
WHERE
(t4_0."id" = 4732455)
))
AND true
)
ORDER BY
t3_0."rank" DESC,
t3_0."id" DESC
LIMIT
10

Limit (cost=8.58..8.95 rows=10 width=12) (actual time=0.062..0.064 rows=1
loops=1)
Output: t3_0.id, t3_0.rank
InitPlan 1 (returns $0,$1)
-> Index Scan using "by id" on public.user_ranks t4_0
(cost=0.28..8.30 rows=1 width=12) (actual time=0.024..0.025 rows=1 loops=1)
Output: t4_0.rank, t4_0.id
Index Cond: (t4_0.id = 4732455)
-> Index Only Scan Backward using "by (rank,id)" on public.user_ranks
t3_0 (cost=0.28..61.47 rows=1668 width=12) (actual time=0.061..0.062
rows=1 loops=1)
Output: t3_0.id, t3_0.rank
Index Cond: (ROW(t3_0.rank, t3_0.id) <= ROW($0, $1))
Heap Fetches: 0

I'm an opposite of a PostgreSQL expert, but it was surprising to me to see
that a correlated subquery behaves better than a join. Is this normal? Is
it something worth fixing/easy to fix?

Sincerely,
Bakhtiyar

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-06-21 03:39:31 Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific
Previous Message Amit Kapila 2023-06-21 03:24:48 Re: could not extend file "base/5/3501" with FileFallocate(): Interrupted system call