Re: Nested loop does not preserve order. Why?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexey Bashtanov <bashtanov(at)imap(dot)cc>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Nested loop does not preserve order. Why?
Date: 2014-02-11 15:36:11
Message-ID: 19836.1392132971@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Alexey Bashtanov <bashtanov(at)imap(dot)cc> writes:
> Let us have a table T(A, B), an index on T(B, A) and a much smaller
> table V(B).
> And we are looking for all rows of T that have B present in V and sorted
> by B, A.
> The most natural way to obtain them would be to take all records from V
> ordered by B, and to perform an index-only scan on T for each B.
> But not for postgresql :(

AFAICS, what you're wishing is that given a plan like

nestloop
Join condition: x = y
-> path ordered by x
-> path ordered by y, z

the planner would realize that the output of the nestloop is ordered
by y,z. Right now it only realizes that the output is ordered by x
(and hence also by y); this means it thinks it needs an additional
sort step.

As stated, though, this deduction about ordering is false. To make it
true, you need the additional assumption that the outer relation is
unique in x (plus assumptions about the equality operators in question
all being compatible with the sort ordering).

Right now, the planner doesn't really track uniqueness so there's
no practical way to do this. There's a patch in the current commitfest
that wants to add tracking of unique sort orders, so if that gets in
we might have appropriate infrastructure. But even then, this seems
like a mighty narrow and unusual corner case; I can't recall ever
seeing a request for such behavior before. So I doubt we'd do it
unless the deduction turns out to be basically free, which seems
unlikely.

Having said that, there does seem to be something fishy going on
here. If I add an index on v.b to your example, and drop the
explicit request to sort on a, I can get this:

EXPLAIN select b, a from u where
b in (select b from v) order by b;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Sort (cost=13802.17..14052.17 rows=100000 width=8)
Sort Key: u.b
-> Nested Loop (cost=8.89..4128.85 rows=100000 width=8)
-> Unique (cost=8.45..8.50 rows=10 width=4)
-> Sort (cost=8.45..8.48 rows=10 width=4)
Sort Key: v.b
-> Index Only Scan using v_b_idx on v (cost=0.14..8.29 rows=10 width=4)
-> Index Only Scan using u_i on u (cost=0.43..312.04 rows=10000 width=8)
Index Cond: (b = v.b)
Planning time: 0.481 ms
(10 rows)

I'd have expected the planner to realize that neither of
those sort steps are necessary. Will investigate.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Bruce Momjian 2014-02-11 15:45:37 Re: BUG #8354: stripped positions can generate nonzero rank in ts_rank_cd
Previous Message mduyunov 2014-02-11 14:32:10 BUG #9186: CONTEXT log line still appears when turned off