Re: BUG #15007: LIMIT not respected in sub-queries

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: will(at)summercat(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #15007: LIMIT not respected in sub-queries
Date: 2018-01-17 23:08:31
Message-ID: f74052d0-828c-428d-efe6-0d32d029fa21@2ndquadrant.com
Lists: pgsql-bugs

Hi,

On 01/11/2018 10:16 PM, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15007
> Logged by: Will Storey
> Email address: will(at)summercat(dot)com
> PostgreSQL version: 10.1
> Operating system: Ubuntu 16.04
> Description:
>
> Hello,
>
> I am not sure this is a bug. But it is surprising to me and seems to
> contradict the documentation in terms of join nesting.
>
> I have a SELECT query with a sub-SELECT in it. The sub-SELECT has a LIMIT
> clause. I've found that sometimes I receive more rows (at most one extra in
> my testing) than the LIMIT, where I expected only as many rows as the LIMIT.
> This depends on the query plan. With some plans it never happens, and with
> others it happens frequently.
>
> In looking into this behaviour, I came across hints that this is a known
> quirk. I found bug reports related specifically to UPDATE/DELETE that sound
> similar to this, but no mention that the behaviour can happen with SELECT:
>
> https://dba.stackexchange.com/questions/69471/postgres-update-limit-1?noredirect=1&lq=1
> (note the comments on the accepted answer)
> https://www.postgresql.org/message-id/1399649764731-5803406.post%40n5.nabble.com
> (and the thread)
> https://www.postgresql.org/message-id/1385918761589-5781081.post%40n5.nabble.com
>
> This happens with both PostgreSQL 10.1 on Ubuntu 16.04 (from the PostgreSQL
> repos: PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit) as well as on PostgreSQL
> 9.6.5 (where I initially encountered the behaviour).
>

I think you're right that those issues are related - a change of plan may
affect the results in a somewhat non-deterministic way. In the cases you
linked it's an UPDATE affecting additional rows; in your case it's the
volatility of random() during rescans. I'll get to that in a minute.

> Unfortunately my test case is not very clean and it is somewhat long, so
> I've put it in a gist on GitHub:
>
> https://gist.github.com/horgh/f3e8ede81d866844e7d162d677968bf0
>
> The SELECT query (run by the Perl program) quickly prints out that it
> receives 6 rows.
>
> As you can see in the EXPLAIN ANALYZE output, the innermost Nested Loop has
> loops > 1. I believe this is the cause of the behaviour. If I tweak the test
> to have a plan where that node runs before the Seq Scan, there are never
> more than 5 rows.
>

I went through the test case, and I think I know what's going on. The
script did not reproduce the issue for me, but I've been able to tweak
the query to use a plan that triggers the issue reliably.

The key is to force a particular join order, and disable operations that
would materialize intermediate results of the t1 scan. That is, we want
something like this:

-> nestloop
   -> nestloop t2 + t3
   -> seq scan t1

Firstly, I've replaced the dynamic random() condition with a static one:

random() <= 0.5

This is not strictly necessary, but it simplifies the plan. We do need
the random() call, though, as I'll explain later.

Then I've disabled a bunch of plan nodes that would materialize results
of the t1 seq scan:

set enable_material = off;
set enable_sort = off;
set enable_hashjoin = off;

And finally, I've disabled join reordering by setting

set join_collapse_limit = 1;

Now, if you rewrite the query like this (which, combined with
join_collapse_limit=1, essentially just forces a particular join order,
nothing else):

explain analyze SELECT * FROM
    (t3 JOIN t2 USING (t2_id)) JOIN
    (SELECT * FROM t1
     WHERE t1_id IS NOT NULL AND
           t1_id < 100 AND
           t1_val LIKE 'h%' AND
           random() <= 0.5
     LIMIT 5
    ) AS t1
    ON t3.t1_id = t1.t1_id
WHERE t2.t2_val LIKE 'he%';

you will get plans like this:

                               QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..31.41 rows=5 width=21)
              (actual time=0.052..1.825 rows=7 loops=1)
   Join Filter: (t3.t1_id = t1.t1_id)
   Rows Removed by Join Filter: 73
   ->  Nested Loop  (cost=0.00..16.10 rows=16 width=14)
                    (actual time=0.030..0.917 rows=16 loops=1)
         Join Filter: (t3.t2_id = t2.t2_id)
         Rows Removed by Join Filter: 160
         ->  Seq Scan on t2  (cost=0.00..1.14 rows=11 width=10)
                             (actual time=0.015..0.041 rows=11 loops=1)
               Filter: ((t2_val)::text ~~ 'he%'::text)
         ->  Seq Scan on t3  (cost=0.00..1.16 rows=16 width=8)
                             (actual time=0.003..0.037 rows=16 loops=11)
   ->  Limit  (cost=0.00..0.84 rows=5 width=7)
              (actual time=0.008..0.037 rows=5 loops=16)
         ->  Seq Scan on t1  (cost=0.00..1.52 rows=9 width=7)
                             (actual time=0.004..0.016 rows=5 loops=16)
               Filter: ((t1_id IS NOT NULL) AND (t1_id < 100) AND
                       ((t1_val)::text ~~ 'h%'::text) AND
                       (random() <= '0.5'::double precision))
               Rows Removed by Filter: 5
 Planning time: 0.625 ms
 Execution time: 1.911 ms
(15 rows)

This should be equivalent to the original query, and should produce the
same results (modulo random() of course).

But notice it actually does produce 7 rows!

 Nested Loop  (cost=0.00..31.41 rows=5 width=21)
              (actual time=0.052..1.825 rows=7 loops=1)
                                        ^^^^^^

The problem is that the sequential scan on t1 ends up being executed
repeatedly (because it's the inner relation of a nested loop), and
because random() is volatile, each rescan is likely to produce a
different set of rows.

Each rescan individually still respects the LIMIT, but the combined
result may be larger - the rescans may yield more than five distinct
IDs, matching additional rows from the other tables.
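You can see the effect in isolation by running the limited subquery on
its own a couple of times (this uses the t1 table from your gist);
because random() is volatile, each execution is an independent sample
and will typically return a different set of rows:

```sql
-- Each run evaluates random() afresh for every t1 row, so two
-- consecutive runs can return different sets of (up to) five rows.
-- A nested-loop rescan behaves exactly like one such independent run.
SELECT t1_id FROM t1
WHERE t1_id IS NOT NULL AND
      t1_id < 100 AND
      t1_val LIKE 'h%' AND
      random() <= 0.5
LIMIT 5;
```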

>
> I believe a better way to write this query would be to use a CTE.
>

Yes. A CTE is evaluated only once and its result is materialized, so it
will stabilize the output of the random() function and eliminate the
nondeterminism during rescans.
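For illustration, a CTE version of the rewritten query above might look
like this (a sketch, again using the tables from your gist; the CTE
name is arbitrary):

```sql
-- The CTE is evaluated once and materialized, so every rescan of
-- limited_t1 sees the same (at most) five rows.
WITH limited_t1 AS (
    SELECT * FROM t1
    WHERE t1_id IS NOT NULL AND
          t1_id < 100 AND
          t1_val LIKE 'h%' AND
          random() <= 0.5
    LIMIT 5
)
SELECT * FROM
    (t3 JOIN t2 USING (t2_id)) JOIN
    limited_t1 ON t3.t1_id = limited_t1.t1_id
WHERE t2.t2_val LIKE 'he%';
```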

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
