Re: LIMIT and UNION ALL

From: Dave Johansen <davejohansen(at)gmail(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: LIMIT and UNION ALL
Date: 2011-05-26 16:21:45
Message-ID: BANLkTiki32HzEYh8f36p8MB5K3jTSnf5BA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, May 18, 2011 at 8:54 AM, Robert Klemme
<shortcutter(at)googlemail(dot)com>wrote:

> On Wed, May 18, 2011 at 5:26 PM, Dave Johansen <davejohansen(at)gmail(dot)com>
> wrote:
> > I am using Postgres 8.3.3 and I have a VIEW which is a UNION ALL of two
> > tables but when I do a select on the view using a LIMIT, it scans the
> entire
> > tables and takes significantly longer than writing out the query with the
> > LIMITs in the sub-queries themselves. Is there a solution to get the view
> to
> > perform like the query with the LIMIT explicitly placed in the
> sub-queries?
>
> Can you show DDL and queries?
>
> The query with the LIMIT on the subqueries and the one with the LIMIT
> on the overall query are not semantically equivalent. Since you can
> have an ORDER BY before the LIMIT on the query with the limit on the
> view the database must have all the rows before it can apply the
> ordering and properly determine the limit. Although it might be
> possible to determine under particular circumstances that only one of
> the tables needs to be queried or tables need only be queried
> partially I deem that quite complex. I do not know whether Postgres
> can do such optimizations but for that we would certainly need to see
> the concrete example (including constraint and indexes).
>
> Kind regards
>
> robert
>
> --
> remember.guy do |as, often| as.you_can - without end
> http://blog.rubybestpractices.com/
>

Yes, there is an order by an index involved. Here's a simplified version of
the schema and queries that demonstrates the same behaviour.

Table "public.message1"
Column | Type |
Modifiers
--------+--------------------------+--------------------------------------------------------
rid | integer | not null default
nextval('message1_rid_seq'::regclass)
data | integer |
tlocal | timestamp with time zone |
Indexes:
"message1_pkey" PRIMARY KEY, btree (rid)
Referenced by:
TABLE "parsed1" CONSTRAINT "parsed1_msgid_fkey" FOREIGN KEY (msgid)
REFERENCES message1(rid)

Table "public.parsed1"
Column | Type |
Modifiers
--------+--------------------------+--------------------------------------------------------
rid | integer | not null default
nextval('parsed1_rid_seq'::regclass)
msgid | integer |
data | integer |
tlocal | timestamp with time zone |
Indexes:
"parsed1_pkey" PRIMARY KEY, btree (rid)
Foreign-key constraints:
"parsed1_msgid_fkey" FOREIGN KEY (msgid) REFERENCES message1(rid) ON
DELETE CASCADE

For this example, message2 has the same structure/definition and message1
and parsed2 has the same structure/definition as parsed1.

View "public.parsed_all"
Column | Type | Modifiers
--------+--------------------------+-----------
rid | integer |
msgid | integer |
data | integer |
tlocal | timestamp with time zone |
View definition:
SELECT parsed1.rid, parsed1.msgid, parsed1.data, parsed1.tlocal
FROM parsed1
UNION ALL
SELECT parsed2.rid, parsed2.msgid, parsed2.data, parsed2.tlocal
FROM parsed2;

Slow version using the view:

EXPLAIN ANALYZE SELECT * FROM parsed_all ORDER BY tlocal DESC LIMIT 10;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=74985.28..74985.31 rows=10 width=20) (actual
time=6224.229..6224.244 rows=10 loops=1)
-> Sort (cost=74985.28..79985.28 rows=2000000 width=20) (actual
time=6224.226..6224.230 rows=10 loops=1)
Sort Key: parsed1.tlocal
Sort Method: top-N heapsort Memory: 17kB
-> Result (cost=0.00..31766.00 rows=2000000 width=20) (actual
time=0.026..4933.210 rows=2000000 loops=1)
-> Append (cost=0.00..31766.00 rows=2000000 width=20)
(actual time=0.024..2880.868 rows=2000000 loops=1)
-> Seq Scan on parsed1 (cost=0.00..15883.00
rows=1000000 width=20) (actual time=0.023..551.870 rows=1000000 loops=1)
-> Seq Scan on parsed2 (cost=0.00..15883.00
rows=1000000 width=20) (actual time=0.027..549.465 rows=1000000 loops=1)
Total runtime: 6224.337 ms
(9 rows)

Fast version using a direct query with limits in the sub-queries:

EXPLAIN ANALYZE SELECT * FROM (SELECT * FROM (SELECT * FROM parsed1 ORDER BY
tlocal DESC LIMIT 10) AS a UNION ALL SELECT * FROM (SELECT * FROM parsed2
ORDER BY tlocal DESC LIMIT 10) AS b) AS c ORDER BY tlocal DESC LIMIT 10;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------
---------------------
Limit (cost=1.33..1.35 rows=10 width=20) (actual time=0.131..0.145 rows=10
loops=1)
-> Sort (cost=1.33..1.38 rows=20 width=20) (actual time=0.129..0.132
rows=10 loops=1)
Sort Key: parsed1.tlocal
Sort Method: quicksort Memory: 17kB
-> Result (cost=0.00..0.90 rows=20 width=20) (actual
time=0.023..0.100 rows=20 loops=1)
-> Append (cost=0.00..0.90 rows=20 width=20) (actual
time=0.020..0.078 rows=20 loops=1)
-> Limit (cost=0.00..0.35 rows=10 width=20) (actual
time=0.020..0.035 rows=10 loops=1)
-> Index Scan using parsed1_tlocal_index on
parsed1 (cost=0.00..34790.39 rows=1000000 width=20) (actual time=0.018..0.
025 rows=10 loops=1)
-> Limit (cost=0.00..0.35 rows=10 width=20) (actual
time=0.010..0.024 rows=10 loops=1)
-> Index Scan using parsed2_tlocal_index on
parsed2 (cost=0.00..34758.39 rows=1000000 width=20) (actual time=0.009..0.
015 rows=10 loops=1)
Total runtime: 0.187 ms
(11 rows)

Basically, the second query is giving the same result as the version using
the view but is able to use the indexes because it the order by and limit
are explicitly placed in the sub-queries. So is there a way to make the
planner perform the same sort of operation and push those same constraints
into the sub-queries on its own?

Thanks,
Dave

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2011-05-26 16:37:56 Re: The shared buffers challenge
Previous Message panam 2011-05-26 16:08:07 Re: Hash Anti Join performance degradation