Re: Yet Another COUNT(*)...WHERE...question

From: Rainer Bauer <usenet(at)munnin(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Yet Another COUNT(*)...WHERE...question
Date: 2007-08-16 15:20:09
Message-ID: man8c3tdchp1umuf8vu5kvuujrr1vfvmkj@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Gregory Stark wrote:

>"Rainer Bauer" <usenet(at)munnin(dot)com> writes:
>
>> Anyway, what Phoenix is trying to say is that 2 queries are required: One to
>> get the total count and one to get the tuples for the current page. I reckon
>> it would help, if the query returning the result set could also report the
>> total no. of tuples found. Somthing like
>> SELECT COUNT(*), * FROM <table> WHERE <cond> OFFSET <o> LIMIT <l>
>>
>> Or is there a way to do that?
>
>Well anything like the above would just report l as the count.

True, but what about this:

SELECT (SELECT COUNT(*) FROM <table> WHERE <cond>), * FROM <table> WHERE <cond> OFFSET <o> LIMIT <l>

I just tested this on a query and its about 5-10% faster than issuing both commands separately (caching effects?). I wonder whether there would be any chance that Postgres could detect that the "count" select and the "data" select result in the same query plan?

In my example (which is included below) the hash join is executed twice.

>The only way to do it in Postgres currently is to create a temporary table.
>Then you can populate it once, then select the count from the temporary table
>in one query and the required page from it in the second query.
>
>But temporary tables in Postgres are not really designed for this. In
>particular they count as DDL so you have to grant privileges to create tables
>to the application and it has to create and delete entries in pg_class for
>every use.

Well I don't think popuplating a temporary table with possible millions of rows is faster than executing the query twice. Remember that a performance problem only occurs if there are a lot of tuples returned.

Rainer

======================================================================

This is the count query:
SELECT COUNT(*) FROM "tblItem" AS i INNER JOIN
(SELECT "intItemID" FROM "tblItem2Category" WHERE "intCategoryID"=88869805) AS vrtChild ON i."intItemIDCnt"=vrtChild."intItemID"
WHERE ("intTimeEnd" < 1187273177)

This is the select analyse output:
"Aggregate (cost=356098.46..356098.47 rows=1 width=0) (actual time=29411.570..29411.570 rows=1 loops=1)"
" -> Hash Join (cost=177545.23..350137.60 rows=2384343 width=0) (actual time=17382.286..28864.851 rows=2383740 loops=1)"
" Hash Cond: ("tblItem2Category"."intItemID" = i."intItemIDCnt")"
" -> Bitmap Heap Scan on "tblItem2Category" (cost=41560.03..134660.50 rows=2561397 width=4) (actual time=1984.006..11048.762 rows=2513204 loops=1)"
" Recheck Cond: ("intCategoryID" = 88869805)"
" -> Bitmap Index Scan on ccitem2categorycategoryidix (cost=0.00..40919.69 rows=2561397 width=0) (actual time=1980.614..1980.614 rows=2513204 loops=1)"
" Index Cond: ("intCategoryID" = 88869805)"
" -> Hash (cost=95316.41..95316.41 rows=2339583 width=4) (actual time=15024.827..15024.827 rows=2383832 loops=1)"
" -> Seq Scan on "tblItem" i (cost=0.00..95316.41 rows=2339583 width=4) (actual time=8.634..13763.878 rows=2383832 loops=1)"
" Filter: ("intTimeEnd" < 1187273177)"
"Total runtime: 29411.668 ms"

======================================================================

This is the data query:
SELECT i."intItemIDCnt" FROM "tblItem" AS i INNER JOIN
(SELECT "intItemID" FROM "tblItem2Category" WHERE "intCategoryID"=88869805) AS vrtChild ON i."intItemIDCnt"=vrtChild."intItemID"
WHERE ("intTimeEnd" < 1187273177)
ORDER BY "intFlagListingFeatured" DESC, "intTimeEnd", "intItemIDCnt" OFFSET 500 LIMIT 50

This is the select analyse output:
"Limit (cost=733011.30..733011.42 rows=50 width=12) (actual time=37852.007..37852.058 rows=50 loops=1)"
" -> Sort (cost=733010.05..738970.91 rows=2384343 width=12) (actual time=37851.581..37851.947 rows=550 loops=1)"
" Sort Key: i."intFlagListingFeatured", i."intTimeEnd", i."intItemIDCnt""
" -> Hash Join (cost=179830.23..354707.60 rows=2384343 width=12) (actual time=17091.753..29040.425 rows=2383740 loops=1)"
" Hash Cond: ("tblItem2Category"."intItemID" = i."intItemIDCnt")"
" -> Bitmap Heap Scan on "tblItem2Category" (cost=41560.03..134660.50 rows=2561397 width=4) (actual time=1976.599..10970.394 rows=2513204 loops=1)"
" Recheck Cond: ("intCategoryID" = 88869805)"
" -> Bitmap Index Scan on ccitem2categorycategoryidix (cost=0.00..40919.69 rows=2561397 width=0) (actual time=1973.160..1973.160 rows=2513204 loops=1)"
" Index Cond: ("intCategoryID" = 88869805)"
" -> Hash (cost=95316.41..95316.41 rows=2339583 width=12) (actual time=14758.256..14758.256 rows=2383832 loops=1)"
" -> Seq Scan on "tblItem" i (cost=0.00..95316.41 rows=2339583 width=12) (actual time=8.592..13373.179 rows=2383832 loops=1)"
" Filter: ("intTimeEnd" < 1187273177)"
"Total runtime: 38247.533 ms"

======================================================================

This is the combined count/data query:
SELECT (SELECT COUNT(*) FROM "tblItem" AS i INNER JOIN (SELECT "intItemID" FROM "tblItem2Category" WHERE "intCategoryID"=88869805) AS vrtChild ON i."intItemIDCnt"=vrtChild."intItemID" WHERE ("intTimeEnd" < 1187273177)), i."intItemIDCnt" FROM "tblItem" AS i INNER JOIN (SELECT "intItemID" FROM "tblItem2Category" WHERE "intCategoryID"=88869805) AS vrtChild ON i."intItemIDCnt"=vrtChild."intItemID" WHERE ("intTimeEnd" < 1187273177) ORDER BY "intFlagListingFeatured" DESC, "intTimeEnd", "intItemIDCnt" OFFSET 500 LIMIT 50

"Limit (cost=1089109.77..1089109.90 rows=50 width=12) (actual time=62547.673..62547.727 rows=50 loops=1)"
" InitPlan"
" -> Aggregate (cost=356098.46..356098.47 rows=1 width=0) (actual time=16385.927..16385.927 rows=1 loops=1)"
" -> Hash Join (cost=177545.23..350137.60 rows=2384343 width=0) (actual time=4320.749..15843.759 rows=2383740 loops=1)"
" Hash Cond: ("tblItem2Category"."intItemID" = i."intItemIDCnt")"
" -> Bitmap Heap Scan on "tblItem2Category" (cost=41560.03..134660.50 rows=2561397 width=4) (actual time=293.025..9211.673 rows=2513204 loops=1)"
" Recheck Cond: ("intCategoryID" = 88869805)"
" -> Bitmap Index Scan on ccitem2categorycategoryidix (cost=0.00..40919.69 rows=2561397 width=0) (actual time=289.602..289.602 rows=2513204 loops=1)"
" Index Cond: ("intCategoryID" = 88869805)"
" -> Hash (cost=95316.41..95316.41 rows=2339583 width=4) (actual time=3461.292..3461.292 rows=2383832 loops=1)"
" -> Seq Scan on "tblItem" i (cost=0.00..95316.41 rows=2339583 width=4) (actual time=3.658..1692.979 rows=2383832 loops=1)"
" Filter: ("intTimeEnd" < 1187273177)"
" -> Sort (cost=733010.05..738970.91 rows=2384343 width=12) (actual time=62547.242..62547.616 rows=550 loops=1)"
" Sort Key: i."intFlagListingFeatured", i."intTimeEnd", i."intItemIDCnt""
" -> Hash Join (cost=179830.23..354707.60 rows=2384343 width=12) (actual time=38625.024..51768.452 rows=2383740 loops=1)"
" Hash Cond: ("tblItem2Category"."intItemID" = i."intItemIDCnt")"
" -> Bitmap Heap Scan on "tblItem2Category" (cost=41560.03..134660.50 rows=2561397 width=4) (actual time=1968.052..2893.092 rows=2513204 loops=1)"
" Recheck Cond: ("intCategoryID" = 88869805)"
" -> Bitmap Index Scan on ccitem2categorycategoryidix (cost=0.00..40919.69 rows=2561397 width=0) (actual time=1964.642..1964.642 rows=2513204 loops=1)"
" Index Cond: ("intCategoryID" = 88869805)"
" -> Hash (cost=95316.41..95316.41 rows=2339583 width=12) (actual time=19915.284..19915.284 rows=2383832 loops=1)"
" -> Seq Scan on "tblItem" i (cost=0.00..95316.41 rows=2339583 width=12) (actual time=8.622..18369.696 rows=2383832 loops=1)"
" Filter: ("intTimeEnd" < 1187273177)"
"Total runtime: 63042.165 ms"

======================================================================

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Rainer Bauer 2007-08-16 15:20:13 Re: Yet Another COUNT(*)...WHERE...question
Previous Message Rainer Bauer 2007-08-16 15:20:01 Re: Yet Another COUNT(*)...WHERE...question