
From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Odd number of rows expected
Date: 2005-01-21 20:38:27
Message-ID: 20050121203827.GW67721@decibel.org
Lists: pgsql-performance
I have a query that the planner thinks will generate a huge number of
rows, when in fact it won't:

INSERT INTO page_log.rrs
        ( bucket_id, page_id, project_id, other,
          hits, min_hits, max_hits, total_duration, min_duration, max_duration )
    SELECT a.rrs_bucket_id, page_id, project_id, other,
           count(*), count(*), count(*), sum(duration), min(duration), max(duration)
        FROM
            (SELECT b.bucket_id AS rrs_bucket_id, s.*
                FROM rrs.bucket b
                    JOIN page_log.log s
                        ON ( b.prev_end_time < s.log_time
                             AND b.end_time >= s.log_time )
                WHERE b.rrs_id = '1'
                    AND b.end_time <= '2005-01-21 20:23:00+00'
                    AND b.end_time > '1970-01-01 00:00:00+00'
            ) a
        GROUP BY rrs_bucket_id, page_id, project_id, other;

                                                                                        QUERY PLAN                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=170461360504.98..183419912556.69 rows=91175544 width=77)
   ->  GroupAggregate  (cost=170461360504.98..183418316984.67 rows=91175544 width=29)
         ->  Sort  (cost=170461360504.98..171639141309.21 rows=471112321692 width=29)
               Sort Key: b.bucket_id, s.page_id, s.project_id, s.other
               ->  Nested Loop  (cost=0.00..17287707964.10 rows=471112321692 width=29)
                     ->  Seq Scan on bucket b  (cost=0.00..9275.84 rows=281406 width=20)
                           Filter: ((rrs_id = 1) AND (end_time <= '2005-01-21 20:23:00+00'::timestamp with time zone) AND (end_time > '1970-01-01 00:00:00+00'::timestamp with time zone))
                     ->  Index Scan using log__log_time on log s  (cost=0.00..36321.24 rows=1674137 width=33)
                           Index Cond: (("outer".prev_end_time < s.log_time) AND ("outer".end_time >= s.log_time))
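
For reference, the nested loop's estimate appears to be simply the
product of its two inputs: 281406 bucket rows times an estimated 1674137
matching log rows each, i.e. roughly 281406 * 1674137 ≈ 4.7e11, which is
where rows=471112321692 comes from. The per-bucket estimate is the part
that's wrong; as the actual numbers below show, each bucket matches only
about 19 log rows.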

The actual row count after the aggregate is about 1.9M rows:
                                                                                        QUERY PLAN                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=170461360504.98..183419912556.69 rows=91175544 width=77) (actual time=156777.374..234613.843 rows=1945123 loops=1)
   ->  GroupAggregate  (cost=170461360504.98..183418316984.67 rows=91175544 width=29) (actual time=156777.345..214246.751 rows=1945123 loops=1)
         ->  Sort  (cost=170461360504.98..171639141309.21 rows=471112321692 width=29) (actual time=156777.296..177517.663 rows=4915567 loops=1)
               Sort Key: b.bucket_id, s.page_id, s.project_id, s.other
               ->  Nested Loop  (cost=0.00..17287707964.10 rows=471112321692 width=29) (actual time=0.662..90702.755 rows=4915567 loops=1)
                     ->  Seq Scan on bucket b  (cost=0.00..9275.84 rows=281406 width=20) (actual time=0.063..1591.591 rows=265122 loops=1)
                           Filter: ((rrs_id = 1) AND (end_time <= '2005-01-21 20:23:00+00'::timestamp with time zone) AND (end_time > '1970-01-01 00:00:00+00'::timestamp with time zone))
                     ->  Index Scan using log__log_time on log s  (cost=0.00..36321.24 rows=1674137 width=33) (actual time=0.014..0.174 rows=19 loops=265122)
                           Index Cond: (("outer".prev_end_time < s.log_time) AND ("outer".end_time >= s.log_time))
 Total runtime: 299623.954 ms
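
To put the misestimate in numbers: the join is estimated at 471112321692
rows but actually produces 4915567, an overestimate of roughly a factor
of 96000, and the final aggregate estimate of 91175544 rows is about 47x
the actual 1945123.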

Everything is analyzed, and the statistics target is set to 1000.
Basically, the planner doesn't seem to understand that each row in log
will match at most one row in bucket. There is a unique index on
bucket(rrs_id, end_time), so it should be able to tell this.
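
For what it's worth, here is a sketch of one rewrite of the inner query
that makes the at-most-one match explicit by fetching the single
enclosing bucket per log row with a correlated scalar subquery. This is
only a sketch: the index name is made up, and it assumes buckets for a
given rrs_id never overlap, so ORDER BY ... LIMIT 1 picks the one real
match.

    -- The unique index described above (the index name here is assumed):
    CREATE UNIQUE INDEX bucket_rrs_id_end_time ON rrs.bucket (rrs_id, end_time);

    -- Sketch of the inner query: a correlated scalar subquery can return
    -- at most one bucket_id per log row, so the at-most-one relationship
    -- is explicit. Log rows that fall in no bucket come back with a NULL
    -- rrs_bucket_id and would need to be filtered out before the GROUP BY.
    SELECT (SELECT b.bucket_id
              FROM rrs.bucket b
             WHERE b.rrs_id = '1'
               AND b.prev_end_time < s.log_time
               AND b.end_time >= s.log_time
               AND b.end_time <= '2005-01-21 20:23:00+00'
               AND b.end_time > '1970-01-01 00:00:00+00'
             ORDER BY b.end_time
             LIMIT 1) AS rrs_bucket_id,
           s.*
      FROM page_log.log s;

Whether the planner actually produces a better plan for this form would
of course need testing.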
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
