Re: Propogating conditions into a query

From: Kim Bisgaard <kib+pg(at)dmi(dot)dk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Propogating conditions into a query
Date: 2005-06-23 07:35:47
Message-ID: 42BA6653.2090201@dmi.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Tom,

I have now completed the move to PG8.0.3, and feel that I have confirmed
that this problem is related to the problem I'm having:

Formulated like this, it is not performing:

SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
FROM temp_dry_at_2m a
FULL OUTER JOIN temp_grass b
USING (station_id, timeobs)
WHERE station_id = 52981
AND timeobs = '2004-1-1 0:0:0';

Merge Full Join (cost=1598312.83..11032924.48 rows=6956994 width=32) (actual time=119061.098..133314.306 rows=1 loops=1)
Merge Cond: (("outer".timeobs = "inner".timeobs) AND ("outer".station_id = "inner".station_id))
Filter: ((COALESCE("inner".station_id, "outer".station_id) = 52981) AND (COALESCE("inner".timeobs, "outer".timeobs) = '2004-01-01 00:00:00'::timestamp without time zone))
-> Sort (cost=346429.38..352445.11 rows=2406292 width=16) (actual time=20315.241..23850.529 rows=2406292 loops=1)
Sort Key: b.timeobs, b.station_id
-> Seq Scan on temp_grass b (cost=0.00..41756.92 rows=2406292 width=16) (actual time=10.517..7003.468 rows=2406292 loops=1)
-> Sort (cost=1251883.44..1269275.93 rows=6956994 width=16) (actual time=82122.354..92027.850 rows=6956994 loops=1)
Sort Key: a.timeobs, a.station_id
-> Seq Scan on temp_dry_at_2m a (cost=0.00..117549.94 rows=6956994 width=16) (actual time=23.759..39930.741 rows=6956994 loops=1)
Total runtime: 133623.422 ms

But Postgresql can do the work, if it is reformulated into:

SELECT station_id, timeobs, temp_grass, temp_dry_at_2m
FROM
(SELECT station_id, timeobs, temp_dry_at_2m
FROM temp_dry_at_2m
WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') a
FULL OUTER JOIN
(SELECT station_id, timeobs, temp_grass
FROM temp_grass
WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') b
USING (station_id, timeobs)

Merge Full Join (cost=0.00..43023.64 rows=10614 width=32) (actual time=0.056..0.064 rows=1 loops=1)
-> Index Scan using temp_grass_idx on temp_grass (cost=0.00..246.55 rows=61 width=16) (actual time=0.029..0.031 rows=1 loops=1)
Index Cond: ((timeobs = '2004-01-01 00:00:00'::timestamp without time zone) AND (station_id = 52981))
-> Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m (cost=0.00..699.52 rows=174 width=16) (actual time=0.017..0.020 rows=1 loops=1)
Index Cond: ((timeobs = '2004-01-01 00:00:00'::timestamp without time zone) AND (station_id = 52981))
Total runtime: 0.163 ms

The reason the first query is not performing is because the query
optimizer does not push the conditions down into the sub-queries - right??

The reason that I do not just use the reformulated query, is that e.g.
the station_id comes from another table (and there can be more of them),
so it is bloody inconvenient to first select them, and then repeat them
a number of time in the above transformation (I need to outer join more
than two tables) ........

Best regards,

Kim Bisgaard wrote:

> Hi Tom,
>
> This sounds like the same "problem" which prevented PG from using the
> indices, and thus giving abyssmal performance in this other thread:
>
>> I have two BIG tables (virtually identical) with 3 NOT NULL columns
>> Station_id, TimeObs, Temp_XXXX, with unique indexes on (Station_id,
>> TimeObs) and valid ANALYSE (set statistics=100). I want to join the
>> two tables with a FULL OUTER JOIN.
>>
>> When I specify the query as:
>>
>> SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
>> FROM temp_dry_at_2m a
>> FULL OUTER JOIN temp_grass b USING (station_id, timeobs)
>> WHERE station_id = 52981
>> AND timeobs = '2004-1-1 0:0:0'
>
>
> Then I would also vote for improving the inteligence of the optimizer!
> :-)
>
> Regards,
> Kim.
>
> Tom Lane wrote:
>
>> Phil Endecott <spam_from_postgresql_general(at)chezphil(dot)org> writes:
>>
>>
>>> I don't see anything in there about LEFT OUTER JOIN though. Any ideas?
>>>
>>
>>
>> Oh, I missed that part of your message. Hmm, I think the issue is
>> that in
>>
>>
>>
>>>> D join (M join G on (M.g=G.id)) on (D.id=M.b) where D.id=nnn
>>>>
>>>
>>
>> the planner deduces M.b=nnn by transitivity, but when the join is an
>> outer join it can't make the same deduction.
>>
>> [ thinks some more... ] If we distinguished conditions that hold below
>> the join from those that hold above it, we could deduce that M.b=nnn can
>> be enforced below the join even though it might not be true above it.
>> There's no such mechanism in existence now, though.
>>
>> A possible workaround is to generate your query like
>>
>> D left join (M join G on (M.g=G.id)) on (D.id=M.b AND M.b=nnn) where
>> D.id=nnn
>>
>> but I don't know how practical that is for you.
>>
>> regards, tom lane
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: Have you checked our extensive FAQ?
>>
>> http://www.postgresql.org/docs/faq
>>
>>
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Zlatko Matić 2005-06-23 07:58:31 help about the function
Previous Message Kapil Malhotra 2005-06-23 06:41:29 Help me urgently