Skip site navigation (1) Skip section navigation (2)

Re: A very long running query....

From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: "Ioannis Anagnostopoulos" <ioannis(at)anatec(dot)com>
Cc: "Claudio Freire" <klaussfreire(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: A very long running query....
Date: 2012-07-21 09:16:16
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-novicepgsql-performance

isn't the first test superfluous here ?

>	where extract('day' from message_copies.msg_date_rec) = 17
>	and date_trunc('day',    message_copies.msg_date_rec) = '2012-07-17'

> Here is the index:
> CREATE INDEX idx_message_copies_wk2_date_src_pos_partial
>   ON feed_all_y2012m07.message_copies_wk2
>   USING btree
>   (date_trunc('day'::text, msg_date_rec),
>   src_id,
>   (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text))
> TABLESPACE archive
>   WHERE (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text) IS NOT NULL 
>   OR NOT (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text) = ''::text;

the georef test can be simplified using coalesce:

>  and     (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4) <> ''
>  and not (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4) is null
  and coaesce ( 
    (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4), 
     '') <> ''
In order to avoid this test at query time you might add a boolean column   message_copies.pos.has_georef,
and keep it up to date  with a before insert or update trigger. This will allow to shorten your index definition and simplify the planner task a little bit.
Moreover it will fasten your query in cases when the index don't get used.

As Tom already mentioned it, it may make sense not to concatenate the georef within the index, but keep them separated, or even keep them in different indexes.
Which is the best depend on the other queries running against this table

Marc Mamin

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org on behalf of Ioannis Anagnostopoulos
Sent: Sat 7/21/2012 1:56 AM
To: Tom Lane
Cc: Claudio Freire; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] A very long running query....
On 21/07/2012 00:10, Tom Lane wrote:
> Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
>> Looking at this:
>> "                    ->  Index Scan using
>> idx_message_copies_wk2_date_src_pos_partial on message_copies_wk2
>> message_copies  (cost=0.00..19057.93 rows=52 width=32) (actual
>> time=62.124..5486270.845 rows=387524 loops=1)"
>> "                          Index Cond: ((date_trunc('day'::text,
>> msg_date_rec) = '2012-07-17 00:00:00'::timestamp without time zone)
>> AND (src_id = 1))"
>> "                          Filter: ((date_part('day'::text,
>> msg_date_rec) = 17::double precision) AND (NOT (((((pos_georef1)::text
>> || (pos_georef2)::text) || (pos_georef3)::text) ||
>> (pos_georef4)::text) IS NULL)) AND (((((pos_georef1)::text ||
>> (pos_georef2)::text) || (pos_georef3)::text) || (pos_georef4)::text)
>> <> ''::text))"
> I think the real problem is that the planner has no hope of doing
> anything very accurate with such an unwieldy filter condition.  I'd look
> at ways of making the filter conditions simpler, perhaps by recasting
> the data representation.  In particular, that's a horridly bad way of
> asking whether some columns are empty, which I gather is the intent.
> If you really want to do it just like that, creating an index on the
> concatenation expression would guide ANALYZE to collect some stats about
> it, but it would probably be a lot more efficient to put together an AND
> or OR of tests on the individual columns.
> 			regards, tom lane
So what you suggest is to forget all together the concatenation of the 
georef1/2/3/4 and instead alter my query with something like:

georef1 is not null and not georeg1 = ''....etc for georef2 3 and 4

That would require to alter my index and have the four georef columns 
separately in it and not as a concatenation and so on for the partial 
index part. And a final thing, you seem to imply that the indexes are 
used by the analyser to collect statistics even if they are not used. So 
an index serves not only as a way to speed up targeted queries but also 
to provide better statistics to the analyzer?

Kind Regards

In response to


pgsql-novice by date

Next:From: Ioannis AnagnostopoulosDate: 2012-07-21 09:22:09
Subject: Re: A very long running query....
Previous:From: Craig RingerDate: 2012-07-21 08:02:21
Subject: Re: A very long running query....

pgsql-performance by date

Next:From: Ioannis AnagnostopoulosDate: 2012-07-21 09:22:09
Subject: Re: A very long running query....
Previous:From: Craig RingerDate: 2012-07-21 08:30:44
Subject: Re: query overhead

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group