Quick Links

Re: A very long running query....

From:	"Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To:	"Ioannis Anagnostopoulos" <ioannis(at)anatec(dot)com>
Cc:	"Claudio Freire" <klaussfreire(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: A very long running query....
Date:	2012-07-21 09:16:16
Message-ID:	C4DAC901169B624F933534A26ED7DF310F96B02D@JENMAIL01.ad.intershop.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-novice pgsql-performance

Hello,
isn't the first test superfluous here ?

> where extract('day' from message_copies.msg_date_rec) = 17
> and date_trunc('day', message_copies.msg_date_rec) = '2012-07-17'

> Here is the index:
>
> CREATE INDEX idx_message_copies_wk2_date_src_pos_partial
> ON feed_all_y2012m07.message_copies_wk2
> USING btree
> (date_trunc('day'::text, msg_date_rec),
> src_id,
> (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text))
> TABLESPACE archive
> WHERE (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text) IS NOT NULL
> OR NOT (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text) = ''::text;

the georef test can be simplified using coalesce:

> and (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4) <> ''
> and not (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4) is null
=>
and coaesce (
(message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4),
'') <> ''

In order to avoid this test at query time you might add a boolean column message_copies.pos.has_georef,
and keep it up to date with a before insert or update trigger. This will allow to shorten your index definition and simplify the planner task a little bit.
Moreover it will fasten your query in cases when the index don't get used.

As Tom already mentioned it, it may make sense not to concatenate the georef within the index, but keep them separated, or even keep them in different indexes.
Which is the best depend on the other queries running against this table

HTH,

Marc Mamin

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org on behalf of Ioannis Anagnostopoulos
Sent: Sat 7/21/2012 1:56 AM
To: Tom Lane
Cc: Claudio Freire; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] A very long running query....

On 21/07/2012 00:10, Tom Lane wrote:
> Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
>> Looking at this:
>> " -> Index Scan using
>> idx_message_copies_wk2_date_src_pos_partial on message_copies_wk2
>> message_copies (cost=0.00..19057.93 rows=52 width=32) (actual
>> time=62.124..5486270.845 rows=387524 loops=1)"
>> " Index Cond: ((date_trunc('day'::text,
>> msg_date_rec) = '2012-07-17 00:00:00'::timestamp without time zone)
>> AND (src_id = 1))"
>> " Filter: ((date_part('day'::text,
>> msg_date_rec) = 17::double precision) AND (NOT (((((pos_georef1)::text
>> || (pos_georef2)::text) || (pos_georef3)::text) ||
>> (pos_georef4)::text) IS NULL)) AND (((((pos_georef1)::text ||
>> (pos_georef2)::text) || (pos_georef3)::text) || (pos_georef4)::text)
>> <> ''::text))"
> I think the real problem is that the planner has no hope of doing
> anything very accurate with such an unwieldy filter condition. I'd look
> at ways of making the filter conditions simpler, perhaps by recasting
> the data representation. In particular, that's a horridly bad way of
> asking whether some columns are empty, which I gather is the intent.
> If you really want to do it just like that, creating an index on the
> concatenation expression would guide ANALYZE to collect some stats about
> it, but it would probably be a lot more efficient to put together an AND
> or OR of tests on the individual columns.
>
> regards, tom lane
So what you suggest is to forget all together the concatenation of the
georef1/2/3/4 and instead alter my query with something like:

georef1 is not null and not georeg1 = ''....etc for georef2 3 and 4

That would require to alter my index and have the four georef columns
separately in it and not as a concatenation and so on for the partial
index part. And a final thing, you seem to imply that the indexes are
used by the analyser to collect statistics even if they are not used. So
an index serves not only as a way to speed up targeted queries but also
to provide better statistics to the analyzer?

Kind Regards
Yiannis

In response to

Re: A very long running query.... at 2012-07-20 23:56:34 from Ioannis Anagnostopoulos

Responses

Re: A very long running query.... at 2012-07-21 09:22:09 from Ioannis Anagnostopoulos

Browse pgsql-novice by date

	From	Date	Subject
Next Message	Ioannis Anagnostopoulos	2012-07-21 09:22:09	Re: A very long running query....
Previous Message	Craig Ringer	2012-07-21 08:02:21	Re: A very long running query....

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Ioannis Anagnostopoulos	2012-07-21 09:22:09	Re: A very long running query....
Previous Message	Craig Ringer	2012-07-21 08:30:44	Re: query overhead