Re: Help with Query Tuning

From: Adarsh Sharma <adarsh(dot)sharma(at)orkash(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Help with Query Tuning
Date: 2011-03-17 06:25:21
Message-ID: 4D81A951.7030609@orkash.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks, I understand it know :-

But My one doubt which isn't clear :

*Original Query :-*

select count(*) from page_content where (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content
like '%terrORist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like
'%police%' OR content like '%defence%' OR content like '%cops%' OR
content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%')
AND (content like '%kill%' OR content like '%injure%');

*Output :-*
count
-------
57061
(1 row)

Time: 19726.555 ms

I need to tune it , use full-text searching as :

*Modified Query :-

*SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and
content is not null and isprocessable = 1 and
to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' ||
'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' ||
'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' ||
'dsf' || 'ssb');

*Output :-*
count
-------
0
(1 row)

Time: 194685.125 ms
*
*I try, SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and
content is not null and isprocessable = 1 and
to_tsvectOR('english',content) @@ to_tsquery('english','%Mujahid%' ||
'%jihad%' || '%Militant%' || '%fedayeen%' || '%insurgent%' ||
'%terrORist%' || '%cadre%' || '%civilians%' || '%police%' || '%defence%'
|| '%cops%' || '%crpf%' || '%dsf%' || '%ssb%');

count
-------
0
(1 row)

Time: 194722.468 ms

I know I have to create index but index is the next step, first you have
to get the correct result .

CREATE INDEX pgweb_idx ON page_content USING gin(to_tsvector('english',
content));

Please guide me where I am going wrong.

Thanks & best Regards,

Adarsh Sharma
Kenneth Marshall wrote:
> On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:
>
>> Dear all,
>>
>> I am facing a problem while creating the index to make the below query run
>> faster. My table size is near about 1065 MB and 428467 rows.
>>
>> explain analyze select count(*) from page_content where publishing_date
>> like '%2010%' and content_language='en' and content is not null and
>> isprocessable = 1 and (content like '%Militant%'
>> OR content like '%jihad%' OR content like '%Mujahid%' OR
>> content like '%fedayeen%' OR content like '%insurgent%' OR content like
>> '%terrorist%' OR
>> content like '%cadre%' OR content like '%civilians%' OR content like
>> '%police%' OR content like '%defence%' OR content like '%cops%' OR content
>> like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content
>> like '%kill%' or content like '%injure%');
>>
>> *Output:
>>
>> * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual
>> time=18564.631..18564.631 rows=1 loops=1)
>> -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0)
>> (actual time=0.146..18529.371 rows=59918 loops=1)
>> Filter: ((content IS NOT NULL) AND (publishing_date ~~
>> '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable =
>> 1) AND (((content)
>> ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND
>> (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~
>> '%jihad%'::text) OR (
>> (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~
>> '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR
>> ((content)::text ~~ '%terrori
>> st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~
>> '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR
>> ((content)::text
>> ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR
>> ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text)
>> OR ((content)::text
>> ~~ '%ssb%'::text)))
>> Total runtime: 18564.673 ms
>>
>>
>> *Index on that Table :
>>
>> *CREATE INDEX idx_page_id
>> ON page_content
>> USING btree
>> (crawled_page_id);
>>
>> *Index I create :*
>> CREATE INDEX idx_page_id_content
>> ON page_content
>> USING btree
>> (crawled_page_id,content_language,publishing_date,isprocessable);
>>
>> *Index that fail to create:
>>
>> *CREATE INDEX idx_page_id_content1
>> ON page_content
>> USING btree
>> (crawled_page_id,content);
>>
>> Error :-ERROR: index row requires 13240 bytes, maximum size is 8191
>> ********** Error **********
>>
>> ERROR: index row requires 13240 bytes, maximum size is 8191
>> SQL state: 54000
>>
>> How to resolve this error
>> Please give any suggestion to tune the query.
>>
>> Thanks & best Regards,
>>
>> Adarsh Sharma
>>
>>
>
> You should probably be looking at using full-text indexing:
>
> http://www.postgresql.org/docs/9.0/static/textsearch.html
>
> or limit the size of content for the index.
>
> Cheers,
> Ken
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message tv 2011-03-17 09:34:54 Re: Help with Query Tuning
Previous Message Adarsh Sharma 2011-03-17 04:48:36 Re: Help with Query Tuning