Re: Problems with FTS

From: Rauan Maemirov <rauan(at)maemirov(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Problems with FTS
Date: 2011-11-21 05:53:47
Message-ID: CAFw_bqOdQqE0fUVntCX-aV+oZnzEgpmGqce=ODktJkNktFEEBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

The problem has returned back, and here's the results, as you've said it's
faster now:

SET enable_seqscan=off;
EXPLAIN ANALYZE SELECT "v"."id", "v"."title" FROM "video" AS "v"
WHERE (v.active) AND (v.fts @@
'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery
and v.id <> 500563 )
ORDER BY COALESCE(ts_rank_cd( '{0.1, 0.2, 0.7, 1.0}', v.fts,
'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery),
1) DESC, v.views DESC
LIMIT 6

Limit (cost=219631.83..219631.85 rows=6 width=287) (actual
time=1850.567..1850.570 rows=6 loops=1)
-> Sort (cost=219631.83..220059.05 rows=170886 width=287) (actual
time=1850.565..1850.566 rows=6 loops=1)
Sort Key: (COALESCE(ts_rank_cd('{0.1,0.2,0.7,1}'::real[], fts, '( (
( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) |
''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery), 1::real)), views
Sort Method: top-N heapsort Memory: 26kB
-> Bitmap Heap Scan on video v (cost=41180.92..216568.73
rows=170886 width=287) (actual time=214.842..1778.830 rows=103087 loops=1)
Recheck Cond: (fts @@ '( ( ( ( ( ''dexter'':A | ''season'':A
) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) | ''сезон'':A ) |
''серия'':A'::tsquery)
Filter: (active AND (id <> 500563))
-> Bitmap Index Scan on idx_video_fts (cost=0.00..41138.20
rows=218543 width=0) (actual time=170.206..170.206 rows=171945 loops=1)
Index Cond: (fts @@ '( ( ( ( ( ''dexter'':A |
''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) |
''сезон'':A ) | ''серия'':A'::tsquery)
Total runtime: 1850.632 ms

Should I use this instead?

2011/1/15 Robert Haas <robertmhaas(at)gmail(dot)com>

> On Tue, Jan 11, 2011 at 3:16 AM, Rauan Maemirov <rauan(at)maemirov(dot)com>
> wrote:
> > Hi, Kevin.
> > Sorry for long delay.
> > EXPLAIN ANALYZE SELECT "v"."id", "v"."title" FROM "video" AS "v"
> > WHERE (v.active) AND (v.fts @@
> > 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery
> and
> > v.id <> 500563 )
> > ORDER BY COALESCE(ts_rank_cd( '{0.1, 0.2, 0.7, 1.0}', v.fts,
> >
> 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery),
> > 1) DESC, v.views DESC
> > LIMIT 6
> > "Limit (cost=103975.50..103975.52 rows=6 width=280) (actual
> > time=2893.193..2893.199 rows=6 loops=1)"
> > " -> Sort (cost=103975.50..104206.07 rows=92228 width=280) (actual
> > time=2893.189..2893.193 rows=6 loops=1)"
> > " Sort Key: (COALESCE(ts_rank_cd('{0.1,0.2,0.7,1}'::real[], fts,
> '( (
> > ( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) |
> > ''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery), 1::real)),
> views"
> > " Sort Method: top-N heapsort Memory: 25kB"
> > " -> Seq Scan on video v (cost=0.00..102322.34 rows=92228
> > width=280) (actual time=0.100..2846.639 rows=54509 loops=1)"
> > " Filter: (active AND (fts @@ '( ( ( ( ( ''dexter'':A |
> > ''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) |
> > ''сезон'':A ) | ''серия'':A'::tsquery) AND (id <> 500563))"
> > "Total runtime: 2893.264 ms"
> > Table scheme:
> > CREATE TABLE video
> > (
> > id bigserial NOT NULL,
> > hash character varying(12),
> > account_id bigint NOT NULL,
> > category_id smallint NOT NULL,
> > converted boolean NOT NULL DEFAULT false,
> > active boolean NOT NULL DEFAULT true,
> > title character varying(255),
> > description text,
> > tags character varying(1000),
> > authorized boolean NOT NULL DEFAULT false,
> > adult boolean NOT NULL DEFAULT false,
> > views bigint DEFAULT 0,
> > rating real NOT NULL DEFAULT 0,
> > screen smallint DEFAULT 2,
> > duration smallint,
> > "type" smallint DEFAULT 0,
> > mp4 smallint NOT NULL DEFAULT 0,
> > size bigint,
> > size_high bigint DEFAULT 0,
> > source character varying(255),
> > storage_id smallint NOT NULL DEFAULT 1,
> > rule_watching smallint,
> > rule_commenting smallint,
> > count_comments integer NOT NULL DEFAULT 0,
> > count_likes integer NOT NULL DEFAULT 0,
> > count_faves integer NOT NULL DEFAULT 0,
> > fts tsvector,
> > modified timestamp without time zone NOT NULL DEFAULT now(),
> > created timestamp without time zone DEFAULT now(),
> > CONSTRAINT video_pkey PRIMARY KEY (id),
> > CONSTRAINT video_hash_key UNIQUE (hash)
> > )
> > WITH (
> > OIDS=FALSE
> > );
> > Indexes:
> > CREATE INDEX idx_video_account_id ON video USING btree (account_id);
> > CREATE INDEX idx_video_created ON video USING btree (created);
> > CREATE INDEX idx_video_fts ON video USING gin (fts);
> > CREATE INDEX idx_video_hash ON video USING hash (hash);
> > (here I tried both gist and gin indexes)
> > I have 32Gb ram and 2 core quad E5520, 2.27GHz (8Mb cache).
> > Pgsql conf:
> > max_connections = 200
> > shared_buffers = 7680MB
> > work_mem = 128MB
> > maintenance_work_mem = 1GB
> > effective_cache_size = 22GB
> > default_statistics_target = 100
> > Anything else?
>
> For returning that many rows, an index scan might actually be slower.
> Maybe it's worth testing. Try:
>
> SET enable_seqscan=off;
> EXPLAIN ANALYZE ...
>
> and see what you get. If it's slower, well, then be happy it didn't
> use the index (maybe the question is... what index should you have
> instead?). If it's faster, post the results...
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Christiaan Willemsen 2011-11-21 21:03:46 SSD endurance calculations
Previous Message Jeremy Harris 2011-11-20 16:11:53 Re: external sort performance