Quick Links

Re: big data - slow select (speech search)

From:	Michal Fapso <michal(dot)fapso(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: big data - slow select (speech search)
Date:	2010-07-07 13:31:29
Message-ID:	AANLkTimq2ap3WEQq1U8lMR0Sx8vfI8cXY1D69ouYrqno@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi Robert,

thank you for your help. I tried to cluster the table on
hyps_wordid_index and the query execution time dropped from 4.43 to
0.19 seconds which is not that far from Lucene's performance of 0.10
second.

Thanks a lot!
Miso Fapso

On 6 July 2010 02:25, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jul 1, 2010 at 6:34 PM, Michal Fapso <michal(dot)fapso(at)gmail(dot)com> wrote:
>> It took about 4.5 seconds. If I rerun it, it takes
>> less than 2 miliseconds, but it is because of the cache. I need to
>> optimize the first-run.
>>
>> laptop ASUS, CPU dual core T2300 1.66GHz, 1.5G RAM
>>
>> EXPLAIN ANALYZE SELECT h1.docid
>> FROM hyps AS h1
>> WHERE h1.wordid=65658;
>>
>> Bitmap Heap Scan on hyps h1 (cost=10.97..677.09 rows=171 width=4)
>> (actual time=62.106..4416.864 rows=343 loops=1)
>> Recheck Cond: (wordid = 65658)
>> -> Bitmap Index Scan on hyps_wordid_index (cost=0.00..10.92
>> rows=171 width=0) (actual time=42.969..42.969 rows=343 loops=1)
>> Index Cond: (wordid = 65658)
>> Total runtime: 4432.015 ms
>>
>> If I run the same query in Lucene search engine, it takes 0.105
>> seconds on the same data which is quite a huge difference.
>
> So PostgreSQL is reading 343 rows from disk in 4432 ms, or about 12
> ms/row. I'm not an expert on seek times, but that might not really be
> that unreasonable, considering that those rows may be scattered all
> over the index and thus it may be basically random I/O. Have you
> tried clustering hyps on hyps_wordid_index? If you had a more
> sophisticated disk subsystem you could try increasing
> effective_io_concurrency but that's not going to help with only one
> spindle.
>
> If you run the same query in Lucene and it takes only 0.105 s, then
> Lucene is obviously doing a lot less I/O. I doubt that any amount of
> tuning of your existing schema is going to produce that kind of result
> on PostgreSQL. Using the full-text search stuff, or a gin index of
> some kind, might get you closer, but it's hard to beat a
> special-purpose engine that implements exactly the right algorithm for
> your use case.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>

In response to

Re: big data - slow select (speech search) at 2010-07-06 00:25:49 from Robert Haas

Responses

Re: big data - slow select (speech search) at 2010-07-07 20:49:39 from Robert Haas

Browse pgsql-performance by date

	From	Date	Subject
Next Message	damien hostin	2010-07-07 14:39:38	Re: Slow query with planner row strange estimation
Previous Message	Merlin Moncure	2010-07-07 13:14:25	Re: Two "equivalent" WITH RECURSIVE queries, one of them slow.