Re: Comparison of Oracle and PostgreSQL full text search

From: Howard Rogers <hjr(at)diznix(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Comparison of Oracle and PostgreSQL full text search
Date: 2010-07-29 23:38:32
Message-ID: AANLkTikUeQ077FHF0C7pKQ_BbsUGyoy9wkg3qY=e7fvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Jul 29, 2010 at 10:33 PM, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> On 28 July 2010 02:58, Howard Rogers <hjr(at)diznix(dot)com> wrote:
>> For what it's worth, I wrote up the performance comparison here:
>> http://diznix.com/dizwell/archives/153
>>
>
> Thanks, very interesting results. I wonder, are the results being
> sorted by the database? The performance degradation for large numbers
> of results might be explained by it switching over from an internal to
> an external sort, in which case tweaking work_mem might make a
> difference.
>
> Of course this is pure speculation without the EXPLAIN ANALYSE output.
>
> Regards,
> Dean

Yes, the results were being sorted. I did various tests, changing
work_mem, shared_buffers and much else, one by one, until I arrived at
the combination of settings that gave me the best 'total search time'
results. Personally, I couldn't see any difference in the explain
plans, but I was in a bit of a hurry and I may have missed it.

For the search term 'woman', which matches 1,590,275 documents, here's
the explain plan:

"Sort (cost=185372.88..185372.93 rows=20 width=312) (actual
time=10537.152..10537.154 rows=20 loops=1)"
" Sort Key: a.rf, a.sort_id"
" Sort Method: quicksort Memory: 48kB"
" -> Result (cost=109119.55..185372.45 rows=20 width=312) (actual
time=4309.020..10537.116 rows=20 loops=1)"
" -> Append (cost=109119.55..185372.45 rows=20 width=312)
(actual time=4309.018..10537.108 rows=20 loops=1)"
" -> Subquery Scan a (cost=109119.55..109119.68 rows=10
width=312) (actual time=4309.018..4309.026 rows=10 loops=1)"
" -> Limit (cost=109119.55..109119.58 rows=10
width=641) (actual time=4309.016..4309.019 rows=10 loops=1)"
" -> Sort (cost=109119.55..109121.94
rows=957 width=641) (actual time=4309.014..4309.015 rows=10 loops=1)"
" Sort Key: search_rm.sort_id"
" Sort Method: top-N heapsort Memory: 35kB"
" -> Bitmap Heap Scan on search_rm
(cost=6651.07..109098.87 rows=957 width=641) (actual
time=272.851..4021.458 rows=583275 loops=1)"
" Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
" Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
" -> Bitmap Index Scan on
rmsearch_idx (cost=0.00..6650.83 rows=25826 width=0) (actual
time=165.711..165.711 rows=586235 loops=1)"
" Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
" -> Subquery Scan b (cost=76252.65..76252.77 rows=10
width=312) (actual time=6228.073..6228.080 rows=10 loops=1)"
" -> Limit (cost=76252.65..76252.67 rows=10
width=727) (actual time=6228.072..6228.075 rows=10 loops=1)"
" -> Sort (cost=76252.65..76254.29 rows=655
width=727) (actual time=6228.071..6228.072 rows=10 loops=1)"
" Sort Key: search_rf.sort_id"
" Sort Method: top-N heapsort Memory: 38kB"
" -> Bitmap Heap Scan on search_rf
(cost=5175.18..76238.49 rows=655 width=727) (actual
time=363.684..5748.279 rows=1007000 loops=1)"
" Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
" Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
" -> Bitmap Index Scan on
rfsearch_idx (cost=0.00..5175.02 rows=17694 width=0) (actual
time=242.859..242.859 rows=1030282 loops=1)"
" Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''woman'''::tsquery)"
"Total runtime: 10538.832 ms"

And here's the plan for the search term "clover", which matches only
2,808 records in total:

" -> Result (cost=109119.55..185372.45 rows=20 width=312) (actual
time=16.807..23.990 rows=20 loops=1)"
" -> Append (cost=109119.55..185372.45 rows=20 width=312)
(actual time=16.806..23.985 rows=20 loops=1)"
" -> Subquery Scan a (cost=109119.55..109119.68 rows=10
width=312) (actual time=16.806..16.812 rows=10 loops=1)"
" -> Limit (cost=109119.55..109119.58 rows=10
width=641) (actual time=16.805..16.807 rows=10 loops=1)"
" -> Sort (cost=109119.55..109121.94
rows=957 width=641) (actual time=16.804..16.805 rows=10 loops=1)"
" Sort Key: search_rm.sort_id"
" Sort Method: top-N heapsort Memory: 35kB"
" -> Bitmap Heap Scan on search_rm
(cost=6651.07..109098.87 rows=957 width=641) (actual
time=1.054..15.577 rows=1807 loops=1)"
" Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
" Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
" -> Bitmap Index Scan on
rmsearch_idx (cost=0.00..6650.83 rows=25826 width=0) (actual
time=0.615..0.615 rows=1807 loops=1)"
" Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
" -> Subquery Scan b (cost=76252.65..76252.77 rows=10
width=312) (actual time=7.161..7.166 rows=10 loops=1)"
" -> Limit (cost=76252.65..76252.67 rows=10
width=727) (actual time=7.161..7.163 rows=10 loops=1)"
" -> Sort (cost=76252.65..76254.29 rows=655
width=727) (actual time=7.160..7.161 rows=10 loops=1)"
" Sort Key: search_rf.sort_id"
" Sort Method: top-N heapsort Memory: 35kB"
" -> Bitmap Heap Scan on search_rf
(cost=5175.18..76238.49 rows=655 width=727) (actual time=0.433..6.642
rows=1001 loops=1)"
" Recheck Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
" Filter:
((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND
(bitand(filetype, 1) > 0))"
" -> Bitmap Index Scan on
rfsearch_idx (cost=0.00..5175.02 rows=17694 width=0) (actual
time=0.250..0.250 rows=1045 loops=1)"
" Index Cond:
(to_tsvector('english'::regconfig, (textsearch)::text) @@
'''clover'''::tsquery)"
"Total runtime: 24.143 ms"

I can't see any change to the sorting behaviour there. Work_mem was
set to 4096MB, shared buffers to 12228MB, temp_buffers to 1024MB,
effective_cache_size to 18442MB.

Sadly, I won't be able to provide much further analysis or
information, because the box concerned is being wiped. The MD decided
that, as a matter of corporate governance, he couldn't punt the
company on PostgreSQL, so my experimenting days are over. Back to
Oracle: slower, but with a support contract he can sue on, I guess!

Regards
HJR

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Greg Smith 2010-07-29 23:42:57 Re: Comparison of Oracle and PostgreSQL full text search
Previous Message John R Pierce 2010-07-29 22:17:40 Re: How to improve: performance of query on postgresql 8.3 takes days