Re: GIN improvements part2: fast scan

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Alexander Korotkov" <aekorotkov(at)gmail(dot)com>
Cc: "Tomas Vondra" <tv(at)fuzzy(dot)cz>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>
Subject: Re: GIN improvements part2: fast scan
Date: 2014-02-03 19:02:31
Message-ID: cdbb1c8d4b3996f3ddd48dd01162d51b.squirrel@sq.gransy.com
Lists: pgsql-hackers

On 3 February 2014, 19:18, Alexander Korotkov wrote:
> On Mon, Feb 3, 2014 at 8:19 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>
>> >> > Sometimes test cases are not what we expect. For example:
>> >> >
>> >> > =# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
>> >> >                                    QUERY PLAN
>> >> > ────────────────────────────────────────────────────────────────────────────────
>> >> >  Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
>> >> >    Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
>> >> >    ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
>> >> >          Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
>> >> >  Planning time: 0.257 ms
>> >> > (5 rows)
>> >> >
>> >> > 5alpha1-initdb'd is 3 gin entries with different frequencies.
>> >>
>> >> Why do you find that strange? The way the query is formed or the way
>> >> it's evaluated?
>> >>
>> >> The query generator certainly is not perfect, so it may produce some
>> >> strange queries.
>> >>
>> >
>> > I just mean that in this case 3 words don't mean 3 GIN entries.
>>
>> Isn't that expected? I mean, that's what to_tsquery may do, right?
>>
>
> Everything is absolutely correct. :-) It just may not be what you
> expect if you don't get into the details.

Well, that's not how I designed the benchmark. I haven't based the
benchmark on GIN entries, but on 'natural' words, to simulate real
queries. I understand that basing it on GIN terms might give "more
consistent" results (e.g. 3 GIN terms with a given frequency) than the
current approach.
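Alexander's point, that one 'natural' word can turn into several GIN entries, can be checked directly in psql. The following is a minimal sketch using the built-in ts_debug() and to_tsquery() functions with the 'english' configuration from the thread; the exact tokens and lexemes depend on the text search configuration and PostgreSQL version, so no output is shown here.

```sql
-- Show how the 'english' parser splits the benchmark word into tokens,
-- and which lexemes each token is normalized to.
SELECT alias, token, lexemes
FROM ts_debug('english', '5alpha1-initdb''d');

-- The tsquery built from the same input; with the default GIN
-- tsvector_ops opclass, each lexeme in it is looked up as a separate key.
SELECT to_tsquery('english', '5alpha1-initdb''d');

-- numnode() counts the nodes (lexemes plus operators) of the tsquery.
SELECT numnode(to_tsquery('english', '5alpha1-initdb''d'));
```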

However, that was partly the goal: to cover a wider range of cases. It's
also why the benchmark works with relative speedup, comparing the query
duration with and without the patch.
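For illustration, a relative-speedup comparison could be computed along these lines. This is a sketch under assumptions: the bench_timings table, its columns and the numbers in it are hypothetical placeholders (not measurements), and speedup is taken to be the unpatched duration divided by the patched duration. The actual benchmark scripts are not part of this message.

```sql
-- Hypothetical table of per-query timings (not the real benchmark schema).
CREATE TEMP TABLE bench_timings (
    query_id     int PRIMARY KEY,
    ms_unpatched numeric NOT NULL,  -- duration without the fast-scan patch
    ms_patched   numeric NOT NULL   -- duration with the fast-scan patch
);

-- Placeholder values, not measurements.
INSERT INTO bench_timings VALUES
    (1, 120.0, 30.0),
    (2,   5.0,  5.5),
    (3,  40.0, 10.0);

-- Relative speedup per query: > 1 means the patched build is faster.
SELECT query_id,
       round(ms_unpatched / ms_patched, 2) AS speedup
FROM bench_timings
ORDER BY query_id;

-- Geometric mean of the per-query speedups.
SELECT round(exp(avg(ln(ms_unpatched / ms_patched))), 2) AS geo_mean_speedup
FROM bench_timings;
```

The geometric mean is one common way to aggregate ratios like these, since it treats a 2x speedup and a 2x slowdown symmetrically; whether the benchmark aggregates its results this way is not stated in this message.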

Tomas
