Re: GIN improvements part2: fast scan

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Alexander Korotkov" <aekorotkov(at)gmail(dot)com>
Cc: "Tomas Vondra" <tv(at)fuzzy(dot)cz>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>
Subject: Re: GIN improvements part2: fast scan
Date: 2014-02-03 16:19:34
Message-ID: 368f0edae6c56767e78da2e45eb43a93.squirrel@sq.gransy.com
Lists: pgsql-hackers

On 3 February 2014, 17:08, Alexander Korotkov wrote:
> On Mon, Feb 3, 2014 at 7:24 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>
>> On 3 February 2014, 15:31, Alexander Korotkov wrote:
>> >
>> > I found my patch "0005-Ternary-consistent-implementation.patch" to be
>> > completely wrong. It introduces a ternary consistent function to the
>> > opclass, but doesn't use it, because I forgot to include the ginlogic.c
>> > change in the patch. So, it shouldn't have any impact on performance.
>> > However, the test results with that patch differ significantly. That
>> > makes me very uneasy. Can we reproduce exactly the same results now?
>>
>> Do I understand correctly that the 0005 patch should give exactly the
>> same performance as the 9.4-heikki branch (as it was applied on top of
>> it, and effectively changed nothing)? This wasn't exactly what I
>> measured, although the differences were not that significant.
>>
>
> Do I understand correctly that it's 9.4-heikki and 9.4-alex-1 here:
> http://www.fuzzy.cz/tmp/gin/#

Yes.

> In some queries the timings differ by several times. I wonder why.

Not sure.

>> I can rerun the tests, if that's what you're asking for. I'll improve the
>> test a bit - e.g. I plan to average multiple runs, to filter out random
>> noise (which might be significant for such short queries).
>>
>> > The right version of these two patches, merged into one against current
>> > HEAD, is attached. I've rerun the tests with it; the results are in
>> > /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun the
>> > postprocessing, including the graph drawing?
>>
>> Yes, I'll do that. However I'll have to rerun the other tests too,
>> because the previous runs were done on a different machine.
>>
>> I'm a bit confused right now. The previous patches (0005 + 0007) were
>> supposed to be applied on top of the 4 from Heikki (0001-0004), right?
>> AFAIK those were not committed yet, so why is this version against HEAD?
>>
>> To summarize, I know of these patch sets:
>>
>> 9.4-heikki (old version)
>> 0001-Optimize-GIN-multi-key-queries.patch
>> 0002-Further-optimize-the-multi-key-GIN-searches.patch
>> 0003-Further-optimize-GIN-multi-key-searches.patch
>> 0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch
>>
>> 9.4-alex-1 (based on 9.4-heikki)
>> 0005-Ternary-consistent-implementation.patch
>>
>> 9.4-alex-1 (based on 9.4-alex-1)
>> 0006-Sort-entries.patch
>>
>
> Of these patches, I only need to compare 9.4-heikki (old version) and
> 9.4-alex-1 to resolve my doubts.

OK, understood.

>
>> 9.4-heikki (new version)
>> gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch
>>
>> 9.4-alex-2 (new version)
>> gin-fast-scan.10.patch.gz
>>
>> Or did I get that wrong?
>>
>
> Only that you mentioned 9.4-alex-1 twice. I'm afraid we'll end up with a
> mess in the numbering.

You're right. It should have been like this:

9.4-alex-1 (based on 9.4-heikki)
0005-Ternary-consistent-implementation.patch

9.4-alex-2 (based on 9.4-alex-1)
0006-Sort-entries.patch

9.4-alex-3 (new version, not yet tested)
gin-fast-scan.10.patch.gz

>
>> > Sometimes test cases are not what we expect. For example:
>> >
>> > =# explain SELECT id FROM messages WHERE body_tsvector @@
>> > to_tsquery('english','(5alpha1-initdb''d)');
>> >                                   QUERY PLAN
>> > ────────────────────────────────────────────────────────────────────────────────
>> >  Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
>> >    Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
>> >    ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
>> >          Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
>> >  Planning time: 0.257 ms
>> > (5 rows)
>> >
>> > 5alpha1-initdb'd is 3 gin entries with different frequencies.
>>
>> Why do you find that strange? The way the query is formed or the way
>> it's evaluated?
>>
>> The query generator certainly is not perfect, so it may produce some
>> strange queries.
>>
>
> I just mean that in this case 3 words don't mean 3 gin entries.

Isn't that expected? I mean, that's what to_tsquery may do, right?

Tomas
