Re: GIN improvements part2: fast scan

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: GIN improvements part2: fast scan
Date: 2014-02-03 16:08:04
Message-ID: CAPpHfdv0vT8A=krseEdya7btrbPCjyf0e=kMT39xXZuY5E-ibQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 3, 2014 at 7:24 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> On 3 February 2014, 15:31, Alexander Korotkov wrote:
> >
> > I found my patch "0005-Ternary-consistent-implementation.patch" to be
> > completely wrong. It introduces a ternary consistent function to the opclass,
> > but doesn't use it, because I forgot to include the ginlogic.c change in the patch.
> > So it shouldn't have any impact on performance. However, the test results
> > with that patch differ significantly. That makes me very uneasy. Can we
> > reproduce exactly the same results now?
>
> Do I understand correctly that the 0005 patch should give exactly the
> same performance as the 9.4-heikki branch (as it was applied on top of it, and
> effectively changed nothing)? This wasn't exactly what I measured, although
> the differences were not that significant.
>

Do I understand correctly that these are 9.4-heikki and 9.4-alex-1 here:
http://www.fuzzy.cz/tmp/gin/#
For some queries the times differ severalfold. I wonder why.

> I can rerun the tests, if that's what you're asking for. I'll improve the
> test a bit - e.g. I plan to average multiple runs, to filter out random
> noise (which might be significant for such short queries).
>
> > The correct version of these two patches, combined into one against
> > current HEAD, is attached.
> > I've rerun the tests with it; the results are in
> > /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun the
> > postprocessing, including graph drawing?
>
> Yes, I'll do that. However I'll have to rerun the other tests too, because
> the previous runs were done on a different machine.
>
> I'm a bit confused right now. The previous patches (0005 + 0007) were
> supposed to be applied on top of the 4 from Heikki (0001-0004), right? AFAIK
> those were not committed yet, so why is this version against HEAD?
>
> To summarize, I know of these patch sets:
>
> 9.4-heikki (old version)
> 0001-Optimize-GIN-multi-key-queries.patch
> 0002-Further-optimize-the-multi-key-GIN-searches.patch
> 0003-Further-optimize-GIN-multi-key-searches.patch
> 0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch
>
> 9.4-alex-1 (based on 9.4-heikki)
> 0005-Ternary-consistent-implementation.patch
>
> 9.4-alex-1 (based on 9.4-alex-1)
> 0006-Sort-entries.patch
>

Of these patches, I only need to compare 9.4-heikki (old version) and
9.4-alex-1 to resolve my doubts.

> 9.4-heikki (new version)
> gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch
>
> 9.4-alex-2 (new version)
> gin-fast-scan.10.patch.gz
>
> Or did I get that wrong?
>

Except that you mentioned 9.4-alex-1 twice. I'm afraid there is some mess in
the numbering.

> > Sometimes test cases are not what we expect. For example:
> >
> > =# explain SELECT id FROM messages WHERE body_tsvector @@
> > to_tsquery('english','(5alpha1-initdb''d)');
> >                                   QUERY PLAN
> > ────────────────────────────────────────────────────────────────────────────────
> >  Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
> >    Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
> >    ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
> >          Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
> >  Planning time: 0.257 ms
> > (5 rows)
> >
> > 5alpha1-initdb'd is 3 gin entries with different frequencies.
>
> Why do you find that strange? The way the query is formed or the way it's
> evaluated?
>
> The query generator certainly is not perfect, so it may produce some
> strange queries.
>

I just mean that in this case 3 words don't mean 3 gin entries.

> > Also, these patches are not intended to change the speed of relevance
> > ordering. When the number of results is high, most of the time is spent
> > calculating relevance and sorting. I propose removing the ORDER BY clause
> > from the test cases to see the scan speed more clearly.
>
> Sure, I can do that. Or maybe one set of queries with ORDER BY, the other
> one without it.
>

Good.

> > I have a dump of postgresql.org search queries from Magnus. We can add
> > them to our test cases.
>
> You mean search queries from the search for mailing list archives? Sure,
> we can add that.
>

Yes. I'll transform it into tsquery and send you privately.

------
With best regards,
Alexander Korotkov.
