Re: Custom index structure and strange count problem

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Carsten Kropf <ckropf2(at)fh-hof(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom index structure and strange count problem
Date: 2010-06-09 17:09:48
Message-ID: AANLkTiklX474n-w6zRG3vJ6c1Y8X2gdVNKSycp-G7iSU@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 9, 2010 at 4:35 AM, Carsten Kropf <ckropf2(at)fh-hof(dot)de> wrote:
> Hi *,
> during the last few months I've been building a new index structure as part of a research project.
> Everything seems to work properly, however I have some strange issues with the count sql command.
> I introduced some custom structures (mainly document and hybrid_query) with which my index access method is supposed to work.
> There is an operator "&&" which is supposed to use my index structure (what also works properly).
> The function that maps to the operator "&&" is called hybrid_index_query, which I use to compare my results given from the index with the real results that are supposed to appear in the final result set.
> Having described the outer circumstances (in a very short way), I will now show the strange stuff that happens:
>
> test=# select id from documents where hybrid_index_query(to_document(words, points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]')) order by id;
>  id
> ------
>  2137
>  2151
>  2168
>  2207
>  2208
>  2209
>  2210
>  2211
>  2266
>  2296
> (10 rows)
>
> This query takes a sequential scan and works properly (returning 10 rows).
>
> test=# select id from documents where to_document(words, points) && row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]') order by id;
>  id
> ------
>  2137
>  2151
>  2168
>  2207
>  2208
>  2209
>  2210
>  2211
>  2266
>  2296
> (10 rows)
>
> This query uses my index structure and returns the same result as in the sequential scan above.
> Until here, everything seems to work fine. However, if I issue the same queries using the count aggregate function in SQL, there are some odd results:
> test=# select count(*) from documents where hybrid_index_query(to_document(words, points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]'));
>  count
> -------
>    10
> (1 row)
>
> Using the sequential scan, still, everything seems fine.
> However, if I now do the index scan (my function will be called 11 times, returning false at the end), I get the following result:
> test=# select count(*) from documents where to_document(words, points) && row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
>  count
> -------
>     7
> (1 row)
>
> This seems strange, because the same query returned 10 rows (when I didn't use the aggregate). If I issue queries that count the id column, I receive the following:
> test=# select count(id) from documents where hybrid_index_query(to_document(words, points), row('radio pleas news'::tsvector, '[(-90,-180),(90, 180)]'));
>  count
> -------
>    10
> (1 row)
>
> test=# select count(id) from documents where to_document(words, points) && row('pleas radio news'::tsvector, '[(-90,-180),(90,180)]');
>  count
> -------
>    10
> (1 row)
>
> These two queries do again return the same results.
> Thus, I don't know, what's wrong here, does anybody know about that behaviour, or is it my fault that the results are wrong, somehow?
> Thanks in advance

I am guessing this is a bug in your code - have you used EXPLAIN to
verify that the second-to-last of the above queries is really hitting
your code? If so, I'd recommend attaching with gdb and setting a
breakpoint wherever you return the tuples, and then poke around...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2010-06-09 17:15:17 Re: hstore ==> and deprecate =>
Previous Message Tom Lane 2010-06-09 17:04:54 Re: hstore ==> and deprecate =>