Quick Links

Re: spgist text_ops and LIKE

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject:	Re: spgist text_ops and LIKE
Date:	2012-02-02 13:45:44
Message-ID:	CA+Tgmoaww0Dgb=v5T_b1zviy4AxZ+BoU5vMZDVBGGWMA8OkLcQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Feb 1, 2012 at 10:57 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Is spgist intended to support prefix searches with LIKE?
>
> Too lazy to look at the code right now, but I think indxpath.c contains
> hardwired assumptions that LIKE prefix optimizations are only possible
> with btree indexes. Possibly it would be worth relaxing that. (The
> whole "special index operator" mechanism is undesirably special-purpose,
> but I currently have no ideas about how to make it more flexible.)

Is it as simple as the attached?

Given the set of operators that are available for spgist, one would
assume that it's intended to handle both pattern-matching and regular
comparison operators. On the flip side, spg_text_inner_consistent()
seems to fall back to a full index-scan for the regular comparison
operators, which makes me wonder why anyone would bother having the
operator at all. Indeed, in my testing, it's slightly slower than a
sequential scan:

select sum(1) from person where last_name > 'WAR';
- btree: 113-114ms
- btree with text_pattern_ops: can't do it
- spgist: 920-935ms
- sequential scan: 895-910ms

With the attached patch, it seems to work as well as text_pattern_ops
for pattern matching:

select sum(1) from person where last_name LIKE 'WAR%';
- btree: can't do it
- btree with text_pattern_ops: 6-7ms
- spgist: 7-9ms
- sequential scan: 170-180ms

It's fine for equality, too:

select sum(1) from person where last_name = 'WARNER';
- btree: 1.7-1.9ms
- btree with text_pattern_ops: 1.7-1.9ms
- spgist: 1.7-1.9ms
- sequential scan: 160-165ms

It seems that there's at least the potential for spgist indexes to be
extremely efficient, given adequate knowledge about the collation
under which comparisons are being done. If for example we're trying
to find strings where last_name > 'WAR', we certainly need to chase
down the subtrees where the first character is 'W', 'X', 'Y', or 'Z'.
If the collation is case-insensitive, we might also need to chase 'w',
'x', 'y', and 'z'. And there might be other characters that need to
be chased as well, depending on the collation: some collations might
sort numbers before letters, while others might do the reverse. But
it's likely that there are few if any collations where we need to
chase down the subtree for 'Q' on this query, because anything
beginning with 'Q' is likely to sort before anything beginning with
'WAR' pretty much anywhere. I'm not sure there's a good way to get
that information without a lot of grotty hard-coding, but it seems
like the potential is there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
spgist-pattern-match.patch	application/octet-stream	1.8 KB

In response to

Re: spgist text_ops and LIKE at 2012-02-02 03:57:48 from Tom Lane

Responses

Re: spgist text_ops and LIKE at 2012-02-02 17:18:26 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2012-02-02 13:54:32	Re: JSON for PG 9.2
Previous Message	Pavel Stehule	2012-02-02 11:44:01	Re: Patch pg_is_in_backup()