On Wed, May 2, 2012 at 5:48 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > Imagine we have two queries:
> > 1) SELECT * FROM tbl WHERE col LIKE '%abcd%';
> > 2) SELECT * FROM tbl WHERE col LIKE '%abcdefghijk%';
> > The first query requires reading the posting lists of trigrams "abc" and
> > "bcd". The second query requires reading the posting lists of trigrams
> > "abc", "bcd", "cde", "def", "efg", "fgh", "ghi", "hij" and "ijk".
> > We could decide to use an index scan for the first query and a sequential
> > scan for the second one, because the number of posting lists to read is
> > high. But that is unreasonable, because the second query is actually
> > narrower than the first one. We can use the same index scan for it; the
> > recheck will remove all false positives.
> > When the number of trigrams is high, we can just exclude some of them from
> > the scan. That would be better than simply deciding to do a sequential
> > scan. But the question is which trigrams to exclude. Ideally we would keep
> > the rarest trigrams to make the index scan cheaper.
> True. I guess I was thinking more of the case where you've got
> abc|def|ghi|jkl|mno|pqr|stu|vwx|yza|.... There's probably some point
> at which it becomes silly to think about using the index.
Yes, such situations are also possible.
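A rough sketch of the two ideas discussed above: which trigrams a LIKE pattern yields, and keeping only the rarest ones when there are too many. The helpers `like_trigrams` and `rarest_trigrams` are hypothetical and are not pg_trgm functions. The extraction is deliberately simplified (no escape handling, no case folding, no pg_trgm-style blank padding at word boundaries), and the frequencies are made-up numbers standing in for whatever statistics the planner might have.

```python
from typing import Dict, List


def like_trigrams(pattern: str) -> List[str]:
    """Return the distinct trigrams that every match of a LIKE pattern must contain.

    Simplified sketch: treats '%' and '_' as wildcards and ignores escapes,
    case folding and pg_trgm's blank padding of word boundaries.
    """
    # A trigram cannot span a wildcard, so split the pattern into literal chunks.
    chunks = pattern.replace("_", "%").split("%")
    trigrams: List[str] = []
    for chunk in chunks:
        for i in range(len(chunk) - 2):
            trigrams.append(chunk[i:i + 3])
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(trigrams))


def rarest_trigrams(trigrams: List[str], freq: Dict[str, int], keep: int) -> List[str]:
    """Keep at most `keep` trigrams, preferring those with the lowest frequency.

    `freq` is a hypothetical per-trigram estimate of matching rows; it must
    contain an entry for every trigram passed in.
    """
    return sorted(trigrams, key=lambda t: freq[t])[:keep]


if __name__ == "__main__":
    print(like_trigrams("%abcd%"))  # ['abc', 'bcd']
    tris = like_trigrams("%abcdefghijk%")
    print(len(tris))  # 9
    # Made-up frequencies: most trigrams are common, a few are rare.
    freq = {t: 1000 for t in tris}
    freq.update({"cde": 40, "ghi": 12, "ijk": 3})
    print(rarest_trigrams(tris, freq, 2))  # ['ijk', 'ghi']
```

Dropping trigrams only weakens the index condition, so the scan returns a superset of the matching rows; as noted above, the recheck still removes the false positives.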
> >> Well, I'm not an expert on encodings, but it seems like a logical
> >> extension of what we're doing right now, so I don't really see why
> >> not. I'm confused by the diff hunks in pg_mule2wchar_with_len,
> >> though. Presumably either the old code is right (in which case, don't
> >> change it) or the new code is right (in which case, there's a bug fix
> >> needed here that ought to be discussed and committed separately from
> >> the rest of the patch). Maybe I am missing something.
> > Unfortunately, I didn't understand the original logic of
> > pg_mule2wchar_with_len. I just made a proposal about how it could be. I
> > hope somebody more familiar with this code will clarify this situation.
> Well, do you think the current code is buggy, or not?
Probably, but I'm not sure. The conversion seems lossy to me unless I'm
missing something about mule encoding.
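One way to settle whether a conversion is lossy is to enumerate its valid inputs and look for two distinct byte sequences that convert to the same value. The sketch below assumes the conversion can be wrapped as a function from bytes to an integer code; `find_collisions` and both toy decoders are hypothetical and invented for illustration, not a model of pg_mule2wchar_with_len. The verdict depends on which inputs are legal in the encoding, so a real check would enumerate only valid MULE sequences; the toy byte values below are not claimed to be legal MULE.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List


def find_collisions(decode: Callable[[bytes], int],
                    inputs: Iterable[bytes]) -> Dict[int, List[bytes]]:
    """Return decoded values produced by more than one distinct input.

    The conversion is lossless on `inputs` only if this returns {}:
    any collision means the original bytes cannot be recovered.
    """
    groups: Dict[int, List[bytes]] = {}
    for raw in inputs:
        groups.setdefault(decode(raw), []).append(raw)
    return {code: raws for code, raws in groups.items() if len(raws) > 1}


def toy_decode_drop_prefix(raw: bytes) -> int:
    """Hypothetical 3-byte decoder that ignores the first byte (illustration only)."""
    return (raw[1] << 8) | raw[2]


def toy_decode_keep_prefix(raw: bytes) -> int:
    """Hypothetical 3-byte decoder that packs all three bytes (illustration only)."""
    return (raw[0] << 16) | (raw[1] << 8) | raw[2]


if __name__ == "__main__":
    # Toy input space: 2 first bytes x 2 middle bytes x 1 last byte = 4 sequences.
    inputs = [bytes(t) for t in product([0x9A, 0x9B], [0xA0, 0xA1], [0xC1])]
    print(len(find_collisions(toy_decode_drop_prefix, inputs)))  # 2
    print(len(find_collisions(toy_decode_keep_prefix, inputs)))  # 0
```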
With best regards,
Alexander Korotkov.
pgsql-hackers by date
Previous: From Robert Haas, Date 2012-05-02 13:48:33
Subject: Re: Patch: add conversion from pg_wchar to multibyte