From: | Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
---|---|
To: | "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de> |
Cc: | Dmitrii Golub <dmitrii(dot)golub(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: unexpected result from to_tsvector |
Date: | 2016-03-14 14:45:16 |
Message-ID: | 56E6CE7C.30409@postgrespro.ru |
Lists: | pgsql-hackers |
On 14.03.2016 16:22, Shulgin, Oleksandr wrote:
>
> Hm... now that doesn't look all that consistent to me (after applying
> the patch):
>
> =# select ts_debug('simple', 'aaa@123-yyy.zzz');
> ts_debug
> ---------------------------------------------------------------------------
> (email,"Email address",aaa@123-yyy.zzz,{simple},simple,{aaa@123-yyy.zzz})
> (1 row)
>
> But:
>
> =# select ts_debug('simple', 'aaa@123_yyy.zzz');
> ts_debug
> ---------------------------------------------------------
> (asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
> (blank,"Space symbols",@,{},,)
> (uint,"Unsigned integer",123,{simple},simple,{123})
> (blank,"Space symbols",_,{},,)
> (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
> (5 rows)
>
> One can also see that if we only keep the domain name, the result is
> similar:
>
> =# select ts_debug('simple', '123-yyy.zzz');
> ts_debug
> -------------------------------------------------------
> (host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
> (1 row)
>
> =# select ts_debug('simple', '123_yyy.zzz');
> ts_debug
> -----------------------------------------------------
> (uint,"Unsigned integer",123,{simple},simple,{123})
> (blank,"Space symbols",_,{},,)
> (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
> (3 rows)
>
> But, this only has to do with 123 being recognized as a number, not with
> the underscore:
>
> =# select ts_debug('simple', 'abc_yyy.zzz');
> ts_debug
> -------------------------------------------------------
> (host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
> (1 row)
>
> =# select ts_debug('simple', '1abc_yyy.zzz');
> ts_debug
> -------------------------------------------------------
> (host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
> (1 row)
>
> In fact, the 123-yyy.zzz domain is not valid either according to the RFC
> (subdomain can't start with a digit), but since we already allow it,
> should we not allow 123_yyy.zzz to be recognized as a Host? Then why
> not recognize aaa@123_yyy.zzz as an email address?
>
> Another option is to prohibit underscore in recognized host names, but
> this has more breakage potential IMO.
>
> --
> Alex
>
That seems reasonable to me. I prefer the first option, although I am not
confident that we should allow 123_yyy.zzz to be recognized as a Host.
By the way, the answer at http://webmasters.stackexchange.com/a/775
gives examples of domain names with numbers (but not of subdomains).
If there are no objections from others, I will send a new patch later
today or tomorrow that recognizes 123_yyy.zzz.
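
For reference, here is a minimal Python sketch of the strict hostname label
grammar the RFC argument above refers to. It is only an illustration, not how
the text search parser is implemented: the names RFC1035_LABEL, RFC1123_LABEL
and is_hostname are hypothetical and do not come from the patch. RFC 1035
requires a label to start with a letter, and RFC 1123 later relaxed that rule
to allow a leading digit. Neither grammar permits an underscore.

```python
import re

# RFC 1035 label: starts with a letter, ends with a letter or digit,
# interior characters are letters, digits, or hyphens.
RFC1035_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

# RFC 1123 relaxation: the first character may also be a digit.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_hostname(name: str, label_re: "re.Pattern[str]") -> bool:
    """Return True if every dot-separated label fully matches label_re."""
    labels = name.split(".")
    return all(
        len(label) <= 63 and label_re.fullmatch(label) is not None
        for label in labels
    )


if __name__ == "__main__":
    # The host strings below are the ones used in the ts_debug examples.
    for candidate in ("123-yyy.zzz", "123_yyy.zzz", "abc_yyy.zzz", "1abc_yyy.zzz"):
        print(
            candidate,
            "rfc1035:", is_hostname(candidate, RFC1035_LABEL),
            "rfc1123:", is_hostname(candidate, RFC1123_LABEL),
        )
```

Under these assumptions, 123-yyy.zzz fails only the older RFC 1035 rule, while
every name containing an underscore fails both. The parser is therefore already
more permissive than either grammar when it accepts abc_yyy.zzz as a Host.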
--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company