Re: Shrinking TSvectors

From: Howard News <howardnews(at)selestial(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Shrinking TSvectors
Date: 2016-04-05 14:48:49
Message-ID: 5703D051.2020404@selestial.com
Lists: pgsql-general

On 05/04/2016 15:15, Artur Zakirov wrote:
> On 05.04.2016 14:37, Howard News wrote:
>> Hi,
>>
>> Does anyone have any pointers for shrinking tsvectors?
>>
>> I have looked at the contents of some of these fields and they contain
>> many details that are not needed. For example...
>>
>> "'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944
>> '-9972':945 '/partners/application.html':222
>> '/partners/program/program-agreement.pdf':271
>> '/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
>> '1':753,771 '12':366 '14':66 (...)"
>>
>> I am not interested in keeping the numbers or URLs in the indexes.
>>
>> Thanks,
>>
>> Howard.
>>
>>
>
> Hello,
>
> You need to create a new text search configuration. Here is an example
> of the commands:
>
> CREATE TEXT SEARCH CONFIGURATION public.english_cfg (
>     PARSER = default
> );
> ALTER TEXT SEARCH CONFIGURATION public.english_cfg
>     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>                       word, hword, hword_part
>     WITH pg_catalog.english_stem;
>
> Instead of "pg_catalog.english_stem" you can use your own dictionary.
>
> Let's compare the new configuration with the built-in configuration
> "pg_catalog.english":
>
> postgres=# select to_tsvector('english_cfg', 'home -9972
> /partners/application.html /partners/program/program-agreement.pdf');
> to_tsvector
> -------------
> 'home':1
> (1 row)
>
> postgres=# select to_tsvector('english', 'home -9972
> /partners/application.html /partners/program/program-agreement.pdf');
>                                           to_tsvector
> -----------------------------------------------------------------------------------------------
>  '-9972':2 '/partners/application.html':3 '/partners/program/program-agreement.pdf':4 'home':1
> (1 row)
>
>
> You can get some additional information about configurations using \dF+:
>
> postgres=# \dF+ english
> Text search configuration "pg_catalog.english"
> Parser: "pg_catalog.default"
>       Token      | Dictionaries
> -----------------+--------------
>  asciihword      | english_stem
>  asciiword       | english_stem
>  email           | simple
>  file            | simple
>  float           | simple
>  host            | simple
>  hword           | english_stem
>  hword_asciipart | english_stem
>  hword_numpart   | simple
>  hword_part      | english_stem
>  int             | simple
>  numhword        | simple
>  numword         | simple
>  sfloat          | simple
>  uint            | simple
>  url             | simple
>  url_path        | simple
>  version         | simple
>  word            | english_stem
>
> postgres=# \dF+ english_cfg
> Text search configuration "public.english_cfg"
> Parser: "pg_catalog.default"
>       Token      | Dictionaries
> -----------------+--------------
>  asciihword      | english_stem
>  asciiword       | english_stem
>  hword           | english_stem
>  hword_asciipart | english_stem
>  hword_part      | english_stem
>  word            | english_stem
>
Thanks Artur,

That's amazing! Postgres never ceases to amaze me. And the same goes for
the contributors to this list.
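
For anyone applying the configuration above to existing data, here is a
minimal sketch of the follow-up steps. The table "docs" and its columns
"body" (text) and "tsv" (tsvector) are hypothetical names, not part of the
original thread; adjust them to your schema.

```sql
-- Assumption: a table docs(body text, tsv tsvector) already exists.

-- 1. Inspect which token type the default parser assigns to each piece
--    of text. Token types with no mapping in public.english_cfg are
--    dropped from the resulting tsvector.
SELECT alias, token, lexemes
FROM ts_debug('english', 'home -9972 /partners/application.html');

-- 2. Measure the stored vectors before the change.
SELECT pg_size_pretty(sum(pg_column_size(tsv))) AS tsv_size
FROM docs;

-- 3. Rebuild the vectors with the reduced configuration.
UPDATE docs
SET tsv = to_tsvector('public.english_cfg', body);

-- 4. Run the query from step 2 again to compare sizes.
```

Queries should then be built with the same configuration, for example
to_tsquery('public.english_cfg', ...), so that search terms are normalised
the same way as the stored lexemes. After a bulk UPDATE like this, a VACUUM
of the table (and possibly a REINDEX of any index on "tsv") may be needed
before the on-disk size actually shrinks.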
