| From: | "Mark Johnson" <mark(at)remingtondatabasesolutions(dot)com> |
|---|---|
| To: | "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Tim" <elatllat(at)gmail(dot)com>, pgsql-admin(at)postgresql(dot)org, "Greg Williamson" <gwilliamson39(at)yahoo(dot)com> |
| Subject: | Re: tsvector limitations |
| Date: | 2011-06-15 17:28:41 |
| Message-ID: | W565151360813081308158921@webmail12 |
| Lists: | pgsql-admin |
When this discussion first started, I immediately thought about people who full-text index their server's log files. As a test, I copied /var/log/messages to $PGDATA and then used the same pg_read_file() function you mentioned earlier to pull the data into a column of type text. The original file was 4.3 MB; the db column had a length of 4334920, and pg_column_size() reported a size of 1058747. I then added a column named tsv of type tsvector and populated it using to_tsvector(). For that column, pg_column_size() reported 201557.

So in this test a 4.2 MB text file produced a tsvector of about 200 KB. If this scales linearly, the maximum size of an input document would be 21.8 MB before you hit the tsvector limit of 1 MB. If you run a "find" command on your server for files larger than 20 MB, the percentage is quite small, maybe 1% of files. In the specific case of indexing PostgreSQL's log files, you could use the log_rotation_size parameter to ensure all files are smaller than N and avoid the tsvector limit.
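For anyone who wants to redo the extrapolation with their own measurements, here is a minimal sketch of the arithmetic in Python. It plugs in the two sizes reported above and assumes, as the estimate does, that tsvector size grows linearly with input size. It also assumes the 1 MB limit means 2**20 bytes; the function name max_input_bytes is just an illustrative helper.

```python
# Figures reported in the test above.
TEXT_BYTES = 4_334_920       # length of the text column
TSVECTOR_BYTES = 201_557     # pg_column_size() of the tsvector column

# Assumption: the 1 MB tsvector limit is taken to mean 2**20 bytes.
TSVECTOR_LIMIT = 2**20


def max_input_bytes(text_bytes, tsvector_bytes, limit=TSVECTOR_LIMIT):
    """Estimate the largest input whose tsvector stays under the limit.

    Assumes tsvector size scales linearly with input size, which is only
    a rough approximation for real documents.
    """
    ratio = tsvector_bytes / text_bytes  # tsvector bytes per input byte
    return int(limit / ratio)


estimate = max_input_bytes(TEXT_BYTES, TSVECTOR_BYTES)
print(f"tsvector/text ratio: {TSVECTOR_BYTES / TEXT_BYTES:.4f}")
print(f"estimated max input: {estimate / 2**20:.1f} MiB")
```

With these inputs the result lands in the low 20s of megabytes, in line with the estimate above. Swapping in the sizes from a different file (for example a novel rather than a log file) shows how much the ratio depends on the kind of text.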
-Mark
-----Original Message-----
From: Kevin Grittner [mailto:Kevin(dot)Grittner(at)wicourts(dot)gov]
Sent: Wednesday, June 15, 2011 12:39 PM
To: 'Tim', pgsql-admin(at)postgresql(dot)org, 'Greg Williamson'
Subject: Re: [ADMIN] tsvector limitations
Greg Williamson wrote:

> Try trolling texts at the Internet Archive (archive.org) -- lots
> of stuff that has been rendered into ASCII ... Government
> documents and the like from all periods; novels and the like that
> are no longer under copyright, so lots of long classics.
>
> for example ... 765K

Thanks. OK, for perspective, A Tale of Two Cities has a tsvector size of 121KB.

-Kevin
| | From | Date | Subject |
|---|---|---|---|
| Next Message | Tom Lane | 2011-06-15 18:31:28 | Re: tsvector limitations |
| Previous Message | Kevin Grittner | 2011-06-15 16:39:06 | Re: tsvector limitations |