phraseto_tsquery design

From: Sagiv Some <sagivsome(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: phraseto_tsquery design
Date: 2018-06-21 15:02:32
Message-ID: CADcPEWBedG8jUG6yuSS4d8n7SsUM2fajH01Eprp27do1sAQEBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I recently migrated over from SQL Server to Postgres (through Amazon
Aurora) for the cost.

I have been working with the full-text search functions for quite some
time, but have encountered a few major roadblocks in my quest to use
postgres full-text search functionality to its fullest. I am using it as a
blob text-search tool, where the entire document (could be up to 100 pages)
is indexed and searched on:

1. Limited table and document sizes. In SQL server, I can create a 1.5 TB
table no problem (filingid, data blobs). However, in postgres, this is not
possible - I am limited to approximately 100-200GB per table and 1MB per
tsvector. This is an annoyance, but can get around it.

2. However, it seems impossible to bypass the performance problem of phrase
searching. I conduct quite a bit of phrase searching, and although
postgres' "phraseto_tsquery" performs great on phrases with uncommon words,
it slows to a screeching halt on phrases with common words such as "law
firm" or, for example, "bank of america". This is a huge problem, because
"plainto_tsquery" performs just fine on these but as I understand it,
phrase searching is built to do a scan after finding each word using
"plainto"?

There are already positions and the "plainto" function is quite fast; is
there a way to modify the "phraseto" query to perform a useful and fast
search that looks for the distance between found words appropriately?

Seems fairly trivial to modify the function this way, but I didn't see it
on the development roadmap. It's a pretty critical piece and I have seen
that other people have had issues with it too, and a test between this
function and some other similar functions show that the performance of the
phraseto function is quite poor.

Thanks,
Sagiv

sagivsome(at)gmail(dot)com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2018-06-21 15:09:27 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Previous Message Robbie Harwood 2018-06-21 14:56:47 Re: libpq compression