From: | J2eeInside J2eeInside <j2eeinside(at)gmail(dot)com> |
---|---|
To: | Mike Rylander <mrylander(at)gmail(dot)com> |
Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Replacing Apache Solr with Postgre Full Text Search? |
Date: | 2020-03-26 08:03:08 |
Message-ID: | CAK-aFFbaE33n4t_wOdHGwAZYPacpo-v87w72tA5cZcdAdRCYkw@mail.gmail.com |
Lists: | pgsql-general |
Hi Mike, and thanks for the valuable answer!
In short, do you think PG Full Text Search can do the same as Apache Solr?
P.S. I need to index .pdf, .html and MS Word .doc/.docx files. Are there any
constraints in Full Text Search regarding those file types?
On Wed, Mar 25, 2020 at 3:36 PM Mike Rylander <mrylander(at)gmail(dot)com> wrote:
> On Wed, Mar 25, 2020 at 8:37 AM J2eeInside J2eeInside
> <j2eeinside(at)gmail(dot)com> wrote:
> >
> > Hi all,
> >
> > I hope someone can help/suggest:
> > I'm currently maintaining a project that uses Apache Solr/Lucene. To be
> > honest, I would like to replace Solr with PostgreSQL Full Text Search.
> > However, a huge amount of documents is involved, around 200 GB. Can
> > PostgreSQL handle this efficiently?
> > Does anyone have specific experience, and what should the infrastructure
> > look like?
> >
> > P.S. To avoid confusion, Solr works just fine; I just want to
> > eliminate one component from the whole system (if Full Text Search can
> > replace Solr at all).
>
> I'm one of the core developers (and the primary developer of the
> search subsystem) for the Evergreen ILS [1] (integrated library system
> -- think book library, not software library). We've been using PG's
> full-text indexing infrastructure since day one, and I can say it is
> definitely capable of handling pretty much anything you can throw at
> it.
>
> Our indexing requirements are very complex and need to be very
> configurable, and need to include a lot more than just "search and
> rank a text column," so we've had to build a ton of infrastructure
> around record (document) ingest, searching/filtering, linking, and
> display. If your indexing and search requirements are stable,
> specific, and well-understood, it should be straightforward,
> especially if you don't have to take into account non-document
> attributes like physical location, availability, and arbitrary
> real-time visibility rules, as Evergreen does.
>
> As for scale, it's more about document count than total size. There
> are Evergreen libraries with several million records to search, and
> with proper hardware and tuning everything works well. Our main
> performance issue has to do with all of the stuff outside the records
> (documents) themselves that has to be taken into account during
> search. The core full-text search part of our queries is extremely
> performant, and has only gotten better over the years.
>
> [1] http://evergreen-ils.org
>
> HTH,
> --
> Mike Rylander
> | Executive Director
> | Equinox Open Library Initiative
> | phone: 1-877-OPEN-ILS (673-6457)
> | email: miker(at)equinoxinitiative(dot)org
> | web: http://equinoxinitiative.org
>