Re: Replacing Apache Solr with Postgre Full Text Search?

From:	J2eeInside J2eeInside <j2eeinside(at)gmail(dot)com>
To:	Mike Rylander <mrylander(at)gmail(dot)com>
Cc:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: Replacing Apache Solr with Postgre Full Text Search?
Date:	2020-03-26 08:03:08
Message-ID:	CAK-aFFbaE33n4t_wOdHGwAZYPacpo-v87w72tA5cZcdAdRCYkw@mail.gmail.com
Lists:	pgsql-general

Hi Mike, and thanks for the valuable answer!
In short, you think PG full-text search can do the same as Apache Solr?

P.S. I need to index .pdf, .html and MS Word .doc/.docx files. Are there any
constraints in full-text search regarding those file types?

On Wed, Mar 25, 2020 at 3:36 PM Mike Rylander <mrylander(at)gmail(dot)com> wrote:

> On Wed, Mar 25, 2020 at 8:37 AM J2eeInside J2eeInside
> <j2eeinside(at)gmail(dot)com> wrote:
> >
> > Hi all,
> >
> > I hope someone can help/suggest:
> > I'm currently maintaining a project that uses Apache Solr/Lucene. To be
> > honest, I would like to replace Solr with PostgreSQL full-text search.
> > However, there is a huge amount of documents involved, around 200GB.
> > I'm wondering, can PostgreSQL handle this efficiently?
> > Does anyone have specific experience, and what should the infrastructure
> > look like?
> >
> > P.S. To be clear, Solr works just fine; I just wanted to
> > eliminate one component from the whole system (if full-text search can
> > replace Solr at all).
>
> I'm one of the core developers (and the primary developer of the
> search subsystem) for the Evergreen ILS [1] (integrated library system
> -- think book library, not software library). We've been using PG's
> full-text indexing infrastructure since day one, and I can say it is
> definitely capable of handling pretty much anything you can throw at
> it.
>
> Our indexing requirements are very complex and need to be very
> configurable, and need to include a lot more than just "search and
> rank a text column," so we've had to build a ton of infrastructure
> around record (document) ingest, searching/filtering, linking, and
> display. If your indexing and search requirements are stable,
> specific, and well-understood, it should be straightforward,
> especially if you don't have to take into account non-document
> attributes like physical location, availability, and arbitrary
> real-time visibility rules like Evergreen does.
>
> As for scale, it's more about document count than total size. There
> are Evergreen libraries with several million records to search, and
> with proper hardware and tuning everything works well. Our main
> performance issue has to do with all of the stuff outside the records
> (documents) themselves that has to be taken into account during
> search. The core full-text search part of our queries is extremely
> performant, and has only gotten better over the years.
>
> [1] http://evergreen-ils.org
>
> HTH,
> --
> Mike Rylander
> | Executive Director
> | Equinox Open Library Initiative
> | phone: 1-877-OPEN-ILS (673-6457)
> | email: miker(at)equinoxinitiative(dot)org
> | web: http://equinoxinitiative.org
>
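
Editorial sketch, not part of the original message: the simple "search and rank a text column" case mentioned in the quoted reply might look roughly like the following. The table name `documents`, its columns, the `english` configuration, and the helper `build_search` are all assumptions made for illustration, and nothing here describes how Evergreen does it. The SQL relies on stored generated columns (PostgreSQL 12 or later) and `websearch_to_tsquery` (PostgreSQL 11 or later); the `%(name)s` placeholders assume a psycopg-style driver. The code only builds statements, so it runs without a database.

```python
# Hypothetical schema and query for a basic ranked full-text search.
# All identifiers below are illustrative assumptions, not taken from the thread.
TABLE = "documents"

SCHEMA_SQL = f"""
CREATE TABLE {TABLE} (
    id    bigserial PRIMARY KEY,
    title text NOT NULL,
    body  text NOT NULL,  -- plain text (assumed to be extracted from the file beforehand)
    tsv   tsvector GENERATED ALWAYS AS (
              to_tsvector('english', title || ' ' || body)
          ) STORED
);
-- A GIN index lets the @@ match operator avoid scanning every row.
CREATE INDEX {TABLE}_tsv_idx ON {TABLE} USING gin (tsv);
"""

SEARCH_SQL = f"""
SELECT id, title, ts_rank(tsv, query) AS rank
FROM {TABLE}, websearch_to_tsquery('english', %(q)s) AS query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT %(limit)s;
"""


def build_search(user_query: str, limit: int = 10):
    """Return (sql, params) for a ranked search; the caller executes it."""
    return SEARCH_SQL, {"q": user_query, "limit": limit}
```

With a driver such as psycopg, the pair returned by `build_search("solr replacement")` would be passed to `cursor.execute(sql, params)`. Keeping the user's input in the parameters rather than in the SQL string avoids injection problems.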

In response to

Re: Replacing Apache Solr with Postgre Full Text Search? at 2020-03-25 14:36:44 from Mike Rylander

Responses

Re: Replacing Apache Solr with Postgre Full Text Search? at 2020-03-26 15:18:16 from Mike Rylander
