Asking suggestions on how to vectorize big texts for conceptual searching in postgresql 18

From: apurba saha <aksaha37(at)yahoo(dot)com>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Asking suggestions on how to vectorize big texts for conceptual searching in postgresql 18
Date: 2026-06-03 14:48:19
Message-ID: 1960411813.727800.1780498099726@mail.yahoo.com
Lists: pgsql-general

Dear all,
Good morning. I have some large texts in my tables. On average, each row contains about 4.2 KB of data, and there are 9.5 million rows. I want to run conceptual searches on technical terms and technical phrases and retrieve all texts with the nearest meanings, so I have to vectorize the data. What is the best approach?
I was trying to split the data into small fragments of 4.2 KB and then embed each fragment using a small vector size, with the help of pgvector. Once I have the embedding vectors for the fragments, I can combine them using some close relationship model or an average.
This way, we generate an embedding for the full text.
Or would you recommend another approach to generate an embedding for the full text?
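To make the fragment-and-average idea above concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `toy_embed` is a hypothetical stand-in for a real embedding model (it just hashes words into buckets), the helper names and the `max_chars` default are made up for this example, and averaging is only one of the possible ways to combine fragment vectors.

```python
import hashlib
import math
from typing import Callable, List

Vector = List[float]


def chunk_text(text: str, max_chars: int) -> List[str]:
    """Split text on whitespace into fragments of at most max_chars characters.

    A single word longer than max_chars is kept whole as its own fragment.
    """
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for word in text.split():
        # +1 accounts for the joining space when the fragment is non-empty.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
            extra = len(word)
        current.append(word)
        length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(fragment: str, dim: int = 8) -> Vector:
    """Hypothetical placeholder for a real embedding model (hashed bag of words)."""
    vec = [0.0] * dim
    for word in fragment.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def mean_pool(vectors: List[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector to average")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def l2_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def embed_document(text: str,
                   embed: Callable[[str], Vector] = toy_embed,
                   max_chars: int = 4000) -> Vector:
    """Embed each fragment, average the fragment vectors, then normalize."""
    fragments = chunk_text(text, max_chars)
    return l2_normalize(mean_pool([embed(f) for f in fragments]))


def to_pgvector_literal(vec: Vector) -> str:
    """Format a vector in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(x) for x in vec) + "]"
```

In a real setup, `embed` would call whatever embedding model is chosen, and the resulting literal could be passed as a parameter to an INSERT into a pgvector `vector` column. The per-fragment vectors could also be stored as their own rows instead of being averaged.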
I also have another question. I have a title, an abstract and a description, where the description is about 3 KB, and I would like to search across title, abstract and description. Should I merge all the data (and generate embeddings from the merged text) or keep the embeddings separate?
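To show what "keeping the embeddings separate" could mean in practice, here is a small Python sketch that scores one row by combining a per-field cosine similarity with weights. The function names, the field weights and the example vectors are all hypothetical and chosen only for illustration; this is not a recommendation of either option.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equally sized vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_score(query_vec: Vector,
                   field_vecs: Dict[str, Vector],
                   weights: Dict[str, float]) -> float:
    """Weighted average of the per-field similarities to the query vector."""
    total_weight = sum(weights[name] for name in field_vecs)
    weighted = sum(weights[name] * cosine_similarity(query_vec, vec)
                   for name, vec in field_vecs.items())
    return weighted / total_weight


# Hypothetical example: the title counts twice as much as the other fields.
example_weights = {"title": 2.0, "abstract": 1.0, "description": 1.0}
example_row = {
    "title": [1.0, 0.0],
    "abstract": [0.0, 1.0],
    "description": [1.0, 1.0],
}
example_score = combined_score([1.0, 0.0], example_row, example_weights)
```

With merged data there would instead be a single vector per row and a single similarity, which is simpler but cannot weight the title differently from the description.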
Have a wonderful day.

Thank you,
Apurba K. Saha
