| From: | Ron Johnson <ronljohnsonjr(at)gmail(dot)com> |
|---|---|
| To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
| Subject: | Re: scaling up from t1n to 60 million records |
| Date: | 2026-05-19 14:41:42 |
| Message-ID: | CANzqJaA4S080-9tSOLZaWnfY6QbxZc8WQHv2JB4dOveWOkZh4g@mail.gmail.com |
| Lists: | pgsql-general |
On Tue, May 19, 2026 at 10:27 AM Martin Mueller <
martinmueller(at)northwestern(dot)edu> wrote:
> I use Postgres with a GUI frontend (Aquafold) as a very large spreadsheet
> on steroids that analyzes rare or defective spellings in a corpus of 65,000
> texts and 1.5 billion words. I typically extract data from the corpus with
> python scripts, turn them into tables and load them into the database.
>
>
> On my Mac with 32 GB of memory, performance is OK: queries that extract
> data rows from tables with up to ten million rows typically return within
> seconds. If the result set is large, I suspect that most of the machine's
> time is spent displaying the result set. I have used indexing
> sparingly. While it helps, the time savings often don't matter much.
>
>
> I am thinking about scaling up to a table with about 60 million rows. Are
> there things to do or watch out for?
>
Use the correct tool for the task at hand, even if you are not a carpenter
and thus only know how to use a hammer.
> Or should I proceed on the assumption that the 60 million records are
> within scope and that the added time cost is roughly linear?
>
In my experience, database performance shows a hockey stick graph: good
while stuff fits in memory, and then suddenly not so good.
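To get a feel for which side of that knee a 60-million-row table lands on, a rough heap-size estimate helps. The sketch below is back-of-the-envelope only. It assumes default 8 kB pages, a hypothetical average of 60 bytes of column data per row, no NULL bitmap, no TOAST, and a freshly loaded table with no dead tuples. Indexes come on top of this figure. For a real table, `pg_total_relation_size()` reports the actual on-disk size.

```python
import math

# Assumed PostgreSQL storage constants (defaults on common 64-bit builds).
PAGE_SIZE = 8192      # default block size
PAGE_HEADER = 24      # bytes of page header per page
TUPLE_HEADER = 24     # 23-byte heap tuple header, padded to 8-byte alignment
LINE_POINTER = 4      # per-tuple item identifier stored in the page
MAXALIGN = 8          # tuple alignment boundary


def align(n: int, a: int = MAXALIGN) -> int:
    """Round n up to the next multiple of a."""
    return (n + a - 1) // a * a


def rows_per_page(avg_data_bytes: int) -> int:
    """How many tuples of the given average data width fit on one page."""
    tuple_bytes = align(TUPLE_HEADER + avg_data_bytes) + LINE_POINTER
    return (PAGE_SIZE - PAGE_HEADER) // tuple_bytes


def heap_size_bytes(n_rows: int, avg_data_bytes: int) -> int:
    """Approximate heap size, ignoring indexes, TOAST and free space."""
    pages = math.ceil(n_rows / rows_per_page(avg_data_bytes))
    return pages * PAGE_SIZE


if __name__ == "__main__":
    # 60 bytes per row is a hypothetical average, not a measured value.
    for rows in (10_000_000, 60_000_000):
        gib = heap_size_bytes(rows, 60) / 2**30
        print(f"{rows:>11,} rows ~ {gib:.1f} GiB of heap")
```

Under these assumptions the heap grows linearly with row count. The non-linear part comes once the heap plus its indexes no longer fit in RAM alongside everything else on the machine.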
The correct tool for full text search is PG's Full Text Search (tsvector)
facility, paired with GIN indexes. Do you use them? Probably not, judging
by your comments, but they would "keep 'everything' in memory", thus
staving off performance degradation.
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!