BUG #14654: With high statistics targets on ts_vector, unexpectedly high memory use & OOM are triggered

From: james+postgres(at)carbocation(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #14654: With high statistics targets on ts_vector, unexpectedly high memory use & OOM are triggered
Date: 2017-05-14 20:06:02
Message-ID: 20170514200602.1451.46797@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 14654
Logged by: James C
Email address: james+postgres(at)carbocation(dot)com
PostgreSQL version: 9.5.5
Operating system: Linux
Description:

I am running a mirror of PubMed, which contains about 27 million articles.
One of my tables (article) includes a unique ID for each article, the
article title, a ts_vector of the abstract, some timestamps, and a numeric
column. There are (unsurprisingly) about 27 million rows in this table.
Because the abstracts contain scientific terms, the number of lexemes is
quite large.
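For reference, a hypothetical sketch of the table described above (the column names are my illustration, not the actual DDL):

```sql
-- Hypothetical schema matching the description; actual names may differ.
CREATE TABLE article (
    id           bigint PRIMARY KEY,  -- unique ID for each article
    title        text,                -- article title
    abstract_tsv tsvector,            -- ts_vector of the abstract
    created_at   timestamptz,         -- timestamps
    updated_at   timestamptz,
    score        numeric              -- a numeric column
);
```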

With some prototyping on a local machine with 64G of RAM, I found queries to
work quite well with the ts_vector column having a statistics target of
10,000. The production box has about 12G of RAM, but initially I didn't
notice any issues.

Today, I attempted to execute 'ANALYZE article;'. However, the OOM killer
kept taking down postgres. I reviewed numerous system settings and postgres
settings, all of which seemed to properly give about 10G+ of RAM to
postgres, and which should have constrained postgres to using less than that
limit.

RhodiumToad in Freenode#postgres helped me troubleshoot, and suggested I try
reducing the statistics target on the ts_vector column. Dropping it from
10,000 to 1,000 permitted ANALYZE to run without consuming large amounts of
RAM.
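For anyone hitting the same thing, the per-column change that worked was along these lines (assuming the tsvector column is named abstract_tsv; your column name will differ):

```sql
-- Lower the per-column statistics target from 10,000 to 1,000,
-- then re-run ANALYZE; ANALYZE's memory use scales with this target.
ALTER TABLE article ALTER COLUMN abstract_tsv SET STATISTICS 1000;
ANALYZE article;
```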

It seems that ANALYZE on a ts_vector column can consume on the order of
300 * (statistics target) * (size of data in the field) -- ANALYZE samples
roughly 300 * (statistics target) rows -- which in my case ended up being
well above 10 gigabytes. I wonder if this might be considered a bug (either
in code or in documentation), as this memory usage seems not to obey other
limits, or at least wasn't documented in a way that might have helped me
guess at the underlying problem.
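As a back-of-the-envelope check (the 1 kB average tsvector size here is purely my illustrative assumption; the real average in my table must be larger, given the observed usage):

```sql
-- ANALYZE samples about 300 * (statistics target) rows, and each sampled
-- tsvector is held in memory while statistics are computed.
-- Assuming ~1 kB per tsvector (illustrative figure only):
SELECT pg_size_pretty(300::bigint * 10000 * 1024);  -- target 10,000
SELECT pg_size_pretty(300::bigint * 1000 * 1024);   -- target 1,000
```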
