Re: Query planner instability on large append-only table with heavy same-day inserts

From: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
To: Giriraj Sharma <giriraj(dot)sharma27(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Query planner instability on large append-only table with heavy same-day inserts
Date: 2025-10-28 15:30:50
Message-ID: CAKAnmmLQGwqtD+3kn+sgAspV4dhtYYoE6DrE7uFEhn04ab6S3w@mail.gmail.com
Lists: pgsql-general

On Tue, Oct 28, 2025 at 2:36 AM Giriraj Sharma <giriraj(dot)sharma27(at)gmail(dot)com>
wrote:

> Would daily range partitioning by effective_date (≈ 1000 partitions for 3
> years) be the right long-term approach here?
>

If you are querying based on dates, yes. It does not need to be daily; you
could do monthly, for example. A retention policy of 3 years is a great
candidate for partitioning.
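
For example, a rough sketch of monthly range partitioning; the table name
"events" and the non-date columns are placeholders for your actual schema:

    CREATE TABLE events (
        event_id       bigint NOT NULL,
        effective_date date   NOT NULL,
        payload        text
    ) PARTITION BY RANGE (effective_date);

    -- One partition per month:
    CREATE TABLE events_2025_10 PARTITION OF events
        FOR VALUES FROM ('2025-10-01') TO ('2025-11-01');

Queries filtering on effective_date are then pruned to the matching
partitions, and each partition carries its own statistics.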

> Given that inserts and queries almost always target today’s partition,
> will partition pruning and per-partition statistics fully eliminate the
> stale-statistics problem? Are there known performance implications of
> maintaining ~1000 daily partitions at this scale (10 M/day)? We
> occasionally receive backdated or future-dated inserts — can such
> out-of-range values be handled efficiently (e.g., by creating partitions on
> the fly)?
>

It will help; I don't know about fully eliminating it. I would not do
daily unless it is really needed, as that's a lot of partitions. Start
with monthly. You cannot create partitions on the fly, but you can have
out-of-range rows go to a default partition and sort them out later, or
pre-create a bunch of partitions.
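
For example, with the same placeholder names as above:

    -- Catch-all for backdated or future-dated rows; sort them out later:
    CREATE TABLE events_default PARTITION OF events DEFAULT;

    -- Pre-create upcoming months, e.g. from a scheduled job:
    CREATE TABLE events_2025_11 PARTITION OF events
        FOR VALUES FROM ('2025-11-01') TO ('2025-12-01');

One caveat: creating a new partition scans the default partition to verify
that no existing rows fall into the new range, so keep the default close
to empty.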

> If we stay with a single table, what are practical per-table autovacuum
> settings to ensure frequent ANALYZE even as total row count grows into
> billions?
>

Turn off autovacuum for that table and use cron to run VACUUM and ANALYZE
hourly (or at some frequency that ensures good plans). If ANALYZE is taking
too long, it can be done per column as well, although that won't help much
if the sample table you showed is representative. But if you have a lot of
other columns with stable values, you could analyze those less often.
Measure and see.
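
Something along these lines; the table and database names are placeholders:

    -- Keep autovacuum from competing with the scheduled job:
    ALTER TABLE events SET (autovacuum_enabled = off);

    -- Then hourly from cron:
    --   0 * * * *  psql -d mydb -c 'VACUUM (ANALYZE) events;'

    -- Per-column ANALYZE, if analyzing every column takes too long:
    ANALYZE events (effective_date);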

> Should we also experiment with raising the statistics target for
> effective_date from 100 to 1000 using:
>

I don't think that will matter if your ANALYZE runs are already giving you
index-only scans.
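
(For reference, that would be done with something like the following,
again with a placeholder table name:)

    ALTER TABLE events ALTER COLUMN effective_date SET STATISTICS 1000;
    ANALYZE events (effective_date);  -- takes effect at the next ANALYZE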

> Are there best practices or proven patterns for append-only,
> time-series–like workloads that insert heavily into one day and read from
> the same day?
>

Partitioning. Ensure fillfactor is 100%. Minimal indexes.
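
Note that 100 is already the default heap fillfactor, so this only matters
if it was lowered at some point ("events" is a placeholder again):

    ALTER TABLE events SET (fillfactor = 100);
    -- Affects only newly written pages; existing pages keep their
    -- current fill until rewritten.

An append-only table never needs free space reserved for updates, which is
why 100 is the right setting here.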

> Is there a known best way to make Postgres’s planner more resilient to
> temporary statistic drift for parameterized queries like ours?
>

Not really, other than giving Postgres updated stats via frequent analyze.

> PostgreSQL 14 / 15 tested.

Moving to a newer version is always helpful too.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
