Re: should check interrupts in BuildRelationExtStatistics ?

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: should check interrupts in BuildRelationExtStatistics ?
Date: 2022-06-03 15:28:37
Message-ID: 20220603152837.GO29853@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 09, 2022 at 09:11:37AM -0400, Robert Haas wrote:
> On Sun, May 8, 2022 at 11:36 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Michael Paquier <michael(at)paquier(dot)xyz> writes:
> > > How long can the backend remain unresponsive? I don't think that
> > > anybody would object to the addition of some CHECK_FOR_INTERRUPTS() in
> > > areas where it would be efficient to make the shutdown quicker, but
> > > we need to think carefully about the places where we'd want to add
> > > these.
> >
> > CHECK_FOR_INTERRUPTS is really quite cheap, just a test-and-branch.
> > I wouldn't put it in a *very* tight loop, but one test per row
> > processed while gathering stats is unlikely to be a problem.
>
> +1. If we're finding things stalling that would be fixed by adding
> CHECK_FOR_INTERRUPTS(), we should generally just add it. In the
> unlikely event that this causes a performance problem, we can try to
> figure out some other solution, but not responding to interrupts isn't
> the right way to economize.

Reproduce the problem for ndistinct and dependencies like:

DROP TABLE t; CREATE TABLE t AS SELECT i A,1+i B,2+i C,3+i D,4+i E,5+i F, now() AS ts FROM generate_series(1.0, 99999.0)i; VACUUM t;
DROP STATISTICS stxx; CREATE STATISTICS stxx (ndistinct) ON mod(a,14),mod(b,15),mod(c,16),mod(d,17),mod(e,18),mod(f,19) FROM t;
ANALYZE VERBOSE t;

Maybe this should actually call vacuum_delay_point(), like
compute_scalar_stats(). For MCV, there seems to be no issue, since those
functions are being called (but only for expressional stats). But maybe I've
just failed to make a large enough, non-expressional MCV list for the problem
to be apparent.

The patch is WIP, but whatever we end up with should probably be backpatched at
least to v14, where expressional indexes were introduced, since they're likely
to have more columns, and are slower to compute.

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index e8f71567b4e..e5538dcd4e1 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -19,6 +19,7 @@
#include "catalog/pg_statistic_ext.h"
#include "catalog/pg_statistic_ext_data.h"
#include "lib/stringinfo.h"
+#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "nodes/nodes.h"
#include "nodes/pathnodes.h"
@@ -383,6 +384,8 @@ statext_dependencies_build(StatsBuildData *data)
MVDependency *d;
MemoryContext oldcxt;

+ CHECK_FOR_INTERRUPTS();
+
/* release memory used by dependency degree calculation */
oldcxt = MemoryContextSwitchTo(cxt);

diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index dd67b19b6fa..9db1d0325cd 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -269,6 +269,8 @@ statext_mcv_build(StatsBuildData *data, double totalrows, int stattarget)
SortItem **freqs;
int *nfreqs;

+ // CHECK_FOR_INTERRUPTS();
+
/* used to search values */
tmp = (MultiSortSupport) palloc(offsetof(MultiSortSupportData, ssup)
+ sizeof(SortSupportData));
diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c
index 6ade5eff78c..3b739ab7ca0 100644
--- a/src/backend/statistics/mvdistinct.c
+++ b/src/backend/statistics/mvdistinct.c
@@ -29,6 +29,7 @@
#include "catalog/pg_statistic_ext.h"
#include "catalog/pg_statistic_ext_data.h"
#include "lib/stringinfo.h"
+#include "miscadmin.h"
#include "statistics/extended_stats_internal.h"
#include "statistics/statistics.h"
#include "utils/fmgrprotos.h"
@@ -114,6 +115,8 @@ statext_ndistinct_build(double totalrows, StatsBuildData *data)
MVNDistinctItem *item = &result->items[itemcnt];
int j;

+ CHECK_FOR_INTERRUPTS();
+
item->attributes = palloc(sizeof(AttrNumber) * k);
item->nattributes = k;

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-06-03 15:37:48 Re: [PATCH] fix doc example of bit-reversed MAC address
Previous Message Nathan Bossart 2022-06-03 15:23:19 Re: Proposal: adding a better description in psql command about large objects