Re: Less than ideal error reporting in pg_stat_statements

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Less than ideal error reporting in pg_stat_statements
Date: 2015-10-04 16:27:15
Message-ID: 32385.1443976035@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> Attached, revised patch deals with the issues around removing the
> query text file when garbage collection encounters trouble. There is
> no reason to be optimistic about any error within gc_qtexts() not
> recurring during a future garbage collection. OOM might be an
> exception, but probably not, since gc_qtexts() is reached when a new
> entry is created with a new query text, which in general makes OOM
> progressively more likely.

Hm. I'm unconvinced by the aspects of this that involve using
mean_query_len as a filter on which texts will be accepted. While that's
not necessarily bad in the abstract, there are way too many implementation
artifacts here that will result in random-seeming decisions about whether
to normalize. For instance:

* mean_query_len only gets recomputed in entry_dealloc(), which is only
run if we exceed pgss_max, and gc_qtexts(), which is only run if we decide
the query texts file is more than 50% bloat. So there could be quite a
long startup transient before the value gets off its initialization
minimum, and I'm suspicious that there might be plausible use-cases where
it never does. So it's not so much "restrict to a multiple of the mean
query len" as "restrict to some number that might once upon a time have
had some relation to the mean query len, or maybe not".

* One could expect that after changing mean_query_len, the population of
query texts would change character as a result of the filter behavior
changing, so that convergence to stable behavior over the long haul is
not exactly self-evident.

* As you've got it here, entry_dealloc() and gc_qtexts() don't compute
mean_query_len the same way, because only one of them discriminates
against sticky entries. So the value would bounce about rather randomly
based on which one had run last.

* I'm not exactly convinced that sticky entries should be ignored for
this purpose anyway.

Taking a step back, ISTM the real issue you're fighting here is lots of
orphaned sticky entries, but the patch doesn't do anything directly to fix
that problem. I wonder if we should do something like making
entry_dealloc() and/or gc_qtexts() aggressively remove sticky entries,
or at least those with "large" texts.

I think the aspects of this patch that are reasonably uncontroversial are
increasing the allowed malloc attempt size in gc_qtexts, flushing the
query text file on malloc failure, fixing the missing cleanup steps after
a gc failure, and making entry_dealloc's recomputation of mean_query_len
sane (which I'll define for the moment as "the same as gc_qtexts would
get"). Since we're hard against a release deadline, I propose to commit
just those changes, and we can consider the idea of a query size filter
and/or redefining mean_query_len at leisure.

regards, tom lane
