Re: Tracking last scan time

From: Andres Freund <andres(at)anarazel(dot)de>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Tracking last scan time
Date: 2022-08-31 16:21:08
Message-ID: 20220831162108.qyarvx7edqnobiye@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-08-23 10:55:09 +0100, Dave Page wrote:
> Often it is beneficial to review one's schema with a view to removing
> indexes (and sometimes tables) that are no longer required. It's very
> difficult to understand when that is the case by looking at the number of
> scans of a relation as, for example, an index may be used infrequently but
> may be critical in those times when it is used.
>
> The attached patch against HEAD adds optional tracking of the last scan
> time for relations. It updates pg_stat_*_tables with new last_seq_scan and
> last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to
> help with this.
>
> Due to the use of gettimeofday(), those values are only maintained if a new
> GUC, track_scans, is set to on. By default, it is off.
>
> I did run a 12 hour test to see what the performance impact is. pgbench was
> run with scale factor 10000 and 75 users across 4 identical bare metal
> machines running Rocky 8 in parallel which showed roughly a -2% average
> performance penalty against HEAD with track_scans enabled. Machines were
> PowerEdge R7525's with 128GB RAM, dual 16C/32T AMD 7302 CPUs, with the data
> directory on 6 x 800GB 12Gb/s SSD SAS drives in RAID 0. Kernel time source
> is tsc.
>
> HEAD track_scans Penalty (%)
> box1 19582.49735 19341.8881 -1.22869541
> box2 19936.55513 19928.07479 -0.04253664659
> box3 19631.78895 18649.64379 -5.002830696
> box4 19810.86767 19420.67192 -1.969604525
> Average 19740.42728 19335.06965 -2.05343896

Based on the size of those numbers this was a r/w pgbench. If it has this
noticable an impact for r/w, with a pretty low number of scans/sec, how's the
overhead for r/o (which can have 2 orders of magnitude more scans/sec)? It
must be quite bad.

I don't think we should accept this feature with this overhead - but I also
think we can do better, by accepting a bit less accuracy. For this to be
useful we don't need a perfectly accurate timestamp. The statement start time
is probably not accurate enough, but we could just have bgwriter or such
update one in shared memory every time we wake up? Or perhaps we could go to
an even lower granularity, by putting in the current LSN or such?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-08-31 16:26:29 Re: SQL/JSON features for v15
Previous Message Bruce Momjian 2022-08-31 16:13:37 Re: Tracking last scan time