Re: Add LSN <-> time conversion functionality

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Add LSN <-> time conversion functionality
Date: 2024-03-18 17:46:13
Message-ID: c2960586-2d21-4651-a997-61c1eaeb442a@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 3/18/24 15:02, Daniel Gustafsson wrote:
>> On 22 Feb 2024, at 03:45, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
>> On Fri, Feb 16, 2024 at 3:41 PM Tomas Vondra
>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>>> - Not sure why we need 0001. Just so that the "estimate" functions in
>>> 0002 have a convenient "start" point? Surely we could look at the
>>> current LSNTimeline data and use the oldest value, or (if there's no
>>> data) use the current timestamp/LSN?
>>
>> When there are 0 or 1 entries in the timeline you'll get an answer
>> that could be very off if you just return the current timestamp or
>> LSN. I guess that is okay?
>
> I don't think that's a huge problem at such a young "lsn-age", but I might be
> missing something.
>
>>> - I wonder what happens if we lose the data - we know that if people
>>> reset statistics for whatever reason (or just lose them because of a
>>> crash, or because they're on a replica), bad things happen to
>>> autovacuum. What's the (expected) impact on pruning?
>>
>> This is an important question. Because stats aren't crashsafe, we
>> could return very inaccurate numbers for some time/LSN values if we
>> crash. I don't actually know what we could do about that. When I use
>> the LSNTimeline for the freeze heuristic it is less of an issue
>> because the freeze heuristic has a fallback strategy when there aren't
>> enough stats to do its calculations. But other users of this
>> LSNTimeline will simply get highly inaccurate results (I think?). Is
>> there anything we could do about this? It seems bad.
>

Do we have something to calculate a sufficiently good "average" to use
as a default, if we don't have a better value? For example, we know the
timestamp of the last checkpoint, and we know the LSN, right? Maybe if
we're sufficiently far from the checkpoint, we could use that.

Or maybe checkpoint_timeout / max_wal_size would be enough to calculate
some default value?

I wonder how long it takes until LSNTimeline gives us sufficiently good
data for all LSNs we need. That is, if we lose this, how long it takes
until we get enough data to do good decisions?

Why don't we simply WAL-log this in some trivial way? It's pretty small,
so if we WAL-log this once in a while (after a merge happens), that
should not be a problem.

Or a different idea - if we lost the data, but commit_ts is enabled,
can't we "simply" walk commit_ts and feed LSN/timestamp into the
timeline? I guess we don't want to walk 2B transactions, but even just
sampling some recent transactions might be enough, no?

> A complication with this over stats is that we can't recompute this in case of
> a crash/corruption issue. The simple solution is to consider this unlogged
> data and start fresh at every unclean shutdown, but I have a feeling that won't
> be good enough for basing heuristics on?
>
>> Andres had brought up something at some point about, what if the
>> database is simply turned off for awhile and then turned back on. Even
>> if you cleanly shut down, will there be "gaps" in the timeline? I
>> think that could be okay, but it is something to think about.
>
> The gaps would represent reality, so there is nothing wrong per se with gaps,
> but if they inflate the interval of a bucket which in turns impact the
> precision of the results then that can be a problem.
>

Well, I think the gaps are a problem in the sense that they disappear
once we start merging the buckets. But maybe that's fine, if we're only
interested in approximate data.

>> Just a note, all of my comments could use a lot of work, but I want to
>> get consensus on the algorithm before I make sure and write about it
>> in a perfect way.
>
> I'm not sure "a lot of work" is accurate, they seem pretty much there to me,
> but I think that an illustration of running through the algorithm in an
> ascii-art array would be helpful.
>

+1

>
> Reading through this I think such a function has merits, not only for your
> usecase but other heuristic based work and quite possibly systems debugging.
>
> While the bucketing algorithm is a clever algorithm for degrading precision for
> older entries without discarding them, I do fear that we'll risk ending up with
> answers like "somewhere between in the past and even further in the past".
> I've been playing around with various compression algorithms for packing the
> buckets such that we can retain precision for longer. Since you were aiming to
> work on other patches leading up to the freeze, let's pick this up again once
> things calm down.
>

I guess this ambiguity is pretty inherent to a structure that does not
keep all the data, and gradually reduces the resolution for old stuff.
But my understanding was that's sufficient for the freezing patch.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-03-18 17:53:50 Re: Popcount optimization using AVX512
Previous Message Maciek Sakrejda 2024-03-18 17:37:45 Re: Possibility to disable `ALTER SYSTEM`