Re: AS OF queries

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: AS OF queries
Date: 2017-12-27 08:21:12
Message-ID: 260da2ba-646c-f705-b142-3dce39f56329@postgrespro.ru
Lists: pgsql-hackers

On 27.12.2017 10:29, Craig Ringer wrote:
> On 25 December 2017 at 15:59, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>
>
>
> On 25.12.2017 06:26, Craig Ringer wrote:
>> On 24 December 2017 at 04:53, konstantin knizhnik
>> <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>>
>>
>>
>> But what if I just forbid changing recent_global_xmin?
>> What if it is stalled at FirstNormalTransactionId and never changed?
>> Will it protect all versions from being deleted?
>>
>>
>> That's totally impractical, you'd have unbounded bloat and a
>> nonfunctional system in no time.
>>
>> You'd need a mechanism - akin to what we have with replication
>> slots - to set a threshold for age.
>
> Well, there are systems with "never delete" and "append only"
> semantics.
> For example, I have participated in the SciDB project: a database for
> scientific applications.
> One of the key requirements for scientific research is
> reproducibility.
> From the database point of view this means that we need to store all
> raw data and never delete it.
>
>
> PostgreSQL can't cope with that for more than 2^31 xacts, you have to
> "forget" details of which xacts created/updated tuples and the
> contents of deleted tuples, or you exceed our xid limit. You'd need
> 64-bit XIDs, or a redo-buffer based heap model (like the zheap stuff)
> with redo buffers marked with an xid epoch, or something like that.

Yes, but PgPro-EE already has 64-bit xids, and we have spent a lot of
time trying to push them to the community.
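To make the 2^31 limit concrete, the wraparound comparison can be
modeled in a few lines (a simplified Python sketch of what
TransactionIdPrecedes does, not the actual backend code):

```python
# Simplified model of PostgreSQL's 32-bit xid ordering: xids wrap
# around, so "older" is defined by the signed modular difference.
# Only ~2^31 xids can be live at once, which is why old tuples must
# be frozen (or xids widened to 64 bits, as in PgPro-EE).

XID_MASK = 0xFFFFFFFF

def xid_precedes(a: int, b: int) -> bool:
    """True if xid a is logically older than xid b (modulo 2^32)."""
    diff = (a - b) & XID_MASK
    # Interpret the 32-bit difference as signed: "negative" means a precedes b.
    return diff >= 0x80000000

# Normal case: 100 precedes 200.
assert xid_precedes(100, 200)
# After wraparound: xid 3 is logically *newer* than xid 0xFFFFFFF0.
assert xid_precedes(0xFFFFFFF0, 3)
```

Once more than ~2^31 xids separate two tuples, this ordering becomes
ambiguous, which is exactly why "never delete, never freeze" cannot work
with 32-bit xids.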

> I am not sure that it should be similar to a logical replication slot.
>
> Here the semantics are quite clear: we preserve segments of WAL until
> they are replicated to the subscribers.
>
>
> Er, what?
>
> This isn't to do with restart_lsn. That's why I mentioned *logical*
> replication slots.
>
> I'm talking about how they interact with GetOldestXmin using their
> xmin and catalog_xmin.
>
> You probably won't want to re-use slots, but you'll want something
> akin to that, a transaction age threshold. Otherwise your system has a
> finite end date where it can no longer function due to xid count, or
> if you solve that, it'll slowly choke on table bloat etc. I guess if
> you're willing to accept truly horrible performance...

Supporting time travel through frequently updated data may certainly
cause database bloat and awful performance.
I still think that this feature will be mostly interesting for
append-only/rarely updated data.

In any case, I set vacuum_defer_cleanup_age = 1000000 and ran pgbench
for some time.
There was no significant performance degradation.

Unfortunately, neither replication slots nor vacuum_defer_cleanup_age
allow keeping old versions only for particular table(s).
That seems to be the major problem, and I do not know how to solve it yet.

> With time travel the situation is less obscure: we may specify some
> threshold for age - for example, keep data for one year.
>
>
> Sure. You'd likely do that by mapping commit timestamps => xids and
> using an xid threshold though.
>
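A hypothetical sketch of that mapping: given (commit timestamp, xid)
pairs in commit order - the kind of information track_commit_timestamp
maintains - derive an xmin horizon from a retention window. The names
commit_log and xmin_horizon are illustrative, not a real PostgreSQL API:

```python
# Sketch: map a time-based retention threshold ("keep one year") to an
# xid horizon by searching a commit-ordered log of (commit time, xid).
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import Optional

# (commit time, xid) pairs, ordered by commit time (illustrative data).
commit_log = [
    (datetime(2016, 6, 1), 1000),
    (datetime(2016, 12, 1), 2000),
    (datetime(2017, 6, 1), 3000),
    (datetime(2017, 12, 1), 4000),
]

def xmin_horizon(now: datetime, keep: timedelta) -> Optional[int]:
    """Oldest xid whose tuple versions must still be retained."""
    cutoff = now - keep
    i = bisect_left(commit_log, (cutoff,))
    # Everything committed at or after the cutoff is inside the window.
    return commit_log[i][1] if i < len(commit_log) else None

# With a one-year window as of 2017-12-27, versions created by xid 3000
# and later must be kept; older ones may be vacuumed away.
assert xmin_horizon(datetime(2017, 12, 27), timedelta(days=365)) == 3000
```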
> But unfortunately the trick with the snapshot (no matter how we set up
> the oldest xmin horizon) affects all tables.
>
>
> You'd need to be able to pass more info into HeapTupleSatisfiesMVCC
> etc. I expect you'd probably add a new snapshot type (like logical
> decoding did with historic snapshots), that has a new Satisfies
> function. But you'd have to be able to ensure all snapshot Satisfies
> callers had the required extra info - like maybe a Relation - which
> could be awkward for some call sites.
>

Yes, it seems to be the only possible choice.
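
As a rough illustration of what such an extra Satisfies function would
have to decide, here is a toy visibility check for an AS OF snapshot.
This is purely conceptual Python, not backend code: the field and
function names are made up, and real visibility also involves commit
status, subtransactions, hint bits, etc.

```python
# Toy model of MVCC visibility under a hypothetical "historic" snapshot
# pinned at as_of_xid: a tuple version is visible if it was created at
# or before that point and not yet deleted at that point.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TupleVersion:
    xmin: int                # xid that created this version
    xmax: Optional[int]      # xid that deleted/updated it, if any
    payload: str

def satisfies_as_of(tup: TupleVersion, as_of_xid: int) -> bool:
    """Visibility under a hypothetical AS OF snapshot at as_of_xid."""
    if tup.xmin > as_of_xid:
        return False         # created after the snapshot point
    if tup.xmax is not None and tup.xmax <= as_of_xid:
        return False         # already deleted at the snapshot point
    return True

# Two versions of one row: the old one is visible AS OF xid 150,
# the new one only from xid 200 onward.
old = TupleVersion(xmin=100, xmax=200, payload="v1")
new = TupleVersion(xmin=200, xmax=None, payload="v2")
assert satisfies_as_of(old, 150) and not satisfies_as_of(new, 150)
assert satisfies_as_of(new, 250) and not satisfies_as_of(old, 250)
```

The hard part, as noted above, is not this check itself but plumbing the
per-relation AS OF point down to every Satisfies call site.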

> The user would have to be responsible for ensuring sanity of FK
> relationships etc when specifying different snapshots for different
> relations.
>
> Per-relation time travel doesn't seem totally impractical so long as
> you can guarantee that there is some possible snapshot for which the
> catalogs defining all the relations and types are simultaneously
> valid, i.e. there's no disjoint set of catalog changes. Avoiding messy
> performance implications with normal queries might not even be too bad
> if you use a separate snapshot model, so long as you can avoid
> callsites having to do extra work in the normal case.
>
> Dealing with dropped columns and rewrites would be a pain though.
> You'd have to preserve the dropped column data when you re-projected
> the rewrite tuples.
>
> There is a similar (but not the same) problem with logical
> replication: assume that we need to replicate only one small
> table, but we have to pin in WAL all updates of another huge
> table which is not involved in logical replication at all.
>
>
> I don't really see how that's similar. It's concerned with WAL, wheras
> what you're looking at is heaps and bloat from old versions.
> Completely different, unless you propose to somehow reconstruct data
> from old WAL to do historic queries, which would be o_O ...
>
> Well, I am really not sure about users' demands for time travel.
> This is one of the reasons for initiating this discussion on
> -hackers... Maybe it is not the best place for such a discussion,
> because there are mostly Postgres developers and not users here...
> At least, from the experience of a few SciDB customers, I can tell
> that we didn't have problems with schema evolution: mostly the
> schema is simple, static and well defined.
> There were problems with incorrect import of data (this is why we
> had to add real delete) and with splitting data into chunks
> (partitioning),...
>
>
> Every system I've ever worked with that has a "static" schema has
> landed up not being so static after all.
>
> I'm sure there are exceptions, but if you can't cope with catalog
> changes you've excluded the immense majority of users. Even the ones
> who promise they don't ever need to change anything ... land up
> changing things.

JSON? :)

>> The question is how we should handle such catalog changes if
>> they happen. Ideally we should not allow moving back
>> beyond this point.
>> Unfortunately it is not so easy to implement.
>>
>>
>> I think you can learn a lot from studying logical decoding here.
>>
>>
> Working with multimaster and shardman I had to learn a lot about
> logical replication.
> It is a really powerful and flexible mechanism ... with a lot of
> limitations and problems: lack of catalog replication, inefficient
> bulk insert, various race conditions,...
> But I think that time travel and logical replication really serve
> different goals and so require different approaches.
>
>
> Of course. I'm pointing out that we solve the catalog-change problem
> using historic snapshots, and THAT is what you'd be wanting to look
> at. Also what it does with the rewrite map.
>
> However, you'd have a nightmare of a time getting the syscache to
> deliver you different data depending on which table's catalogs you're
> looking for. And what if there's some UDT that appears in >1 table
> with different AS OF times, but with different definitions at
> different times? Yuck.
>
> More importantly you can't construct a historic snapshot at some
> arbitrary point in time. It depends on the maintenance of state that's
> done with logical decoding and xlogreader. So I don't know how you'd
> construct a historic snapshot for "June 24 at 2:01 am".
>
> Ignoring concerns with catalog changes sounds convenient but in
> practice it's a total waste of time IMO. If nothing else there's temp
> tables to deal with.
>

Assume we have the query

    select * from A as old_a as of timestamp '2016-12-01',
                  A as new_a as of timestamp '2017-12-01'
    where old_a.old_id = new_a.new_id;

where the schema of A was changed during this year. We would have to
carefully specify the proper historical snapshots in all the places in
the parser and optimizer dealing with these tables...
I am afraid that it will be too complicated.

> --
>  Craig Ringer http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
