Re: UNDO and in-place update

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UNDO and in-place update
Date: 2017-01-13 18:06:58
Message-ID: CA+TgmoYVtcMLwGMe5mGpuh_zjJK8yD2KZSvw0wwywF_-e+Bp0Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 13, 2017 at 5:57 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Sure, we can do it that way, and I agree that it is worth considering.
> One case where it can be really costly is when the undo chain overflows
> to multiple pages and those pages don't exist in cache. I think the
> reason this is probable is that the undo chain is not maintained per
> row update; rather, it is maintained for page-level changes for each
> transaction. What this means is that if we have to traverse the chain
> for record X, it is quite possible that during the traversal we will
> encounter undo for record Y. Of course we can identify and ignore it,
> but the chain can be much longer as a result. Another kind of cost is
> constructing the actual record from the undo record: for example, if
> we undo-log only the changed parts of an update, as we do for WAL,
> then we incur some record-construction cost as well. What can make
> this really costly is repeating that work. Even if we consider the
> above a worst case, I am not sure it will be cheap for short
> transactions like pgbench's, as there could be cases where concurrent
> updates/reads need to traverse undo chains.
>
> Having said that, if you don't feel strongly about caching the
> information in such a way that it can be reused by different sessions,
> then we can initially implement the system as outlined by you and
> extend it later based on the performance characteristics.
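
If I follow, the shape you're describing is roughly this; every name
in it (UndoRecPtr, UndoRecord, ReadUndoRecord) is invented, so take it
as a sketch of the traversal rather than a design:

#include "postgres.h"

/*
 * Undo is chained per page and per transaction, so fetching the prior
 * version of one tuple can mean stepping over undo records that
 * belong to other tuples on the same page.
 */
typedef uint64 UndoRecPtr;
#define InvalidUndoRecPtr   ((UndoRecPtr) 0)

typedef struct UndoRecord
{
    uint16      target_offnum; /* line pointer of the tuple this undoes */
    UndoRecPtr  prev;          /* previous undo record for the same page */
    /* ... payload: full old tuple, or just the changed columns ... */
} UndoRecord;

extern UndoRecord *ReadUndoRecord(UndoRecPtr ptr); /* may hit cold pages */

static UndoRecord *
FetchPriorVersion(UndoRecPtr chain, uint16 offnum)
{
    while (chain != InvalidUndoRecPtr)
    {
        UndoRecord *rec = ReadUndoRecord(chain);

        if (rec->target_offnum == offnum)
            return rec;         /* undo for our tuple */
        chain = rec->prev;      /* undo for some other tuple: skip it */
    }
    return NULL;                /* no prior version in this chain */
}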

Sounds fine. Also, one idea I've been considering is allowing each
shared buffer to have some associated "scratch space" which would act
as a cache that lasts as long as that page doesn't get evicted. I
think that might have some applications for hash indexes as well. Of
course, deciding how much scratch space to allocate, and whether you
wouldn't be better off using that memory some other way, is tricky,
but it's a thought.

>> I don't see a reason to do that. I think one of the advantages of
>> storing the UNDO pages in the same pool as shared_buffers is that the
>> amount of UNDO that we store in there can float up and down according
>> to demand. If the heap pages are being accessed more frequently than
>> the UNDO pages then let the UNDO pages get evicted. If the reverse is
>> the case, that's OK too.
>
> Okay, not a problem. I was thinking of prioritising UNDO buffers so
> that they can only be evicted during a checkpoint. However, I think
> there might not be much value in doing so.

Seems like we can leave that decision for later. If it turns out to
be the thing that is hurting performance, we can fix it then.

>>> Vacuum can delete or make the undo file reusable when the
>>> corresponding transaction precedes RecentGlobalXmin.
>>
>> If the associated transaction commits, yes. If it aborts, then we can
>> recycle it once undo is complete. But I think this should be the job
>> of a dedicated process which also executes undo actions; I don't think
>> it should be associated with vacuum.
>
> Agreed that it would be good to keep a separate undo process.
> However, I am not exactly clear on your point about executing undo
> actions in the background. Do you mean that the main backend will
> mark the transaction as aborted in clog, but the undo actions will be
> done in the background, or do you have something else in mind?

I think it's not good to have the main backend perform the UNDO
actions, because they might ERROR, and that's not something we can
support when we're already in the abort path. If it happens in a
background process, that process can catch the ERROR, return to the
toplevel, and retry after a pause, or do whatever else needs to be
done. For example, imagine an I/O error.
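
Roughly this shape, say. ExecuteUndoActions and the worker scaffolding
are hypothetical, but PG_TRY/PG_CATCH, EmitErrorReport,
FlushErrorState, and pg_usleep are existing facilities:

#include "postgres.h"
#include "miscadmin.h"          /* pg_usleep */

/* Hypothetical: apply all undo actions for one aborted transaction. */
extern void ExecuteUndoActions(TransactionId xid);

static void
UndoWorkerProcessOne(TransactionId xid)
{
    for (;;)
    {
        volatile bool done = false;

        PG_TRY();
        {
            ExecuteUndoActions(xid);    /* may ERROR, e.g. I/O failure */
            done = true;
        }
        PG_CATCH();
        {
            /* Unlike a backend's abort path, we can recover here. */
            EmitErrorReport();          /* log the error */
            FlushErrorState();          /* reset error state */
        }
        PG_END_TRY();

        if (done)
            break;                      /* undo fully applied */
        pg_usleep(1000000L);            /* pause a second, then retry */
    }
}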

> Okay, it seems we can deduce it from the transaction status. If the
> transaction is aborted, then we know the undo log is invalid. If it
> is in-progress, then there will be a valid undo log. If it is
> committed and all-visible (precedes RecentGlobalXmin), then the undo
> log will be invalid.

For MVCC purposes, that might work, but for other purposes, maybe not.
We actually need to replay the undo log on abort before removing it.
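
In code terms, something like the distinction below.
UndoActionsApplied is hypothetical; the transam calls and
RecentGlobalXmin exist today:

#include "postgres.h"
#include "access/transam.h"     /* TransactionIdDidCommit, etc. */
#include "utils/snapmgr.h"      /* RecentGlobalXmin */

/* Hypothetical: has the undo for this aborted xid been replayed? */
extern bool UndoActionsApplied(TransactionId xid);

static bool
UndoLogDiscardable(TransactionId xid)
{
    if (TransactionIdDidCommit(xid))
        /* Committed: undo is dead once no one can need old versions. */
        return TransactionIdPrecedes(xid, RecentGlobalXmin);
    if (TransactionIdDidAbort(xid))
        /* Aborted: undo must actually be replayed before discarding. */
        return UndoActionsApplied(xid);
    return false;               /* still in progress: keep the undo */
}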

> By "create", it seems you are thinking of having undo files as an
> in-memory thing (space reserved in memory), and only creating the
> file on disk if required?

Yes.

> If so, then the point we discussed above about giving undo pages the
> same priority in shared_buffers as heap pages is relevant, because a
> backend evicting an undo page could be costlier than evicting a heap
> page.

It's possible, but I don't think that's a key part of the design, so
I'd say skip it for now and we'll deal with it if it becomes an issue.

> Won't it be efficient if we do the same mapping between undo
> byte-offset and undo file as we do for WAL (WAL write offset to
> file)? We can keep the file size bigger than what we have for each
> WAL segment. Do we need something more?

Well, you need something more if you want to combine multiple undo
logs in a single file. If each undo log gets its own file, then
that's all you need, I think.
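
By analogy with XLByteToSeg() and XLogFileName(), something like the
following, with the segment size and file naming entirely made up:

#include "postgres.h"

typedef uint64 UndoLogOffset;   /* byte position within one undo log */

/* Larger than WAL's default 16MB segments, per the suggestion above. */
#define UNDO_SEG_SIZE   ((UndoLogOffset) 1024 * 1024 * 1024)   /* 1GB */

#define UndoOffsetToSegNo(off)  ((off) / UNDO_SEG_SIZE)
#define UndoOffsetToSegOff(off) ((off) % UNDO_SEG_SIZE)

/* Build the on-disk name for one segment of one undo log. */
static void
UndoSegmentFileName(char *buf, size_t len, unsigned logno,
                    UndoLogOffset off)
{
    snprintf(buf, len, "pg_undo/%04X.%010lX",
             logno, (unsigned long) UndoOffsetToSegNo(off));
}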

> Do we need the epoch and xid in the back-pointer for undo, or will a
> byte-offset alone do?

I think just the byte-offset.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
