Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-01-22 05:00:25
Message-ID: CAFiTN-s6ADLx8fyRYBbKtY08-eDPaBV+gtKQbn0jm1Uw2gC_oQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 14, 2020 at 10:44 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Jan 13, 2020 at 3:18 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Thu, Jan 9, 2020 at 12:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jan 9, 2020 at 10:30 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Jan 9, 2020 at 9:35 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > > The problem is that when we get toasted chunks we remember the
> > > > > > changes in memory (in a hash table) but don't stream them until
> > > > > > we get the actual change on the main table. Now, the problem is
> > > > > > that we might get the change for the toast table and the main
> > > > > > table in different streams. So basically, if a stream contains
> > > > > > only the toasted tuples, then even after ReorderBufferStreamTXN
> > > > > > the memory usage will not be reduced.
> > > > > >
> > > > >
> > > > > I think we can't split such changes across different streams
> > > > > (unless we design an entirely new solution to send partial changes
> > > > > of toast data), so we need to send them together. We can keep a
> > > > > flag like data_complete in ReorderBufferTXN and mark it complete
> > > > > only when we are able to assemble the entire tuple. Now, whenever
> > > > > we try to stream the changes once we reach the memory threshold,
> > > > > we can check whether the data_complete flag is true
>
> Here, we can also consider streaming the changes even when data_complete
> is false but some additional changes have been added to the same txn
> since the last attempt, as the new changes might complete the tuple.
>
> > > > > , and if so, then only send the changes; otherwise, we can pick
> > > > > the next largest transaction. I think we can retry it a few times,
> > > > > and if we get incomplete data for multiple transactions, then we
> > > > > can decide to spill the transaction, or maybe we can directly
> > > > > spill the first largest transaction which has incomplete data.
> > > > >
> > > > Yeah, we might do something along these lines. Basically, we need
> > > > to mark the top transaction as data-incomplete if any of its
> > > > subtransactions has incomplete data (it will always be the latest
> > > > subtransaction of the top transaction). Also, for streaming we are
> > > > checking the largest top transaction, whereas for spilling we just
> > > > need the largest (sub)transaction. So we also need to decide, while
> > > > picking the largest top transaction for streaming, what to do for
> > > > the spill if we get a few transactions with incomplete data: do we
> > > > spill all the subtransactions under this top transaction, or do we
> > > > again find the largest (sub)transaction for spilling?
> > > >
> > >
> > > I think it is better to do the latter, as that will spill only the
> > > required changes (the minimum needed to get the memory below the
> > > threshold).
> > I think instead of doing this, can't we just spill the changes which
> > are in the toast hash? Basically, at the end of the stream we have
> > some toast tuples which we could not stream because we did not have
> > the insert for the main table, so we can spill only those changes
> > which are in the toast hash.
> >
>
> Hmm, I think this can turn out to be inefficient because we can easily
> end up spilling the data even when we don't need to do so. Consider
> cases where part of the streamed changes are for toast, and the
> remaining are changes which we would have streamed and hence could have
> removed. In such cases, we could have easily consumed the remaining
> changes for toast without spilling. Also, I am not sure if spilling
> changes from the hash table is a good idea, as they are no longer in
> the same order as they were in the ReorderBuffer, which means the order
> in which we would normally serialize the changes would differ, and that
> might have some impact; so we would need some more study if we want to
> pursue this idea.
I have fixed this bug and attached the fix as a separate patch. I will
merge it into the main patch once we agree on the idea and after some
more testing.

The idea is that whenever we get a toasted chunk, instead of directly
inserting it into the toast hash I insert it into a local list, so that
if we do not get the change for the main table we can put these changes
back into txn->changes. Once we do get the change for the main table, I
prepare the hash table and merge the chunks into it.
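
Roughly, the buffering part looks like the sketch below (the helper
names and the standalone list are only for illustration, not the code
in the attached patch):

#include "postgres.h"
#include "lib/ilist.h"
#include "replication/reorderbuffer.h"

/*
 * Toast-chunk changes seen in the current stream that are still waiting
 * for their main-table change (dlist_push_tail initializes a zeroed head).
 */
static dlist_head unresolved_toast_chunks;

static void
hold_toast_chunk(ReorderBufferChange *change)
{
	/*
	 * Park the chunk locally; the toast hash is built only once the
	 * main-table change arrives and the chunks can be merged into it.
	 */
	dlist_push_tail(&unresolved_toast_chunks, &change->node);
}

static void
return_unresolved_chunks(ReorderBufferTXN *txn)
{
	dlist_mutable_iter iter;

	if (dlist_is_empty(&unresolved_toast_chunks))
		return;

	/*
	 * The stream ended without the main-table change: hand the chunks
	 * back to the transaction so they are not lost.
	 */
	dlist_foreach_modify(iter, &unresolved_toast_chunks)
	{
		ReorderBufferChange *change =
			dlist_container(ReorderBufferChange, node, iter.cur);

		dlist_delete(&change->node);
		dlist_push_tail(&txn->changes, &change->node);
	}
}

(In the actual patch the memory accounting for these changes also has
to stay consistent when they are handed back, of course.)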
If the stream is over and we still have not received the changes for
the main table, we mark the txn as having some pending toast changes so
that next time we do not pick the same transaction for streaming. This
flag is cleared whenever we get any further change (insert or update)
for the txn.
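
The candidate selection can then simply skip such flagged transactions;
something along these lines (the flag and the helper name are just
assumptions for illustration):

static ReorderBufferTXN *
pick_largest_streamable_txn(ReorderBuffer *rb)
{
	dlist_iter	iter;
	ReorderBufferTXN *largest = NULL;

	dlist_foreach(iter, &rb->toplevel_by_lsn)
	{
		ReorderBufferTXN *txn =
			dlist_container(ReorderBufferTXN, node, iter.cur);

		/*
		 * Skip a txn whose last stream ended with toast chunks still
		 * waiting for their main-table change (hypothetical flag).
		 */
		if (txn->has_pending_toast)
			continue;

		/*
		 * Per-txn size only; as discussed above, the real selection
		 * would have to account for subtransactions as well.
		 */
		if (largest == NULL || txn->size > largest->size)
			largest = txn;
	}

	return largest;
}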
There is also a possibility that even after we stream the changes,
rb->size is not below logical_decoding_work_mem, because we could not
stream the pending toast changes. To handle this, after streaming we
recheck the size, and if it is still not under control we pick another
transaction. In some cases we might not get any transaction to stream
because every candidate has the pending-toast-changes flag set; in that
case we go for the spill.
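
Putting it together, the existing ReorderBufferCheckMemoryLimit() would
presumably grow into a loop roughly like this (only a sketch of the
control flow described above; ReorderBufferStreamTXN is from the
attached patches, while the loop shape and the selection helper from
the previous sketch are illustrative):

static void
check_memory_limit(ReorderBuffer *rb)
{
	/* logical_decoding_work_mem is in kB */
	while (rb->size >= logical_decoding_work_mem * 1024L)
	{
		ReorderBufferTXN *txn = pick_largest_streamable_txn(rb);

		if (txn != NULL)
		{
			/*
			 * Streaming may leave pending toast chunks behind, so the
			 * size check at the top of the loop decides whether we
			 * need to keep going.
			 */
			ReorderBufferStreamTXN(rb, txn);
		}
		else
		{
			/*
			 * Every candidate has incomplete toast data: fall back to
			 * spilling the largest (sub)transaction to disk.
			 */
			txn = ReorderBufferLargestTXN(rb);
			ReorderBufferSerializeTXN(rb, txn);
		}
	}
}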

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v8-0001-Immediately-WAL-log-assignments.patch application/octet-stream 10.3 KB
v8-0003-Extend-the-output-plugin-API-with-stream-methods.patch application/octet-stream 34.8 KB
v8-0002-Issue-individual-invalidations-with-wal_level-log.patch application/octet-stream 16.4 KB
v8-0004-Gracefully-handle-concurrent-aborts-of-uncommitte.patch application/octet-stream 13.1 KB
v8-0005-Implement-streaming-mode-in-ReorderBuffer.patch application/octet-stream 37.2 KB
v8-0006-Fix-speculative-insert-bug.patch application/octet-stream 2.5 KB
v8-0007-Support-logical_decoding_work_mem-set-from-create.patch application/octet-stream 13.1 KB
v8-0009-Track-statistics-for-streaming.patch application/octet-stream 11.7 KB
v8-0010-Enable-streaming-for-all-subscription-TAP-tests.patch application/octet-stream 14.7 KB
v8-0008-Add-support-for-streaming-to-built-in-replication.patch application/octet-stream 91.1 KB
v8-0011-BUGFIX-set-final_lsn-for-subxacts-before-cleanup.patch application/octet-stream 1013 bytes
v8-0012-Add-TAP-test-for-streaming-vs.-DDL.patch application/octet-stream 4.4 KB
v8-0013-Bugfix-handling-of-incomplete-toast-tuple.patch application/octet-stream 10.2 KB
