Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-01-09 06:39:06
Message-ID: CAA4eK1L5PyRZMS0B8C+d_RCHo0VX6hu6D6tPnXnqPhy4tcNtFQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 9, 2020 at 10:30 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Jan 9, 2020 at 9:35 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 8, 2020 at 1:12 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > I have observed one more design issue.
> > >
> >
> > Good observation.
> >
> > > The problem is that when we receive toasted chunks, we remember the
> > > changes in memory (in a hash table) but don't stream them until we
> > > get the actual change on the main table. Now, the problem is that the
> > > changes for the toast table and the main table might fall into
> > > different streams. So basically, if a stream contains only the
> > > toasted tuples, then even after ReorderBufferStreamTXN the memory
> > > usage will not be reduced.
> > >
> >
> > I think we can't split such changes across streams (unless we
> > design an entirely new solution to send partial changes of toast
> > data), so we need to send them together. We can keep a flag like
> > data_complete in ReorderBufferTxn and mark it complete only once we
> > are able to assemble the entire tuple. Then, whenever we try to
> > stream the changes on reaching the memory threshold, we can check
> > whether the data_complete flag is true; if so, send the changes,
> > otherwise pick the next largest transaction. I think we can retry
> > this a few times, and if we get incomplete data for multiple
> > transactions, then we can decide to spill a transaction, or maybe
> > directly spill the first (largest) transaction which has
> > incomplete data.
> >
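To make that concrete, here is a rough, untested sketch of the picking
logic. All of the names (pick_txn_for_eviction, largest_untried, the
tried flag, MAX_STREAM_RETRIES) and the stub ReorderBufferTXN type are
invented for illustration; this is not code from the patch:

#include <stdbool.h>
#include <stddef.h>

/* Stub of the real struct; only the fields the sketch needs. */
typedef struct ReorderBufferTXN
{
    size_t  size;           /* memory used by this transaction's changes */
    bool    data_complete;  /* toast chunks fully assembled with main tuple? */
    bool    tried;          /* already considered in this pass (sketch only,
                             * assumed cleared before each pass) */
    struct ReorderBufferTXN *next;
} ReorderBufferTXN;

#define MAX_STREAM_RETRIES 3

/* Return the largest transaction not yet considered in this pass. */
static ReorderBufferTXN *
largest_untried(ReorderBufferTXN *txns)
{
    ReorderBufferTXN *best = NULL;

    for (ReorderBufferTXN *txn = txns; txn != NULL; txn = txn->next)
    {
        if (!txn->tried && (best == NULL || txn->size > best->size))
            best = txn;
    }
    return best;
}

/*
 * Pick a transaction to evict once the memory threshold is hit.  Prefer
 * streaming the largest transaction whose data is complete; after a few
 * incomplete picks, fall back to spilling the largest incomplete one
 * (the first incomplete pick, since picks go in decreasing size order).
 * Returns NULL if there is nothing to evict; the caller decides then.
 */
static ReorderBufferTXN *
pick_txn_for_eviction(ReorderBufferTXN *txns, bool *spill)
{
    ReorderBufferTXN *first_incomplete = NULL;

    *spill = false;
    for (int retry = 0; retry < MAX_STREAM_RETRIES; retry++)
    {
        ReorderBufferTXN *txn = largest_untried(txns);

        if (txn == NULL)
            break;              /* no candidates left */
        if (txn->data_complete)
            return txn;         /* stream this one */
        if (first_incomplete == NULL)
            first_incomplete = txn;
        txn->tried = true;
    }

    *spill = true;
    return first_incomplete;    /* spill (may be NULL if list was empty) */
}
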
> Yeah, we might do something along this line. Basically, we need to mark
> the top transaction as data-incomplete if any of its subtransactions has
> incomplete data (it will always be the latest subtransaction of the top
> transaction). Also, for streaming we are checking the largest top
> transaction, whereas for spilling we just need the largest
> (sub)transaction. So we also need to decide, while picking the largest
> top transaction for streaming, what to do for the spill if we get a few
> transactions with incomplete data. Do we spill all the subtransactions
> under this top transaction, or do we again find the largest
> (sub)transaction for spilling?
>

I think it is better to do the latter, as that will lead to spilling
only the required changes (the minimum needed to get the memory usage
below the threshold).
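
A minimal sketch of that option (made-up names and a standalone stub
type, not code from the patch): walk the top transaction's
subtransaction list and spill only the single largest node.

#include <stddef.h>

/* Stub type, hypothetical; 'subtxns' links a top transaction to its subs. */
typedef struct SpillTXN
{
    size_t  size;               /* memory used by this (sub)transaction */
    struct SpillTXN *subtxns;   /* head of subtransaction list (top only) */
    struct SpillTXN *next;      /* next sibling subtransaction */
} SpillTXN;

/*
 * Return the single largest node among the top transaction and its
 * subtransactions; spilling just that node's changes frees the most
 * memory while writing the fewest changes to disk.
 */
static SpillTXN *
largest_for_spill(SpillTXN *top)
{
    SpillTXN *best = top;       /* the top transaction itself is a candidate */

    for (SpillTXN *sub = top->subtxns; sub != NULL; sub = sub->next)
    {
        if (sub->size > best->size)
            best = sub;
    }
    return best;                /* spill only this node's changes */
}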

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
