Re: Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Re: [HACKERS] Custom compression methods
Date: 2020-09-28 10:48:07
Message-ID: CAFiTN-t+0y5xnPx+sSvverfXPk9E6OMUkdfgMg7mcTTbs8rokQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 19, 2020 at 1:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Aug 25, 2020 at 11:20 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Mon, Aug 24, 2020 at 2:12 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > IIUC, the main reason for using this flag is for taking the decision
> > > whether we need any detoasting for this tuple. For example, if we are
> > > rewriting the table because the compression method is changed then if
> > > HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and tuple
> > > length, not tup->t_len > TOAST_TUPLE_THRESHOLD then we don't need to
> > > call heap_toast_insert_or_update function for this tuple. Whereas if
> > > this flag is set then we need to because we might need to uncompress
> > > and compress back using a different compression method. The same is
> > > the case with INSERT into SELECT * FROM.
> >
> > This doesn't really seem worth it to me. I don't see how we can
> > justify burning an on-disk bit just to save a little bit of overhead
> > during a rare maintenance operation. If there's a performance problem
> > here we need to look for another way of mitigating it. Slowing CLUSTER
> > and/or VACUUM FULL down by a large amount for this feature would be
> > unacceptable, but is that really a problem? And if so, can we solve it
> > without requiring this bit?
>
> Okay, if we want to avoid keeping the bit then there are multiple ways
> to handle this, but the only thing is none of that will be specific to
> those scenarios.
> approach1. In ExecModifyTable, we can process the source tuple and see
> if any of the varlena attributes is compressed and its stored
> compression method is not the same as the target table attribute then
> we can decompress it.
> approach2. In heap_prepare_insert, always call the
> heap_toast_insert_or_update, therein we can check if any of the source
> tuple attributes are compressed with different compression methods
> then the target table then we can decompress it.
>
> With either of the approach, we have to do this in a generic path
> because the source of the tuple is not known, I mean it can be a
> output from a function, or the join tuple or a subquery. So in the
> attached patch, I have implemented with approach1.
>
> For testing, I have implemented using approach1 as well as using
> approach2 and I have checked the performance of the pg_bench to see
> whether it impacts the performance of the generic paths or not, but I
> did not see any impact.
>
> >
> > > I have already extracted these 2 patches from the main patch set.
> > > But, in these patches, I am still storing the am_oid in the toast
> > > header. I am not sure can we get rid of that at least for these 2
> > > patches? But, then wherever we try to uncompress the tuple we need to
> > > know the tuple descriptor to get the am_oid but I think that is not
> > > possible in all the cases. Am I missing something here?
> >
> > I think we should instead use the high bits of the toast size word for
> > patches #1-#4, as discussed upthread.
> >
> > > > > Patch #3. Add support for changing the compression method associated
> > > > > with a column, forcing a table rewrite.
> > > > >
> > > > > Patch #4. Add support for PRESERVE, so that you can change the
> > > > > compression method associated with a column without forcing a table
> > > > > rewrite, by including the old method in the PRESERVE list, or with a
> > > > > rewrite, by not including it in the PRESERVE list.
> > >
> > > Does this make sense to have Patch #3 and Patch #4, without having
> > > Patch #5? I mean why do we need to support rewrite or preserve unless
> > > we have the customer compression methods right? because the build-in
> > > compression method can not be dropped so why do we need to preserve?
> >
> > I think that patch #3 makes sense because somebody might have a table
> > that is currently compressed with pglz and they want to switch to lz4,
> > and I think patch #4 also makes sense because they might want to start
> > using lz4 for future data but not force a rewrite to get rid of all
> > the pglz data they've already got. Those options are valuable as soon
> > as there is more than one possible compression algorithm, even if
> > they're all built in. Now, as I said upthread, it's also true that you
> > could do #5 before #3 and #4. I don't think that's insane. But I
> > prefer it in the other order, because I think having #5 without #3 and
> > #4 wouldn't be too much fun for users.
>
> Details of the attached patch set
>
> 0001: This provides syntax to set the compression method from the
> built-in compression method (pglz or zlib). pg_attribute stores the
> compression method (char) and there are conversion functions to
> convert that compression method to the built-in compression array
> index. As discussed up thread the first 2 bits will be storing the
> compression method index using that we can directly get the handler
> routing using the built-in compression method array.
>
> 0002: This patch provides an option to changes the compression method
> for an existing column and it will rewrite the table.
>
> Next, I will be working on providing an option to alter the
> compression method without rewriting the whole table, basically, we
> can provide a preserve list to preserve old compression methods.

I have rebased the patch and I have also done a couple of defect fixes
and some cleanup.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v3-0002-alter-table-set-compression.patch application/octet-stream 9.9 KB
v3-0001-Built-in-compression-method.patch application/octet-stream 186.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-09-28 12:21:24 Re: Partition prune with stable Expr
Previous Message Bharath Rupireddy 2020-09-28 10:36:04 Re: Parallel INSERT (INTO ... SELECT ...)