Re: [HACKERS] Custom compression methods

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-11 21:24:05
Message-ID: 20210311212405.GG8796@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 11, 2021 at 10:07:30AM +0530, Dilip Kumar wrote:
> On Thu, Mar 11, 2021 at 8:50 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> >
> > Looking at v23-0002-alter-table-set-compression, ATRewriteTable() was calling
> > CompareCompressionMethodAndDecompress().
>
> While changing the compression method user might be just interested
> to compress the future tuple with the new compression method but
> doesn't want to rewrite all the old tuple. So IMHO without any option
> just force rewrite whenever changing the compression method doesn't look that
> great.

I mean to keep the current behavior where SET is only a catalog change.
But I'm comparing with earlier implementation.

Does your new patch avoid recompressing things if the compression is unchanged?

> > I think what's wanted here is that decompression should only happen when the
> > tuple uses a different compression than the column's currently set compression.
> > So there's no overhead in the usual case. I guess CLUSTER and INSERT SELECT
> > should do the same.
> >
> > This is important to allow someone to get rid of LZ4 compression, if they want
> > to get rid of that dependency.
> >
> > But it's also really desirable for admins to be able to "migrate" existing
> > data. People will want to test this, and I guess INSERT SELECT and CLUSTER are
> > the obvious ways.
>
> For INSERT SELECT we were already doing in the older version so we can
> include that code here, we will also have to include the patches for
> decompressing data before forming the composite types because without
> that we can not ensure that lz4 does not exist anywhere in the table.
> Said that with that also we can not ensure that it doesn't exist anywhere
> in the system because it might exist in the WAL and if you do the crash
> recovery then might get those lz4 compressed data back.

I think this is no concern except for PITR, in which case it's working as
intended (if someone does a partial replay to an intermediate state where
tables were still LZ4). If they replay to a point following the SET pglz +
CLUSTER, then LZ4 doesn't exist in the heap, and anything in the WAL is pretty
uninteresting.

My patch this morning compiled ok on mac, so you should include it.
|0005-f-2nd-attempt-to-use-pkgconfig-to-allow-compiling-on.patch
I mailed to Thomas offlist about getting pkg-config installed on the BSD CI
environment.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-03-11 21:59:00 Re: pg_amcheck contrib application
Previous Message Justin Pryzby 2021-03-11 21:10:06 Re: Parallel INSERT (INTO ... SELECT ...)