Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2020-12-01 15:45:44
Message-ID: CAFiTN-sYceWDFqK_Pws11J1FB9p4VGwBHC8ZeEtEhPbCWFij=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 1, 2020 at 4:50 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> While working on this comment I have doubts.
>
> > I wonder in passing about TOAST tables and materialized views, which
> > are the other things that have storage. What gets stored for
> > attcompression? For a TOAST table it probably doesn't matter much
> > since TOAST table entries shouldn't ever be toasted themselves, so
> > anything that doesn't crash is fine (but maybe we should test that
> > trying to alter the compression properties of a TOAST table doesn't
> > crash, for example).
>
> Yeah for the toast table it doesn't matter, but I am not sure what do
> you mean by altering the compression method for the toast table. Do you
> mean manually update the pg_attribute tuple for the toast table and
> set different compression methods? Or there is some direct way to
> alter the toast table?
>
> For a materialized view it seems reasonable to
> > want to set column properties, but I'm not quite sure how that works
> > today for things like STORAGE anyway. If we do allow setting STORAGE
> > or COMPRESSION for materialized view columns then dump-and-reload
> > needs to preserve the values.
>
> I see that we allow setting the STORAGE for the materialized view but
> I am not sure what is the use case. Basically, the tuples are
> directly getting selected from the host table and inserted in the
> materialized view without checking target and source storage type.
> The behavior is the same if you execute INSERT INTO dest_table SELECT
> * FROM source_table. Basically, if the source_table attribute has
> extended storage and the target table has plain storage, still the
> value will be inserted directly into the target table without any
> conversion. However, in the table, you can insert the new tuple and
> that will be stored as per the new storage method so that is still
> fine but I don't know any use case for the materialized view. Now I am
> thinking what should be the behavior for the materialized view?
>
> For the materialized view can we have the same behavior as storage? I
> think for the built-in compression method that might not be a problem
> but for the external compression method how can we handle the
> dependency, I mean when the materialized view has created the table
> was having an external compression method "cm1" and we have created
> the materialized view based on that now if we alter table and set the
> new compression method and force table rewrite then what will happen
> to the tuple inside the materialized view, I mean tuple is still
> compressed with "cm1" and there is no attribute is maintaining the
> dependency on "cm1" because the materialized view can point to any
> compression method. Now if we drop the cm1 it will be allowed to
> drop. So I think for the compression method we can consider the
> materialized view same as the table, I mean we can allow setting the
> compression method for the materialized view and we can always ensure
> that all the tuple in this view is compressed with the current or the
> preserved compression methods. So whenever we are inserting in the
> materialized view then we should compare the datum compression method
> with the target compression method.
>
>
> > + /*
> > + * Use default compression method if the existing compression method is
> > + * invalid but the new storage type is non plain storage.
> > + */
> > + if (!OidIsValid(attrtuple->attcompression) &&
> > + (newstorage != TYPSTORAGE_PLAIN))
> > + attrtuple->attcompression = DefaultCompressionOid;
> >
> > You have a few too many parens in there.
> >
> > I don't see a particularly good reason to treat plain and external
> > differently.
>
> Yeah, I think they should be treated the same.
>
> More generally, I think there's a question here about
> > when we need an attribute to have a valid compression type and when we
> > don't. If typstorage is plan or external, then there's no point in
> > ever having a compression type and maybe we should even reject
> > attempts to set one (but I'm not sure).
>
> I agree.
>
> > However, the attstorage is a
> > different case. Suppose the column is created with extended storage
> > and then later it's changed to plain. That's only a hint, so there may
> > still be toasted values in that column, so the compression setting
> > must endure. At any rate, we need to make sure we have clear and
> > sensible rules for when attcompression (a) must be valid, (b) may be
> > valid, and (c) must be invalid. And those rules need to at least be
> > documented in the comments, and maybe in the SGML docs.
>
> IIUC, even if we change the attstorage the existing tuples are stored
> as it is without changing the tuple storage. So I think even if the
> attstorage is changed the attcompression should not have any change.
>

I have put some more thought into this and IMHO the rules should be as below

1. If attstorage is EXTENDED -> attcompression "must be valid"
2. if attstorage is PLAIN/EXTERNAL -> atttcompression "maybe valid"
3. if typstorage is PLAIN/EXTERNAL -> atttcompression "must be invalid"

I am a little bit confused about (2), basically, it will be valid in
the scenario u mentioned that change the atttstorege from EXTENDED to
PLAIN/EXTERNAL. But I think in this case also we can just set the
attcompression to invalid, however, we have to maintain the dependency
between attribute and compression method so that the old methods using
which we might have compressed a few tuples in the table doesn't get
dropped.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2020-12-01 16:03:35 Re: Confusing behavior of psql's \e
Previous Message Alvaro Herrera 2020-12-01 15:25:19 Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY