Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-20 05:34:20
Message-ID: CAFiTN-t=nNd6-ZyWP=5TQtqvWW0_VhdzQfcMsQYRdEa7eAX0gA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 20, 2021 at 2:51 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Feb 19, 2021 at 11:12 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> I think that these performance tests aren't really exercising the
> expanded-record stuff, just the ExecEvalRow changes. We need to test
> that test case, and I tend to suspect there's going to be a measurable
> regression.

I will do testing around this area.

> I spent some time looking at how datums get into the expanded record
> system. There seem to be four possible paths:
> expanded_record_set_tuple(), make_expanded_record_from_datum(),
> expanded_record_set_field_internal(), and
> expanded_record_set_fields(). The first two of these inject an entire
> tuple, while the latter two work on a field-by-field basis. For that
> reason, the latter two are not really problematic. I'm not quite sure
> what the right thing to do is here, but if we wanted to check whether
> a Datum that we're absorbing is non-pglz-compressed in those places,
> it would be easy to do. Also, as far as I can see,
> make_expanded_record_from_datum() is completely unused. So the problem
> case is where expanded_record_set_tuple() is getting called, and
> specifically where it's being called with expand_external = true. Any
> place that it's being called with expand_external = false, there's
> apparently no problem with the result tuple containing external
> datums, so probably non-pglz compressed data is OK there too.
> All of the places that can potentially pass expand_external = true are
> actually passing !estate->atomic, where estate is a PLpgSQL_execstate.
> In other words, I think the case where this happens is when we're in a
> context where the computed value could need to survive across a COMMIT
> or ROLLBACK, like there may be a procedure running (maybe more than
> one, each invoking the next via CALL) but there are no queries in
> progress. We have to expand TOAST references because committing a
> transaction that deleted data means you might not be able to resolve
> the old TOAST pointer any more: even if you use a snapshot that can
> see everything, VACUUM could nuke the deleted rows - or some of them -
> at any time. To avoid trouble we have to un-externalize before any
> COMMIT or ROLLBACK occurs. That can suck for performance because we
> might be fetching a big value that we don't end up using for anything
> - say if the variable isn't used again - but it beats failing.
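For what it's worth, the COMMIT-survival scenario described above can be reproduced with something like the following sketch (the table and procedure names are made up for illustration):

```sql
CREATE TABLE big_t(v text);
INSERT INTO big_t VALUES (repeat('x', 100000));  -- large enough to be toasted

CREATE OR REPLACE PROCEDURE use_after_commit()
LANGUAGE plpgsql
AS $$
DECLARE
  d text;
BEGIN
  SELECT v INTO d FROM big_t;  -- d may internally be a TOAST pointer
  DELETE FROM big_t;
  COMMIT;                      -- the deleted row's TOAST data is now vacuumable
  -- d must have been expanded before the COMMIT, or this lookup could
  -- chase a dangling TOAST pointer
  RAISE NOTICE 'still usable, length = %', length(d);
END;
$$;

CALL use_after_commit();
```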

I agree with most of this, but I don't think COMMIT/ROLLBACK is the only
reason to un-externalize. For example, from a trigger function we might
insert a RECORD value into another table whose column is of the ROWTYPE
of the table on which we are operating. See the example below:

CREATE TABLE t1(a int, b varchar compression lz4);
INSERT INTO t1 SELECT 1, repeat('a', 3000);
CREATE TABLE t2 (x t1 compression pglz);

CREATE OR REPLACE FUNCTION log_last_name_changes()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
BEGIN
INSERT INTO t2 SELECT OLD;
RETURN NEW;
END;
$$;

CREATE TRIGGER last_name_changes
BEFORE UPDATE
ON t1
FOR EACH ROW
EXECUTE PROCEDURE log_last_name_changes();

UPDATE t1 SET a=2;

SELECT pg_column_compression((t2.x).b) FROM t2;
pg_column_compression
-----------------------
lz4
(1 row)

So basically, in this case we are not un-externalizing because of a
ROLLBACK or COMMIT; instead, we are doing it because we want to insert
the value into another table. This behavior is the same without my patch
and with my patch (v25_0001_Disallow_compressed_data_inside_container_types)
minus its changes in expandedrecord.c. Here is the call stack at the
point where this tuple gets flattened:

#0  expanded_record_set_field_internal (erh=0x2bbcfb0, fnumber=2, newValue=45863112, isnull=false, expand_external=true, check_constraints=false) at expandedrecord.c:1225
#1  0x00000000009a1899 in ER_get_flat_size (eohptr=0x2bbcfb0) at expandedrecord.c:713
#2  0x00000000009a0954 in EOH_get_flat_size (eohptr=0x2bbcfb0) at expandeddatum.c:77
#3  0x000000000048f61c in heap_compute_data_size (tupleDesc=0x2bd0168, values=0x2bd02c8, isnull=0x2bd02d0) at heaptuple.c:155
#4  0x00000000004916b3 in heap_form_tuple (tupleDescriptor=0x2bd0168, values=0x2bd02c8, isnull=0x2bd02d0) at heaptuple.c:1045
#5  0x00000000007296eb in tts_virtual_copy_heap_tuple (slot=0x2bd0280) at execTuples.c:272
#6  0x0000000000728d0b in ExecCopySlotHeapTuple (slot=0x2bd0280) at ../../../src/include/executor/tuptable.h:456
#7  0x000000000072a5d8 in tts_buffer_heap_copyslot (dstslot=0x2bd0820, srcslot=0x2bd0280) at execTuples.c:767
#8  0x0000000000758840 in ExecCopySlot (dstslot=0x2bd0820, srcslot=0x2bd0280) at ../../../src/include/executor/tuptable.h:480
#9  0x000000000075c263 in ExecModifyTable (pstate=0x2bcfa08) at nodeModifyTable.c:2264

> The argument that we need to force decompression in such cases is
> considerably more tenuous. It revolves around the possibility that the
> compression AM itself has been dropped. As long as we have only
> built-in compression methods, which are undroppable, it seems like we
> could potentially just decide to do nothing at all about this issue.
> If the only reason for expanding TOAST pointers inside the
> expanded-record stuff is to avoid the possibility of one being
> invalidated by a transaction commit, and if compression methods can't
> be invalidated by a transaction commit, well then we don't really have
> a problem.

Even given the above case, we might say it is still not a problem for
this patch: although t2 doesn't have a direct dependency on lz4, it has
an indirect one via t1. So I think this particular case might not be a
problem even for a custom compression method. However, I agree that
decompressing so that a value can survive a COMMIT/ROLLBACK could be a
problem for custom compression methods, though not for the built-in
ones. So I agree with the conclusion that even if we make no changes to
expandedrecord.c, it won't be a problem for the built-in methods.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
