Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-19 16:12:29
Message-ID: CAFiTN-u2pyXDDDwZXJ-fVUwbLhJSe9TbrVR6rfW_rhdyL1A5bg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 19, 2021 at 2:43 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

I had an off-list discussion with Robert and, based on his suggestion
and a PoC patch, I have come up with an updated version for handling
composite types. Basically, the problem was that in ExecEvalRow we
first form the tuple, then call HeapTupleHeaderGetDatum, and then have
to deform the tuple again to find any compressed data. That can cause
a huge performance penalty in all the unrelated paths, which don't
even contain any compressed data. So Robert's idea was to check for
compressed/external data even before forming the tuple. I have
implemented that, and I am no longer seeing that performance penalty.
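
To illustrate the idea only (this is not the patch code), here is a toy
sketch in Python. The Field class, the form_row function and the use of
zlib are all made up for illustration; they are not PostgreSQL
structures or APIs, and external (toasted) data is not modeled at all.
The point is just that a cheap check on the input values runs before
the row is formed, so rows without compressed fields take the plain
path and never need a form/deform round trip.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical stand-in for a datum; not a PostgreSQL structure."""
    data: bytes
    compressed: bool = False


def form_row(fields):
    """Form a row, decompressing only if the pre-check finds compressed input."""
    # Cheap pre-check on the inputs, done before the row is formed.
    if any(f.compressed for f in fields):
        fields = [
            Field(zlib.decompress(f.data)) if f.compressed else f
            for f in fields
        ]
    # "Forming the tuple" is modeled as building a tuple of payloads.
    return tuple(f.data for f in fields)
```

With this shape, a row made only of short plain values skips the
decompression branch entirely, which is the case test1 below exercises.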

Test setup:
----------------
create table t1 (f1 int, f2 text, f3 text, f4 text, f5 text, f6
text,f7 text, f8 text, f9 text);
create table t2 (f1 int, f2 text, f3 text, f4 text, f5 text, f6
text,f7 text, f8 text, f9 text);
create table t3(x t1);

pgbench custom script for all test:
------------------------------------------------
\set x random(1, 10000)
select row(f1,f2,f3,f4,f5,f6,f7,f8,f9)::t1 from t2 where f1=:x;

test1:
Objective: Just select the data and form a row; the data contains no
compressed/external fields (this should not create a regression on
unrelated paths)
data: insert into t2 select i, repeat('f1',
10),repeat('f2',10),repeat('f3', 10),repeat('f4', 10),repeat('f5',
10),repeat('f6',10),repeat('f7', 10),repeat('f8', 10) from
generate_series(1,10000) as i;
Result(TPS): Head: 1509.79 Patch: 1509.67

test2: data contains 1 compressed field, no external data
data: insert into t2 select i, repeat('f2',
10),repeat('f3',10000),repeat('f3', 10),repeat('f5', 10),repeat('f6',
4000),repeat('f7',10),repeat('f8', 10),repeat('f9', 10) from
generate_series(1,10000) as i;
Result(TPS): Head: 1088.08 Patch: 1071.48

test4: data contains 1 compressed/1 external field
(alter table t2 alter COLUMN f2 set storage external;)
data: (insert into t2 select i, repeat('f2',
10000),repeat('f3',10000),repeat('f3', 10),repeat('f5',
10),repeat('f6', 4000),repeat('f7',10),repeat('f8', 10),repeat('f9',
10) from generate_series(1,10000) as i;)
Result(TPS): Head: 1459.28 Patch: 1459.37

test5: head does not need to decompress, but the patch does:
data: insert into t2 select i, repeat('f2',
10),repeat('f3',6000),repeat('f34', 5000),repeat('f5',
10),repeat('f6', 4000),repeat('f7',10),repeat('f8', 10),repeat('f9',
10) from generate_series(1,10000) as i;
--pgbench script
\set x random(1, 10000)
insert into t3 select row(f1,f2,f3,f4,f5,f6,f7,f8,f9)::t1 from t2 where f1=:x;
Result(TPS): Head: 562.36 Patch: 469.91

Summary: It seems that in most of the unrelated cases the attached
patch does not create any regression. There is some performance loss
only when the data is just compressed (test5); in such cases the patch
has to decompress it, whereas head does not. But I think this is not
an overall loss, because if the data eventually has to be fetched
multiple times, then with the patch we decompress only once, since the
whole row is compressed, whereas on head we have to decompress field
by field. So I don't think this can be considered a regression.
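
For reference, the relative change between head and patch can be
computed from the TPS figures reported above. This is a small Python
sketch, not part of the patch; pct_change is a helper name made up
here, and the numbers are copied from the test results in this mail.

```python
# TPS figures from the tests above, as (head, patch).
results = {
    "test1": (1509.79, 1509.67),
    "test2": (1088.08, 1071.48),
    "test4": (1459.28, 1459.37),
    "test5": (562.36, 469.91),
}


def pct_change(head, patch):
    """Relative change of patch versus head, in percent."""
    return (patch - head) / head * 100.0


for name, (head, patch) in results.items():
    print(f"{name}: {pct_change(head, patch):+.2f}%")
```

By this calculation test1 and test4 are within about 0.01% of head,
test2 is about 1.5% slower, and test5 is about 16.4% slower.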

I also had to add handling for expanded records, so that any
compressed data in an expanded record gets decompressed. I think I
need to put some more effort into cleaning up this code. I have put a
very localized fix in ER_get_flat_size: basically, it now ignores the
ER_FLAG_HAVE_EXTERNAL flag and always processes the record. I think
the handling might not be perfect, but I am posting it to get feedback
on the idea.

Other changes:
- I have fixed other pending comments from Robert. I will reply to
individual comments in a separate mail.
- Merged HIDE_COMPRESSAM into 0001.

Pending work:
- Clean up 0001, especially for expanded records.
- Rebase the other patches.
- Review the default compression method GUC from Justin.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v25-0001-Disallow-compressed-data-inside-container-types.patch text/x-patch 6.9 KB
v25-0003-default-to-with-lz4.patch text/x-patch 1.7 KB
v25-0002-Built-in-compression-method.patch text/x-patch 111.5 KB
