Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-26 14:40:29
Message-ID: CAFiTN-u3jrRDiuyGxGvNSGDXLhG4=U2orHnjk4B6WBy9Eo9kMQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 21, 2021 at 5:33 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>

Based on an off-list discussion with Robert, I have done further analysis
of the composite type data. The idea is that I have analyzed all the
callers of HeapTupleGetDatum and HeapTupleHeaderGetDatum and divided them
into two categories:
1) Callers that form the tuple from values that cannot contain
compressed/external data.
2) Callers whose values can contain external/compressed data.

So for category 1), instead of calling HeapTupleGetDatum or
HeapTupleHeaderGetDatum, we can call a new function, HeapTupleGetRawDatum,
which is similar to PointerGetDatum. For category 2), we will detoast
any varlena before forming the tuple, so that we don't have to pay
the penalty of checking for compressed attributes after forming the
tuple. After this change there are no callers of HeapTupleGetDatum left,
but I have kept it because it is an exposed routine.
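
To make the two paths concrete, here is a minimal standalone sketch. It is
a toy model only, not code from the patch: ToyDatum, ToyValue, ToyTuple and
all toy_* functions are hypothetical stand-ins that I made up for
illustration, and "detoasting" is modeled by just clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins, NOT the PostgreSQL definitions. */
typedef uintptr_t ToyDatum;

#define TOY_MAX_ATTS 4

typedef struct ToyValue
{
    bool        compressed;     /* models a compressed/external varlena */
    const char *data;
} ToyValue;

typedef struct ToyTuple
{
    int         natts;
    ToyValue    atts[TOY_MAX_ATTS];
} ToyTuple;

/* Models detoasting: the result is a plain, uncompressed value. */
static ToyValue
toy_detoast(ToyValue v)
{
    v.compressed = false;
    return v;
}

/*
 * Category 2: inputs may be compressed/external, so flatten each one
 * BEFORE the tuple is formed.
 */
static ToyTuple
toy_form_tuple_flattened(const ToyValue *vals, int n)
{
    ToyTuple    t;

    assert(n >= 0 && n <= TOY_MAX_ATTS);
    memset(&t, 0, sizeof(t));
    t.natts = n;
    for (int i = 0; i < n; i++)
        t.atts[i] = vals[i].compressed ? toy_detoast(vals[i]) : vals[i];
    return t;
}

/*
 * Category 1 (and category 2 after flattening): a plain pointer-to-datum
 * cast with no per-attribute inspection of the finished tuple.
 */
static ToyDatum
toy_tuple_get_raw_datum(ToyTuple *t)
{
    return (ToyDatum) t;
}

/* The after-the-fact scan that the raw path is meant to avoid. */
static bool
toy_tuple_has_compressed(const ToyTuple *t)
{
    for (int i = 0; i < t->natts; i++)
        if (t->atts[i].compressed)
            return true;
    return false;
}
```

A caller in category 2 would use toy_form_tuple_flattened() and then
toy_tuple_get_raw_datum(); a caller in category 1 can skip the flattening
step, because none of its inputs can be compressed in the first place.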

Here is the analysis of the callers of HeapTupleGetDatum and
HeapTupleHeaderGetDatum:
1. Functions that can build a tuple from compressed/external fields
(detoasted before forming the tuple):
ExecEvalRow(), ExecEvalConvertRowtype(), ExecEvalConvertRowtype(),
populate_record()
exec_eval_datum()->make_tuple_from_row() before forming tuple
populate_record()

2. Functions where no compressed/external field is possible (analysis
given function by function):
dblink_get_pkey() : INT and Name tuple built from fixed length data types
hstore_populate_record(), hstore_each(): values come from the hstore, no
on-disk varlena possible
pg_old_snapshot_time_mapping(): Building tuple from in-memory old snapshot data
pg_buffercache_pages(): Buffer cache info
pg_stat_statements_info(): No disk value, just stat_statement info
pgp_armor_headers(): Building in memory string in
pgp_extract_armor_headers and operating on those values
pgstattuple_approx_internal(): no varlena
ssl_extension_info(): ssl extension info, no on-disk data
pg_last_committed_xact(): fixed length field
pg_xact_commit_timestamp_origin(): No varlena field
pg_get_multixact_members(): multixact info no actual tuple data
pg_prepared_xact(): prepared xact info
pg_walfile_name_offset(): walfile name/offset
pg_get_object_address(), pg_identify_object(),
pg_identify_object_as_address(): Only object information, formed from
in-memory strings.
pg_sequence_parameters(): No varlena field
pg_stat_get_wal_receiver(): walreceiver statistics
pg_stats_ext_mcvlist_items(): In memory array or fixed length fields.
tt_process_call(), prs_process_call(): parser tokens
aclexplode(): fixed-length types and a cstring from an in-memory string.
pg_timezone_abbrevs(): no varlena
pg_stat_file(): no varlena
pg_lock_status(): lock stats
pg_get_keywords(), pg_get_catalog_foreign_keys(): in memory strings.
pg_partition_tree(): no varlena
pg_stat_get_wal(): no varlena
tsvector_unnest(): in memory array
show_all_settings(): guc values from in memory struct
plperl_hash_to_datum(): value fetched from perl hash
pltcl_func_handler(): tuple from cstrings
test_predtest(): no varlena
pg_visibility*(): Only data from visibility map so no varlena
pg_stat_get_wal(), pg_stat_get_archiver(): Building tuple from in-memory data
pgstatindex_impl(), pgstatginindex_internal(), pgstathashindex():
fixed/in-memory data
replication slot functions in slotfuncs.c: replication slot info
control file info functions in pg_controldata.c: Control file data
page inspection functions in contrib/pageinspect (brinfuncs.c,
btreefuncs.c, ginfuncs.c, gistfuncs.c, hashfuncs.c, heapfuncs.c): Only
meta page or header info.
record_in() and record_recv(): form the tuple from an input cstring

Next I will work on reviewing Justin's GUC for the default compression
method and then post the next patch series.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v27-0001-Disallow-compressed-data-inside-container-types.patch text/x-patch 42.6 KB
v27-0003-default-to-with-lz4.patch text/x-patch 1.7 KB
v27-0002-Built-in-compression-method.patch text/x-patch 111.0 KB
