Re: [HACKERS] Custom compression methods

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-19 21:21:36
Message-ID: CA+TgmoaXs3_uoyOeJhJaS+oY6BvpGmHor70kaS6_03kssM7bVw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 19, 2021 at 11:12 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> I had an off-list discussion with Robert and, based on his suggestion
> and a POC patch, I have come up with an updated version for handling
> the composite type. Basically, the problem was that in ExecEvalRow we
> first form the tuple, then call HeapTupleHeaderGetDatum, and then we
> need to deform the tuple again to find any compressed data, which can
> cause a huge performance penalty in all the unrelated paths that don't
> even contain any compressed data. So Robert's idea was to check for
> compressed/external data even before forming the tuple. I have
> implemented that, and I can see we are no longer paying any
> performance penalty.

I think these performance tests aren't really exercising the
expanded-record stuff, just the ExecEvalRow changes. We need a test
that exercises that path as well, and I tend to suspect there's going
to be a measurable regression.
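The pre-scan Dilip describes could be sketched roughly as below, using a
toy stand-in for the real Datum/varlena machinery (ToyDatum,
fields_need_flatten, and the flag layout are all illustrative names
invented for this example, not from the patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a field value; the real code would inspect Datum
 * headers via macros like VARATT_IS_EXTERNAL() and
 * VARATT_IS_COMPRESSED() instead of explicit flags. */
typedef struct ToyDatum
{
    bool is_null;
    bool is_external;    /* on-disk TOAST pointer */
    bool is_compressed;  /* inline-compressed varlena */
} ToyDatum;

/*
 * Scan the field values *before* forming the tuple, so that paths with
 * no external or compressed data pay only this cheap loop and skip the
 * form/deform/re-form dance entirely.
 */
bool
fields_need_flatten(const ToyDatum *values, int nfields)
{
    for (int i = 0; i < nfields; i++)
    {
        if (values[i].is_null)
            continue;
        if (values[i].is_external || values[i].is_compressed)
            return true;
    }
    return false;
}
```

Only when the scan returns true would the slow path (detoast or
decompress the offending fields, then form the tuple) need to run.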

I spent some time looking at how datums get into the expanded record
system. There seem to be four possible paths:
expanded_record_set_tuple(), make_expanded_record_from_datum(),
expanded_record_set_field_internal(), and
expanded_record_set_fields(). The first two of these inject an entire
tuple, while the latter two work on a field-by-field basis. For that
reason, the latter two are not really problematic. I'm not quite sure
what the right thing to do is here, but if we wanted to check whether
a Datum that we're absorbing is non-pglz-compressed in those places,
it would be easy to do. Also, as far as I can see,
make_expanded_record_from_datum() is completely unused. So the problem
case is where expanded_record_set_tuple() is being called, and
specifically where it's being called with expand_external = true.
Anywhere it's called with expand_external = false, there's apparently
no problem with the result tuple containing external datums, so
non-pglz-compressed data is presumably OK there too.
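For illustration, the per-datum check hinted at above might look like
the following sketch. It models the PG14 layout, where the top two bits
of a compressed varlena's va_tcinfo carry the compression method ID
(pglz = 0, LZ4 = 1) and the low 30 bits the raw size; the Toy* names
are invented for the example rather than taken from the tree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the compressed-varlena header: the low 30 bits
 * of va_tcinfo hold the raw size, the top 2 bits the method ID. */
#define VARLENA_EXTSIZE_BITS      30
#define TOAST_PGLZ_COMPRESSION_ID 0
#define TOAST_LZ4_COMPRESSION_ID  1

typedef struct ToyCompressedHeader
{
    uint32_t va_tcinfo;
} ToyCompressedHeader;

/* Extract the compression method ID from the header word. */
int
toy_compress_method(const ToyCompressedHeader *hdr)
{
    return (int) (hdr->va_tcinfo >> VARLENA_EXTSIZE_BITS);
}

/* A datum absorbed field-by-field would only need forced
 * decompression if it uses something other than the default pglz. */
bool
needs_decompress_on_absorb(const ToyCompressedHeader *hdr)
{
    return toy_compress_method(hdr) != TOAST_PGLZ_COMPRESSION_ID;
}
```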

All of the places that can potentially pass expand_external = true are
actually passing !estate->atomic, where estate is a PLpgSQL_execstate.
In other words, I think the case where this happens is when we're in a
context where the computed value could need to survive across a COMMIT
or ROLLBACK: for example, a procedure may be running (maybe more than
one, each invoking the next via CALL) while no queries are in
progress. We have to expand TOAST references because committing a
transaction that deleted data means you might not be able to resolve
the old TOAST pointer any more: even if you use a snapshot that can
see everything, VACUUM could nuke the deleted rows - or some of them -
at any time. To avoid trouble we have to un-externalize before any
COMMIT or ROLLBACK occurs. That can suck for performance because we
might be fetching a big value that we don't end up using for anything
- say if the variable isn't used again - but it beats failing.
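The rule this paragraph describes reduces to a simple predicate; in
plpgsql the "could a COMMIT or ROLLBACK intervene" part is approximated
by !estate->atomic, as noted above (the helper name here is invented
for the sketch):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A value may keep holding a TOAST pointer only while no COMMIT or
 * ROLLBACK can intervene before its next use; otherwise VACUUM might
 * remove the rows the pointer refers to, so fetch the value eagerly.
 */
bool
must_detoast_now(bool has_toast_pointer, bool atomic_context)
{
    return has_toast_pointer && !atomic_context;
}
```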

The argument that we need to force decompression in such cases is
considerably more tenuous. It revolves around the possibility that the
compression AM itself has been dropped. As long as we have only
built-in compression methods, which are undroppable, it seems like we
could potentially just decide to do nothing at all about this issue.
If the only reason for expanding TOAST pointers inside the
expanded-record stuff is to avoid the possibility of one being
invalidated by a transaction commit, and if compression methods can't
be invalidated by a transaction commit, well then we don't really have
a problem. That's not a great solution in terms of our chances of
getting this whole patch series committed, but it might at least be
enough to unblock the first few patches, and we could document the
rest of the issue for later research.

What makes me a bit uncomfortable about that approach is that it
presupposes that everything that uses expanded records has some other
defense against those tuples getting written to disk without first
expanding any external datums. And it isn't obvious that this is the
case, or at least not to me. For example, PLpgsql's
coerce_function_result_tuple()'s code for
VARATT_IS_EXTERNAL_EXPANDED() has three cases. The first case passes
the tuple through SPI_returntuple() which calls
heap_copy_tuple_as_datum() which calls toast_flatten_tuple_to_datum()
if required, but the second case calls EOH_flatten_into() and does NOT
pass the result through SPI_returntuple(). And ER_flatten_into() has
no defense against this case that I can see: sure, it skips the fast
path if ER_FLAG_HAVE_EXTERNAL is set, but that doesn't actually do
anything to resolve TOAST pointers. Maybe there's no bug there for
some reason, but I don't know what that reason might be. We seem to
have no test cases, either in the main test suite or in the plpgsql
test suite, where ER_flatten_into() gets called with
ER_FLAG_HAVE_EXTERNAL set, which seems a little unfortunate.
there is such a bug here it's independent of this patch, I suppose,
but it would still be nice to understand what's going on here better
than I do.

--
Robert Haas
EDB: http://www.enterprisedb.com
