Re: VARIANT / ANYTYPE datatype

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VARIANT / ANYTYPE datatype
Date: 2011-05-05 14:25:29
Message-ID: BANLkTikDx9TAbQMbLsFh_AX+-7k-7KSzkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 4, 2011 at 8:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> As a followup idea there exists the desire to store records as records
>> and not text representation of same (given differing record types, of
>> course), for which it'd be more worthwhile.
>
> Maybe.  The conventional wisdom is that text representation of data is
> more compact than PG's internal representation by a significant factor
> --- our FAQ says up to 5x, in fact.  I know that that's including row
> overhead and indexes and so on, but I still don't find it to be a given
> that you're going to win on space with this sort of trick.

I've done a lot of testing of the text vs binary format on the wire
format...not exactly the same set of issues, but pretty close since
you have to send all the oids, lengths, etc. Conventional wisdom is
correct although overstated for this topic. Even in truly
pathological cases for text, for example in sending multiple levels of
redundant escaping in complex structures, the text format will almost
always be smaller. For 'typical' data it can be significantly
smaller. Two exceptions most people will run into are bytea obviously
and the timestamp family of types where binary style manipulation is a
huge win both in terms of space and performance.

For complex data (say 3+ levels of composites stacked in arrays),
binary type formats are much *faster*, albeit larger, via binary as
long as you are not bandwidth constrained, and presumably they would
be as well for variants. Perhaps even more so, because some of the
manipulations made converting tuple storage to binary wire formats
don't have to happen. That said, while there are use cases for
sending highly structured data over the wire, I can't think of any for
direct storage on a table in variant type scenarios, at least not yet
:-).

merlin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Rob Wultsch 2011-05-05 15:36:33 Re: Unlogged vs. In-Memory
Previous Message Jehan-Guillaume (ioguix) de Rorthais 2011-05-05 12:36:07 Re: Unlogged vs. In-Memory