Re: cast bytea to double precision[]

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Mathieu Dubois <mathieu(dot)dubois(at)limsi(dot)fr>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: cast bytea to double precision[]
Date: 2011-07-26 18:40:43
Message-ID: CAHyXU0w0S+CfE26hpy1hPjVSAVydi2uuDqMMAuzt-qydKG-H4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Tue, Jul 26, 2011 at 12:25 PM, Mathieu Dubois
<mathieu(dot)dubois(at)limsi(dot)fr> wrote:
> On 07/26/2011 04:30 PM, Merlin Moncure wrote:
>>
>> On Tue, Jul 26, 2011 at 2:45 AM, Mathieu Dubois<mathieu(dot)dubois(at)limsi(dot)fr>
>>  wrote:
>>>
>>> Hello,
>>>
>>> Le 25/07/2011 17:58, Mathieu Dubois a écrit :
>>>>
>>>> On 07/25/2011 05:54 PM, Merlin Moncure wrote:
>>>>>
>>>>> On Sun, Jul 24, 2011 at 2:03 PM, Mathieu
>>>>> Dubois<mathieu(dot)dubois(at)limsi(dot)fr>  wrote:
>>>>>>
>>>>>> I have found a solution by myself for the conversion:
>>>>>> regexp_split_to_array(sig_vector, E',')::double precision[] (elements
>>>>>> are in
>>>>>> fact separated by commas).
>>>>>>
>>>>>> To convert my column I have used:
>>>>>> ALTER TABLE my_table ALTER sig_vector TO double precision[] USING
>>>>>> regexp_split_to_array(sig_vector, E',')::double precision[];
>>>>>>
>>>>>> Is that correct?
>>>>>> Is it correct to pass the column name to regexp_split_to_array()?
>>>>>
>>>>> Yeah -- you are just passing a column's data into a function as an
>>>>> argument -- standard practice. This will work -- your bytea is really
>>>>> a text column, so it's just a matter of breaking up the string.
>>>>> regexp_* functions are great for that.
>>>>
>>>> Thank you very much for your reply.
>>>>
>>>> I will launch the conversion right now.
>>>>
>>> The main reason to do this was to have smaller backups.
>>> The size of a compressed backup was around 1GB with bytea.
>>> I have converted the columns (on a copy of the database) but the expected
>>> gain are not here!
>>> With double precision[] it is still around 1GB (a little bit smaller but
>>> just a few MB).
>>>
>>> Also the size on the disk is not smaller.
>>> I have listed the content of /var/lib/postgres/8.4/main/base with du and
>>> the
>>> 2 versions have the same size (3.1GB).
>>>
>>> Does it make sense?
>>> My hypothesis is that the compression algorithm is able to find
>>> regularities
>>> the data so it finds the same regularity in bytea and in double
>>> precision[].
>>>
>>> Is there any advantage to use double precision[] over bytea in my case?
>>
>> probably not -- arrays can be significantly smaller than a set of
>> individual tuples each holding one value because of the tuple
>> overhead, but you still have to pay for the array header and a 4 byte
>> length/null indicator per element.
>>
>> A packed string is often the smallest way to store data, although not
>> necessarily the best.  A double precision[] comes with a lot of syntax
>> advantages.
>
> Thanks for your advice!
>
> I find the result surprising because the floats are encoded with a lot of
> characters (something like 20) while a double is 8 bytes.
>
> I have tried to run VACCUM but it changed nothing...
>
> All of my code is based on strings so I won't take time to modify it if
> there is no gain.

right. remember that with each array element there is a 4 byte
indicator (which is the length and the null flag) for each element.
So array element size is 12 bytes, not 8, and you have the overhead of
the array itself which contains dimension information and type oid.
So, it's about a wash vs 15 digits precision floating point and one
byte delimiter.

If all the elements are known not null and the internally stored type
is fixed length, it's hypothetically possible to not have to store the
length/null indicator, but postgres does not do that and it might not
be worth it even if you wanted to.

merlin

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Mathieu Dubois 2011-07-26 19:02:12 Re: cast bytea to double precision[]
Previous Message Mathieu Dubois 2011-07-26 17:25:36 Re: cast bytea to double precision[]