Re: Clarification regarding array columns usage?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "m(dot) hvostinski" <makhvost(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Clarification regarding array columns usage?
Date: 2009-12-26 06:55:51
Message-ID: 162867790912252255h2cf0a038g4b87610df59086ee@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello

2009/12/25 m. hvostinski <makhvost(at)gmail(dot)com>:
> Hi,
>
> I would appreciate if someone could clarify the aspects of using array
> columns.
>

a) don't use array column as storage for list of foreign keys. It is
one basic and significant rule. Planner and optimizer cannot work well
with keys stored in arrays. So your queries should be suboptimal.

b) arrays are good for storing time series and similar values.
c) it is good for storing some polymorphic logs

d) it is usable for storing EAV with some risks.

> We need to store up to ten related integer values per row and currently it
> is implemented as a varchar column that holds a string that is concatenated
> by a trigger function. Something like this:
>
> FOR var IN (SELECT id FROM support_table WHERE...) LOOP
>        str := concatenate string...
> END LOOP;
> UPDATE main_table SET id_string = str WHERE...
>
> So we have a string like this "1201,1202,1203,201" in the main_table varchar
> column that is parsed by the client app. Recently I realized that it would
> be more natural to use the array column in the main table to store the
> values - no looping, concatenation, parsing, should take less space. After
> implementing it I run explain on selects from the main table to compare
> string vs. array and results are somewhat confusing.

sure - array of integer is better than list of integer stored in varchar

>
> EXPLAIN SELECT id_string FROM main_table WHERE...
> returns row width: 3 where actual value of the id_string =
> "1201,1202,1203,201"
>
> EXPLAIN SELECT id_array FROM main_table WHERE...
> returns row width: 26 for the same values
>
> It looks like array takes more space than a string containing the same
> values. Another strange thing is that for the varchar column explain shows
> width 3, it's to low. I thought that it might be related to TOAST but I
> understand that TOAST kicks in only if the row size is more than 2kb and
>

width in this case isn't important. Length of integer array is: header
+ sizeof(int)*number of elements,
length of varchar is: header + number of chars. If you store small
values, like 1,20,19 then list of values should be smaller then array.
But processing of array is faster.

> EXPLAIN SELECT * FROM main_table WHERE...
> returns row width: 251
>
>
> Hence the questions:
>
> -- Could someone help me to interpret the explain readings above?
>
> -- Is storing integers in array is really more space efficient approach?

the primary goal for storing values in arrays is protection under slow
parsing. Integer takes 4bytes, storing number in string is dynamic -
it depends on count of numbers + separators. But it has slower
processing then arrays - it good for reports only. For searching in
string list you have to use like %val% what is probably the most slow
operation - and then slow is your application too.

>
> -- Is there a noticeable performance difference in reading array vs varchar
> columns? Creating id string is relatively rare operation in our case and if
> reading strings is faster may be it makes sense to have the overhead of
> string concatenation in the trigger.

no - varchar and array is stored as stream of bytes, so reading
performance is +/- same.

>
> -- Is it possible to estimate how much slower the string concatenation
> trigger function would be in comparison to one that insets into array column
> on up to 10 values per string/array? The trigger still will be executed
> fairly often.
>

probably there are not significant difference for write. With arrays
you can write more sophisticated and faster queries.

Regards
Pavel Stehule

>
> Thanks.
>
>
>
>
>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message S Arvind 2009-12-26 11:27:58 Re: Session based transaction!!
Previous Message Randal L. Schwartz 2009-12-26 03:28:49 Re: Clarification regarding array columns usage?