Re: Array, bytea and large objects

From: David Wall <d(dot)wall(at)computer(dot)org>
To: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Array, bytea and large objects
Date: 2009-02-05 18:10:12
Message-ID: 498B2B84.2050407@computer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks, Filip.
>
> If I have an "unlimited" number of name-value pairs that I'd like
> to get easy access to for flexible reports, could I store these in
> two arrays (one for name, the other for value) in a table so that
> if I had 10 name-value pairs or 200 name-value pairs, I could
> store these into a single row using arrays so I could retrieve all
> name-value pairs in a single SELECT from the db?
>
>
> yes you could, but what for?
> what's wrong with many rows? create table kvstore(key text primary
> key,value text);
> what stops you from using single SELECT to get all these kv pairs?

We basically do it that way now, but was thinking we might run tests to
see if it's faster. When we run reports, only 2-3 of the name-value
pairs are used in search criteria, so these we'd like to keep in such a
table, but other fields (typically 10-30 name-value pairs) are just
listed in the report, so we thought it might make sense to keep these in
a single row for efficiency sake as we do retrieve them in a group and
don't need to sort or select based on their values. "Single SELECT" was
poor word choice as we were thinking more about retrieving a single row
with 10-30 name-values stored in an ARRAY would be faster than retrieve
10-30 rows from a joined table.

> At what size does it make more sense to store in LO instead of a
> BYTEA (because of all the escaping and such)?
> How many BYTEA fields can I have in a database?
>
> no limit (other than limits mentioned in the FAQ)
>
>
> Are the BYTEA fields stored in the same table as the rest of the
> data?
>
>
> yes - and the TOAST tables if it's larger than 1/3 of a page or so.
> search for TOAST details if you're interested.
Hmm... So a page is 8192 bytes, and it leaves your regular table and
goes to TOAST if the BYTEA is more than 2730 bytes. I thought it only
went to TOAST when it exceed the page size, not just one-third of its
size. I am sure we have lots of encrypted, compressed XML (so it's all
binary at this point, no longer text) that would be less than that. So
perhaps it makes sense to use BYTEA for these smaller binary objects as
the data is stored with the row, is simpler to deal with, easy to handle
in memory (unlike really big LOs), and the cost of escaping each byte
may not be too high.

I have seen a comparison
(http://zephid.dk/2008/08/09/oid-vs-bytea-in-postgresql/) that show
BYTEA uses more memory (up to 10x more) and is slower (about 4x slower)
than LOs, which indicate that most of this is due to escaping the bytes.

> I'd avoid LO unless you really need streaming (block-wise) access.
This is interesting only because we've done the opposite. That is, we
store all binary data (mostly compressed, encrypted XML name-values) in
LOs today and it works well. But we are concerned about the
pg_largeobject table being a bottleneck, becoming an issue for
vaccum/vacuumlo/pg_dump as our database grows.

We'd like to do streaming for large files being uploaded, but today
we're not doing that and have a java.sql.Blob interface class that
essentially reads/writes using a byte array so we're not getting any
benefits of streaming for very large objects, though as I said, most of
our LOs are really not that big and thus not an issue for us. We'll see
what it means for us to change this to better support streaming for our
truly large objects that we store.

Since you'd avoid LOs, what are the main advantages of BYTEA (since we
use JDBC, we can use both with equal ease as both currently work for us
using byte arrays in our Java code)? I'm still thinking we may find
that based on the size of the binary data, it may be best to choose
BYTEA for smaller (< 8196 or < 2730) data and LOs elsewhere.

Thanks,
David

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Filip Rembiałkowski 2009-02-05 18:32:25 Re: Array, bytea and large objects
Previous Message Dimitri Fontaine 2009-02-05 17:08:39 Re: Pet Peeves?