Re: need help with a query

From: Pavel Velikhov <pvelikhov(at)yahoo(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: need help with a query
Date: 2007-10-23 09:54:04
Message-ID: 226961.97083.qm@web56401.mail.re3.yahoo.com
Lists: pgsql-performance


On 10/20/07, Pavel Velikhov <pvelikhov(at)yahoo(dot)com> wrote:
> Left the query running for 10+ hours and had to kill it. I guess there
> really was no need to have lots of shared buffers (the hope was that
> PostgreSQL would cache the whole table). I ended up doing this step
> inside the application as a pre-processing step. I can't have Postgres
> running with different fsync options, since this will be part of an
> "easy to install and run" app that should just require a typical
> PostgreSQL installation.

> Is the size always different? If not, you could limit the updates:
>
> UPDATE links
>    SET target_size = size
>   FROM articles
>  WHERE articles.article_id = links.article_to
>    AND links.target_size != articles.size;

Ah, this sounds better for sure! But it's probably about as good as the scan with the index-scan subquery I was getting before...

> Since this is a huge operation, what about trying:
>
> CREATE TABLE links_new AS
>   SELECT l.col1, l.col2, a.size AS target_size, l.col4, ...
>     FROM links l, articles a
>    WHERE a.article_id = l.article_to;
>
> Then truncate links and copy the data back from links_new. Alternatively,
> you could drop links, rename links_new to links, and recreate the
> constraints.
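
For concreteness, here is a rough sketch of the drop-and-rename variant I'd try.
The column names other than article_to/target_size are made up for illustration
(the real links table has more columns), and the index at the end is only an
example of what would need recreating:

-- build the new table with target_size already filled in
BEGIN;

CREATE TABLE links_new AS
    SELECT l.article_from,      -- hypothetical column, for illustration
           l.article_to,
           a.size AS target_size
      FROM links l
      JOIN articles a ON a.article_id = l.article_to;

-- swap it in place of the old table
DROP TABLE links;
ALTER TABLE links_new RENAME TO links;

-- recreate whatever indexes/constraints links had, e.g.:
CREATE INDEX links_article_to_idx ON links (article_to);

COMMIT;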

> I guess the real question is application design. Why doesn't this app
> store size at runtime instead of having to batch this huge update?

This is a link-analysis application; I need to materialize all the sizes for target
articles so that the runtime part (as opposed to the loading part) runs efficiently, i.e.
I really want to avoid a join with the articles table at runtime.

I have solved the size problem by other means (I now compute it in my loader), but
I still have one query that needs to update a pretty large percentage of the links table...
I previously used MySQL and for some reason didn't have a problem with queries like
this (on the other hand, MySQL was crashing while building an index on article_to in the
links relation, so I had to work without a critical index)...

Thanks!
Pavel

