Re: vacuum locking

From: Rob Nagler <nagler(at)bivio(dot)biz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: vacuum locking
Date: 2003-10-27 16:19:31
Message-ID: 16285.17811.994000.543635@gargle.gargle.HOWL
Lists: pgsql-performance

Greg Stark writes:
> Sorry I was unclear. By "usual case" I meant reading, as opposed to updates.
> The size of the on-disk representation turns out to be a major determinant in
> a lot of database applications, since the dominant resource is i/o bandwidth.
> Try doing a fresh import of a large table with pctfree 0 pctuse 100 and
> compare how long a select takes on it compared to the original table.

BTW, I greatly appreciate your support on this stuff. This list is a
fantastic resource.

I think we agree.  The question is what the workload is.  On tables
without updates, postgres will be fast enough.  However, postgres is
slow on tables with updates, afaict.  I think of OLTP as a system with
updates.  One can do DSS on an OLTP database with Oracle; at least it
seems to work for one of our projects.

> FIrstly, that type of query will be faster in 7.4 due to implementing a new
> method for doing groups called hash aggregates.

We'll be trying it as soon as it is out.
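When it is, it should be easy to see whether the planner actually
picks the new method: on 7.4 EXPLAIN shows a HashAggregate node where
7.3 shows a sort feeding the aggregate, and enable_hashagg can be
toggled to compare the two plans.  A sketch, reusing the made-up
t1(f1, f2) names from this thread:

    -- Illustrative only; t1(f1, f2) are the hypothetical names used in
    -- this thread.  On 7.4, look for a HashAggregate node in the plan;
    -- on 7.3 the same query plans as a sort feeding the aggregation.
    EXPLAIN SELECT avg(f1), f2 FROM t1 GROUP BY f2;

    -- To compare against the old sort-based plan on 7.4:
    SET enable_hashagg TO off;
    EXPLAIN SELECT avg(f1), f2 FROM t1 GROUP BY f2;
    RESET enable_hashagg;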

> Secondly you could try raising sort_mem. Postgres can't know how much memory
> it really has before it swaps, so there's a parameter to tell it. And swapping
> would be much worse than doing disk sorts.

It is at 8000 (kB).  This is probably as high as I can go with
multiple backends, since each one can use that much per sort.  The
sort area is shared in Oracle (I think :-) in the UGA.

> You can raise sort_mem to tell it how much memory it's allowed to
> use before it goes to disk sorts. You can even use ALTER SESSION to
> raise it in a few DSS sessions but leave it low the many OLTP
> sessions. If it's high in OLTP sessions then you could quickly hit
> swap when they all happen to decide to use the maximum amount at the
> same time. But then you don't want to be doing big sorts in OLTP
> sessions anyways.

This is a web app. I can't control what the user wants to do.
Sometimes they update data, and other times they simply look at it.

I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
set sort_mem in the conf file to 512000 and restarted postgres.  I
reran the simpler query (the one without name) 3 times, and it was
still 27 seconds.
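It looks like the per-session equivalent is a plain SET, which only
affects the current backend rather than every connection.  A sketch
(the value is in kB, same 512000 I tried in the conf file):

    -- Per-session analogue of Oracle's ALTER SESSION, as far as I can tell.
    -- sort_mem is in kB, so this is roughly 500MB for this backend only.
    SET sort_mem TO 512000;
    SELECT avg(f1), f2 FROM t1 GROUP BY f2;
    RESET sort_mem;  -- back to the postgresql.conf value afterwards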

> Unfortunately there's no way to tell how much memory it thinks it's
> going to use. I used to use a script to monitor the pgsql_tmp
> directory in the database to watch for usage.

I don't have to. The queries that run slow are hitting disk.
Anything that takes a minute has to be writing to disk.

> Well, first of all it doesn't really because you said to group by t2.name not
> f1. You might expect it to at least optimize something like this:

I put f2 in the group by, and it doesn't matter. That's the point.
It's the on-disk sort before the aggregate that's killing the query.
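EXPLAIN ANALYZE should confirm where the time goes; if the sort is the
problem, the Sort node under the aggregation accounts for nearly all
of the runtime.  A sketch with the same illustrative names:

    -- Illustrative only.  If the on-disk sort is the bottleneck, the
    -- Sort node feeding the aggregation step should account for most
    -- of the actual time reported.
    EXPLAIN ANALYZE SELECT avg(f1), f2 FROM t1 GROUP BY f2;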

> but even then I don't think it actually is capable of using foreign keys as a
> hint like that. I don't think Oracle does either actually, but I'm not sure.

I'll be finding out this week.

> To convince it to do the right thing you would have to do either:
>
> SELECT a, t2.name
> FROM (SELECT avg(f1),f2 FROM t1 GROUP BY f2) AS t1
> JOIN t2 USING (f2)
>
> Or use a subquery:
>
> SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
> FROM t1
> GROUP BY f2

This doesn't solve the problem.  It's the GROUP BY that is doing the
wrong thing: it sorts the rows to group them, then aggregates.
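Even written out in full, with the aggregate aliased so the outer
query can reference it (a sketch, assuming the illustrative t1(f1, f2)
and t2(f2, name) from this thread), the inner query still has to sort
all of t1 before it can aggregate:

    -- Sketch of the pre-aggregation rewrite suggested above, with
    -- avg(f1) aliased so the outer select can use it.  Table and
    -- column names are the illustrative ones from this thread.
    SELECT sub.a, t2.name
    FROM (SELECT avg(f1) AS a, f2 FROM t1 GROUP BY f2) AS sub
    JOIN t2 USING (f2);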

Rob
