Re: General performance questions about postgres on Apple

From: Sean Shanny <shannyconsulting(at)earthlink(dot)net>
To: simon(at)2ndquadrant(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: General performance questions about postgres on Apple
Date: 2004-02-24 00:54:07
Message-ID: 403AA0AF.209@earthlink.net
Lists: pgsql-performance

Simon Riggs wrote:

>>Sean Shanny
>>Hardware: Apple G5 dual 2.0 with 8GB memory attached via dual fibre
>>channel to a fully loaded 3.5TB XRaid. The XRaid is configured as two
>>7 disk hardware based RAID5 sets software striped to form a RAID50 set.
>>The DB, WALs, etc. are all on that file set. Running OSX journaled file
>>system. Running postgres 7.4.1, OSX Server 10.3.2. Postgres is compiled
>>locally with '--enable-recode' '--enable-multibyte=UNICODE'
>>'CFLAGS=-mcpu=970 -mtune=970 -mpowerpc64 -O3'
>
>Have you tried altering the blocksize to a higher value? Say 32K?
>
That is on our to-do list. We had made that change while running on BSD
5.1 on a Dell 2650 with 4GB and 5 10K SCSI drives in RAID 0, and did not
see a huge improvement.

>
>
>>max_connections = 100
>>
>>
>
>Why have you set this to 100 when you have typically 1-3 users?
>
>
Have already addressed that by lowering this to 50. Will drop it lower
as time goes on.

>
>
>>sort_mem = 256000 # min 64, size in KB
>>
>>
>
>If you have only 1-3 users, then that value seems reasonable.
>
>
>
>>The query is
>>
>>SELECT t1.id, t2.md5, t2.url FROM referral_temp t2 LEFT OUTER JOIN
>>d_referral t1 ON t2.md5 = t1.referral_md5;
>>
>>
>>\d d_referral
>> id | integer | not null
>> referral_md5 | text | not null
>> referral_raw_url | text | not null
>> referral_host | text |
>> referral_path | text |
>> referral_query | text |
>> job_control_number | integer | not null
>>
>>
>>\d referral_temp
>> md5 | text |
>> url | text |
>>
>>
>
>Have you looked at using reversed indexes, as per recent postings in
>[performance]? These seemed to help considerably with lookup speed when
>using a large URL database, which seems to be your situation here.
>
>
We create an MD5 of the URL and store it as referral_md5; this is our
key for lookups. We ran into problems using the URL itself as the index:
the postgres indexing code was complaining about the URL being too long.
Hence the MD5, which, though it takes longer to compute during the ETL
phase, is much quicker to match on.
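For reference, the hashing scheme described above can be sketched in
Python (a minimal illustration only, not the actual ETL code; the
function name just echoes the referral_md5 column from the thread):

```python
import hashlib

def referral_md5(url: str) -> str:
    """Return the 32-character hex MD5 digest used as the join key.

    A fixed-width digest sidesteps the index-entry length limit that a
    multi-kilobyte URL can hit, and equality matching on 32 hex
    characters is cheap compared to comparing long text values.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()

# The digest is deterministic, so it is safe to precompute during ETL
# and use as the lookup key at query time:
key = referral_md5("http://www.example.com/some/very/long/path?with=query")
```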

>...
>
>
>>Jeff Boes writes
>>We have a large (several million row) table with a field containing
>>URLs. Now, funny thing about URLs: they mostly start with a common
>>substring ("http://www."). But not all the rows start with this, so we
>>can't just lop off the first N characters. However, we noticed some
>>time ago that an index on this field wasn't as effective as an index
>>on the REVERSE of the field. So ...
>>
>>CREATE OR REPLACE FUNCTION fn_urlrev(text) returns text as '
>>return reverse(lc($_[0]))
>>' language 'plperl' with (iscachable,isstrict);
>>
>>and then
>>
>>CREATE UNIQUE INDEX ix_links_3 ON links
>>(fn_urlrev(path_base));
>>
>>
>
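For anyone following along, the fn_urlrev transform quoted above is
easy to try outside the database; a rough Python equivalent (purely
illustrative) is:

```python
def fn_urlrev(url: str) -> str:
    """Lower-case the URL and reverse it, mirroring the plperl function.

    URLs mostly share a long common prefix ("http://www."), so an index
    on the raw value spends its leading characters on bytes that do not
    discriminate between rows; indexing the reversed string puts the
    most selective characters first.
    """
    return url.lower()[::-1]

print(fn_urlrev("HTTP://WWW.Example.com"))  # moc.elpmaxe.www//:ptth
```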
>You have 2 CPUs: have you tried splitting your input data file into two
>tables, then executing the same query simultaneously, to split the
>processing? If you get the correct plan, you should use roughly the same
>I/O but use all of the available CPU power.
>
>
We have not considered that.
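Simon's suggestion amounts to partitioning the input and merging the
partial results. As a toy illustration of that split/merge pattern only
(plain Python, not Postgres code, with a stand-in worker that just
hashes strings):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def lookup_batch(urls):
    """Stand-in for running the join over one partition of the input."""
    return {u: hashlib.md5(u.encode("utf-8")).hexdigest() for u in urls}

def parallel_lookup(urls, workers=2):
    """Split the input into `workers` slices, process them concurrently,
    and merge the partial results -- the same shape as loading two temp
    tables and running the same query against each at once on a 2-CPU
    box. Because the key is unique per URL, merging the partial
    dictionaries reproduces the single-pass result exactly.
    """
    slices = [urls[i::workers] for i in range(workers)]
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lookup_batch, slices):
            merged.update(part)
    return merged
```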

>I'm sure we'd all be interested in your further results!
>
>
I will post things as I discover them.

--sean

>Best Regards, Simon Riggs
