Skip site navigation (1) Skip section navigation (2)

Re: Speeding up query, Joining 55mil and 43mil records.

From: Sven Geisler <sgeisler(at)aeccom(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: nicky <nicky(at)valuecare(dot)nl>
Subject: Re: Speeding up query, Joining 55mil and 43mil records.
Date: 2006-06-21 15:37:07
Message-ID: 449967A3.3030801@aeccom.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hi Nicky,

I guess, you should try to upgrade the memory setting of PostgreSQL first.

work_mem = 65536

Is a bit low for such large joins.

Did you get a change to watch the directory 
<PGDATA>/base/<DBOID>/pgsql_tmp to see how large the temporary file is 
during this query. I'm sure that there is large file.

Anyhow, you can upgrade 'work_mem' to 1000000 which is 1 GB. Please note 
that the parameter work_mem is per backend process. You will get 
problems with multiple large queries at the same time.
You may move (link) the directory 'pgsql_tmp' to a very fast file system 
if you still get large files in this directory.

You also can try to increase this settings:

checkpoint_segments = 256
checkpoint_timeout = 3600  # range 30-3600, in seconds
checkpoint_warning = 0     # 0 is off

Please read the PostgreSQL documentation about the drawbacks of this 
setting as well as your setting 'fsync=off'.

Cheers
Sven.

nicky schrieb:
> Hello People,
> 
> I'm trying to solve a 'what i feel is a' performance/configuration/query 
> error on my side. I'm fairly new to configuring PostgreSQL so, i might 
> be completely wrong with my configuration.
> 
> My database consists of 44 tables, about 20GB. Two of those tables are 
> 'big/huge'. Table src.src_faktuur_verricht contains 43million records 
> (9GB) and table src.src_faktuur_verrsec contains 55million records (6GB).
> 
> Below is the 'slow' query.
> 
> INSERT INTO rpt.rpt_verrichting
> (verrichting_id
> ,verrichting_secid
> ,fout_status
> ,patientnr
> ,verrichtingsdatum
> ,locatie_code
> ,afdeling_code
> ,uitvoerder_code
> ,aanvrager_code
> ,verrichting_code
> ,dbcnr
> ,aantal_uitgevoerd
> ,kostenplaats_code
> ,vc_patientnr
> ,vc_verrichting_code
> ,vc_dbcnr
> )
> SELECT  t1.id
> ,       t0.secid
> ,       t1.status
> ,       t1.patientnr
> ,       t1.datum
> ,       t1.locatie
> ,       t1.afdeling
> ,       t1.uitvoerder
> ,       t1.aanvrager
> ,       t0.code
> ,       t1.casenr
> ,       t0.aantal
> ,       t0.kostplaats
> ,       null
> ,       null
> ,       null
> FROM    src.src_faktuur_verrsec t0 JOIN
>         src.src_faktuur_verricht t1 ON
>         t0.id = t1.id
> WHERE   substr(t0.code,1,2) not in ('14','15','16','17')
> AND     (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
> AND     EXTRACT(YEAR from t1.datum) > 2004;
> 
> 
> Output from explain
> 
> Hash Join  (cost=1328360.12..6167462.76 rows=7197568 width=118)
>   Hash Cond: (("outer".id)::text = ("inner".id)::text)
> 
>   ->  Seq Scan on src_faktuur_verrsec t0  (cost=0.00..2773789.90 
> rows=40902852 width=52)
>         Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND 
> (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 
> 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND 
> ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
>   ->  Hash  (cost=1188102.97..1188102.97 rows=8942863 width=80)
>         ->  Bitmap Heap Scan on src_faktuur_verricht t1  
> (cost=62392.02..1188102.97 rows=8942863 width=80)
>               Recheck Cond: (date_part('year'::text, datum) > 
> 2004::double precision)
>               ->  Bitmap Index Scan on src_faktuur_verricht_idx1  
> (cost=0.00..62392.02 rows=8942863 width=0)
>                     Index Cond: (date_part('year'::text, datum) > 
> 2004::double precision)
> 
> 
> The db server runs PostgreSQL 8.1.4 on FreeBSD 6.1-Stable. 2GB of RAM.
> It contains two SATA150 disks, one contains PostgreSQL and the rest of 
> the operating system and the other disk holds the pg_xlog directory.
> 
> Changed lines from my postgresql.conf file
> 
> shared_buffers = 8192
> temp_buffers = 4096
> work_mem = 65536
> maintenance_work_mem = 1048576
> max_fsm_pages = 40000
> fsync = off
> wal_buffers = 64
> effective_cache_size = 174848
> 
> The query above takes around 42 minutes.
> 
> However, i also have a wimpy desktop machine with 1gb ram. Windows with 
> MSSQL 2000 (default installation), same database structure, same 
> indexes, same query, etc and it takes 17 minutes. The big difference 
> makes me think that i've made an error with my PostgreSQL configuration. 
> I just can't seem to figure it out.
> 
> Could someone perhaps give me some pointers, advice?
> 
> Thanks in advance.
> 
> Nicky
> 

In response to

pgsql-performance by date

Next:From: Ron St-PierreDate: 2006-06-21 15:37:51
Subject: Tuning New Server (slow function)
Previous:From: jody brownellDate: 2006-06-21 15:27:06
Subject: Re: Help tuning autovacuum - seeing lots of relationbloat

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group