Re: Query performance

From: "Christian Rengstl" <Christian(dot)Rengstl(at)klinik(dot)uni-regensburg(dot)de>
To: "Hakan Kocaman" <Hakan(dot)Kocaman(at)digame(dot)de>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query performance
Date: 2006-08-03 10:27:36
Message-ID: 44D1EBB8.0AD0.0080.0@klinik.uni-regensburg.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

here is the definition of the master table which is inherited by around 30 tables based on the value of chr:
CREATE TABLE snp_master
(
entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq'::regclass),
pid varchar(15) NOT NULL,
snp_id varchar(13) NOT NULL,
val1 varchar(1),
val2 varchar(1),
chr int2 NOT NULL,
aendat text,
aennam varchar(8),
CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no),
CONSTRAINT "UNIQUE_SNP_ALLEL_MASTER" UNIQUE (pid, entry_no, snp_id)
)
WITHOUT OIDS;

The thing is that i load the data from txt files which themselves have more or less the same structure. So for every pid of 1500 there are up to 42000 different snp_id values and for each of this combinations there are different val1 and val2 entries (all together this accounts for up to around 58 million tuples in the biggest table). MyCritTable then just contains the distinct snp_ids so that at least this query does not take very long any more.
CREATE TABLE snps_master
(
snp_id varchar(13) NOT NULL,
chr int2,
aendat timestamp,
CONSTRAINT pk_snp_master PRIMARY KEY (snp_id)
)
WITHOUT OIDS;

Up to now there are no foreign key constraints, as once the data is loaded into the db there will be no update or delete operations at all. I only have to export the data to different file formats for which i need the query posted originally.

"Hakan Kocaman" <Hakan(dot)Kocaman(at)digame(dot)de> wrote on 08/03/06 11:36 am:
> Hi,
>
> maybe you should overthink your db-design, but thats a bit premature
> whithout your complete
> table-definitions(including table-names, datatypes, indexes,
> foreign-key constraints,etc.)
>
> If your are using pgadmin3 just cut'n paste the content of the window
> on the bottom left for
> the corresponding tables.
>
> If you're using psql try \d yur-table-name.
>
> Best regards
>
> Hakan Kocaman
> Software-Development
>
> digame.de GmbH
> Richard-Byrd-Str. 4-8
> 50829 Köln
>
> Tel.: +49 (0) 221 59 68 88 31
> Fax: +49 (0) 221 59 68 88 98
> Email: hakan(dot)kocaman(at)digame(dot)de
>
>
>
>> -----Original Message-----
>> From: Christian Rengstl
>> [mailto:Christian(dot)Rengstl(at)klinik(dot)uni-regensburg(dot)de]
>> Sent: Thursday, August 03, 2006 11:18 AM
>> To: Richard Huxton; Hakan Kocaman
>> Cc: pgsql-general(at)postgresql(dot)org
>> Subject: Re: [GENERAL] Query performance
>>
>>
>> Hi,
>>
>> i would rather compare int4 too, but the snp_id can be
>> something like "abc123" unfortunately.
>>
>> "Hakan Kocaman" <Hakan(dot)Kocaman(at)digame(dot)de> wrote on 08/03/06 11:08 am:
>> > Hi,
>> >
>> >
>> >> -----Original Message-----
>> >> From: Richard Huxton [mailto:dev(at)archonet(dot)com]
>> >> Sent: Thursday, August 03, 2006 11:00 AM
>> >> To: Christian Rengstl
>> >> Cc: Hakan Kocaman; pgsql-general(at)postgresql(dot)org
>> >> Subject: Re: [GENERAL] Query performance
>> >>
>> >>
>> >> Christian Rengstl wrote:
>> >> > Hi,
>> >> >
>> >> > the complete query is the one i posted, but here comes the
>> >> schema for mytable:
>> >> > entry_no int8 NOT NULL DEFAULT
>> nextval('entry_no_seq''::regclass),
>> >> > pid varchar(15) NOT NULL,
>> >> > crit varchar(13) NOT NULL,
>> >> > val1 varchar(1),
>> >> > val2 varchar(1),
>> >> > aendat text,
>> >> > aennam varchar(8),
>> >> > CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)
>> >> >
>> >> > myCritTable:
>> >> > crit varchar(13) NOT NULL,
>> >> > chr int2,
>> >> > aendat timestamp,
>> >> > CONSTRAINT pk_crit_master PRIMARY KEY (crit)
>> >>
>> >> Still doesn't match the EXPLAIN output - where's snp_id?
>> >> Where's table
>> >> test2?
>> >>
>> >
>> > Yep, that bothered me too.
>> >
>> >> > My server is 8.1.4. As a matter of fact, i have no idea
>> >> where the text
>> >> > type comes from, because as you can see from above
>> there are only
>> >> > varchar with maximum 15 characters.
>> >>
>> >> PG is casting it to text. There's no real difference between
>> >> the types
>> >> (other than the size limit) and it's not expensive.
>> >
>> > But wouldn't a comparison between int4 be much cheaper.
>> > If i see smth like "id" (here snp_id) in a fieldname it should be a
>> > int-type, i think.
>> >
>> >>
>> >> > "Hakan Kocaman" <Hakan(dot)Kocaman(at)digame(dot)de> wrote on 08/03/06
>> >> 10:34 am:
>> >> >> Hi,
>> >> >>
>> >> >> can you post the complete query,schema- and
>> >> >> table-definition,server-version etc. ?
>> >> >> This will help to identity the main problem.
>> >> >>
>> >> >> So at the moment i'm just guessing:
>> >> >>
>> >> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
>> >> >> -> Bitmap Heap Scan on test2 (cost=232.92..132766.66
>> >> rows=37120
>> >> >> width=23)
>> >> >> (actual time=291.600..356707.737 rows=37539 loops=1)
>> >> >> This part is very expensive, but i got no clue why.
>> >>
>> >> Yep, it looks like the "Bitmap Heap Scan" is at the heart of
>> >> this. You
>> >> might want to increase work_mem, it could be that the bitmap
>> >> is spilling
>> >> to disk (which is much slower than keeping it all in RAM)
>> >>
>> >> http://www.postgresql.org/docs/8.1/static/runtime-config-resou
>> > rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY
>> >
>> > If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see
>> what happens.
>> >
>> > --
>> > Richard Huxton
>> > Archonet Ltd
>> >
>> >
>> >
>> > Hakan Kocaman
>> > Software-Development
>> >
>> > digame.de GmbH
>> > Richard-Byrd-Str. 4-8
>> > 50829 Köln
>> >
>> > Tel.: +49 (0) 221 59 68 88 31
>> > Fax: +49 (0) 221 59 68 88 98
>> > Email: hakan(dot)kocaman(at)digame(dot)de
>> >
>> > ---------------------------(end of
>> broadcast)---------------------------
>> > TIP 9: In versions below 8.0, the planner will ignore your desire to
>> > choose an index scan if your joining column's
>> datatypes do not
>> > match
>>
>>
>> --
>> Christian Rengstl M.A.
>> Klinik und Poliklinik für Innere Medizin II
>> Kardiologie - Forschung
>> Universitätsklinikum Regensburg
>> B3 1.388
>> Franz-Josef-Strauss-Allee 11
>> 93053 Regensburg
>> Tel.: +49-941-944-7230
>>
>>

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Oleg Bartunov 2006-08-03 10:54:05 Re: TSearch: Need debug help
Previous Message Jonathan Vallar 2006-08-03 09:59:03 Dumping database using 8.1 or 7.1