Skip site navigation (1) Skip section navigation (2)

Re: Join runs for > 10 hours and then fills up >1.3TB of disk space

From: kevin kempter <kevin(at)kevinkempterllc(dot)com>
To: kevin kempter <kevin(at)kevinkempterllc(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Join runs for > 10 hours and then fills up >1.3TB of disk space
Date: 2008-05-16 07:08:37
Message-ID: 18F4251F-5B1F-4A0D-A2EE-CA178F82F776@kevinkempterllc.com (view raw or flat)
Thread:
Lists: pgsql-performance
Also, I'm running version 8.3 on a centOS box with 2  dual core CPU's  
and 32Gig of ram


On May 16, 2008, at 12:58 AM, kevin kempter wrote:

> Sorry I goofed on the query text Here's the correct query:
>
> select
> f14.xpublisher_dim_id,
> f14.xtime_dim_id,
> f14.xlocation_dim_id,
> f14.xreferrer_dim_id,
> f14.xsite_dim_id,
> f14.xsystem_cfg_dim_id,
> f14.xaffiliate_dim_id,
> f14.customer_id,
> f14.pf_dts_id,
> f14.episode_id,
> f14.sessionid,
> f14.bytes_received,
> f14.bytes_transmitted,
> f14.total_played_time_sec,
> segdim.xsegment_dim_id as episode_level_segid
> from
> bigtab_stats_fact_tmp14 f14,
> xsegment_dim segdim
> where
> f14.customer_id = segdim.customer_srcid
> and f14.show_id = segdim.show_srcid
> and f14.season_id = segdim.season_srcid
> and f14.episode_id = segdim.episode_srcid
> and segdim.segment_srcid is NULL;
>
>
>
>
>
>
> On May 16, 2008, at 12:31 AM, kevin kempter wrote:
>
>> Hi List;
>>
>> I have a table with 9,961,914 rows in it (see the describe of  
>> bigtab_stats_fact_tmp14 below)
>>
>> I also have a table with 7,785 rows in it (see the describe of  
>> xsegment_dim below)
>>
>> I'm running the join shown below and it takes > 10 hours and  
>> eventually runs out of disk space on a 1.4TB file system
>>
>> I've included below a describe of both tables, the join and an  
>> explain plan, any help / suggestions would be much appreciated !
>>
>> I need to get this beast to run as quickly as possible (without  
>> filling up my file system)
>>
>>
>> Thanks in advance...
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> select
>> f14.xpublisher_dim_id,
>> f14.xtime_dim_id,
>> f14.xlocation_dim_id,
>> f14.xreferrer_dim_id,
>> f14.xsite_dim_id,
>> f14.xsystem_cfg_dim_id,
>> f14.xaffiliate_dim_id,
>> f14.customer_id,
>> pf_dts_id,
>> episode_id,
>> sessionid,
>> bytes_received,
>> bytes_transmitted,
>> total_played_time_sec,
>> segdim.xsegment_dim_id as episode_level_segid
>> from
>> bigtab_stats_fact_tmp14 f14,
>> xsegment_dim segdim
>> where
>> f14.customer_id = segdim.customer_srcid
>> and f14.show_id = segdim.show_srcid
>> and f14.season_id = segdim.season_srcid
>> and f14.episode_id = segdim.episode_srcid
>> and segdim.segment_srcid is NULL;
>>
>>
>>
>>
>>
>>
>> QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
>> Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND  
>> (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid =  
>> f14.show_id) AND (segdim.season_srcid = f14.season_id))
>> ->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
>> Sort Key: segdim.episode_srcid, segdim.customer_srcid,  
>> segdim.show_srcid, segdim.season_srcid
>> ->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643  
>> width=40)
>> Filter: (segment_srcid IS NULL)
>> ->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)
>> Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
>> ->  Seq Scan on bigtab_stats_fact_tmp14 f14  (cost=0.00..597355.74  
>> rows=9961874 width=126)
>> (9 rows)
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> # \d bigtab_stats_fact_tmp14
>> Table "public.bigtab_stats_fact_tmp14"
>> Column          |            Type             | Modifiers
>> --------------------------+-----------------------------+-----------
>> pf_dts_id     | bigint                      |
>> pf_device_id       | bigint                      |
>> segment_id               | bigint                      |
>> cdn_id                   | bigint                      |
>> collector_id             | bigint                      |
>> digital_envoy_id         | bigint                      |
>> maxmind_id               | bigint                      |
>> quova_id                 | bigint                      |
>> website_id               | bigint                      |
>> referrer_id              | bigint                      |
>> affiliate_id             | bigint                      |
>> custom_info_id           | bigint                      |
>> start_dt                 | timestamp without time zone |
>> total_played_time_sec    | numeric(18,5)               |
>> bytes_received           | bigint                      |
>> bytes_transmitted        | bigint                      |
>> stall_count              | integer                     |
>> stall_duration_sec       | numeric(18,5)               |
>> hiccup_count             | integer                     |
>> hiccup_duration_sec      | numeric(18,5)               |
>> watched_duration_sec     | numeric(18,5)               |
>> rewatched_duration_sec   | numeric(18,5)               |
>> requested_start_position | numeric(18,5)               |
>> requested_stop_position  | numeric(18,5)               |
>> post_position            | numeric(18,5)               |
>> is_vod                   | numeric(1,0)                |
>> sessionid                | bigint                      |
>> create_dt                | timestamp without time zone |
>> segment_type_id          | bigint                      |
>> customer_id              | bigint                      |
>> content_publisher_id     | bigint                      |
>> content_owner_id         | bigint                      |
>> episode_id               | bigint                      |
>> duration_sec             | numeric(18,5)               |
>> device_id                | bigint                      |
>> os_id                    | bigint                      |
>> browser_id               | bigint                      |
>> cpu_id                   | bigint                      |
>> xsystem_cfg_dim_id  | bigint                      |
>> xreferrer_dim_id    | bigint                      |
>> xaffiliate_dim_id   | bigint                      |
>> xsite_dim_id        | bigint                      |
>> xpublisher_dim_id   | bigint                      |
>> season_id                | bigint                      |
>> show_id                  | bigint                      |
>> xsegment_dim_id     | bigint                      |
>> location_id              | bigint                      |
>> zipcode                  | character varying(20)       |
>> xlocation_dim_id    | bigint                      |
>> location_srcid           | bigint                      |
>> timezone                 | real                        |
>> xtime_dim_id        | bigint                      |
>> Indexes:
>> "bigtab_stats_fact_tmp14_idx1" btree (customer_id)
>> "bigtab_stats_fact_tmp14_idx2" btree (show_id)
>> "bigtab_stats_fact_tmp14_idx3" btree (season_id)
>> "bigtab_stats_fact_tmp14_idx4" btree (episode_id)
>>
>>
>>
>>
>>
>>
>> # \d xsegment_dim
>> Table "public.xsegment_dim"
>> Column        |            Type              
>> |                          Modifiers
>> ----------------------+----------------------------- 
>> +-------------------------------------------------------------
>> xsegment_dim_id | bigint                      | not null default  
>> nextval('xsegment_dim_seq'::regclass)
>> customer_srcid       | bigint                      | not null
>> show_srcid           | bigint                      | not null
>> show_name            | character varying(500)      | not null
>> season_srcid         | bigint                      | not null
>> season_name          | character varying(500)      | not null
>> episode_srcid        | bigint                      | not null
>> episode_name         | character varying(500)      | not null
>> segment_type_id      | integer                     |
>> segment_type         | character varying(500)      |
>> segment_srcid        | bigint                      |
>> segment_name         | character varying(500)      |
>> effective_dt         | timestamp without time zone | not null  
>> default now()
>> inactive_dt          | timestamp without time zone |
>> last_update_dt       | timestamp without time zone | not null  
>> default now()
>> Indexes:
>> "xsegment_dim_pk" PRIMARY KEY, btree (xsegment_dim_id)
>> "seg1" btree (customer_srcid)
>> "seg2" btree (show_srcid)
>> "seg3" btree (season_srcid)
>> "seg4" btree (episode_srcid)
>> "seg5" btree (segment_srcid)
>> "xsegment_dim_ix1" btree (customer_srcid)
>>
>>
>>
>>
>>
>>
>> -- 
>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org 
>> )
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>


In response to

pgsql-performance by date

Next:From: Claus GuttesenDate: 2008-05-16 07:15:14
Subject: Re: Join runs for > 10 hours and then fills up >1.3TB of disk space
Previous:From: kevin kempterDate: 2008-05-16 06:58:23
Subject: Re: Join runs for > 10 hours and then fills up >1.3TB of disk space

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group