Re: Join runs for > 10 hours and then fills up >1.3TB of disk space

From: kevin kempter <kevin(at)kevinkempterllc(dot)com>
To: kevin kempter <kevin(at)kevinkempterllc(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Join runs for > 10 hours and then fills up >1.3TB of disk space
Date: 2008-05-16 07:08:37
Message-ID: 18F4251F-5B1F-4A0D-A2EE-CA178F82F776@kevinkempterllc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Also, I'm running version 8.3 on a centOS box with 2 dual core CPU's
and 32Gig of ram

On May 16, 2008, at 12:58 AM, kevin kempter wrote:

> Sorry I goofed on the query text Here's the correct query:
>
> select
> f14.xpublisher_dim_id,
> f14.xtime_dim_id,
> f14.xlocation_dim_id,
> f14.xreferrer_dim_id,
> f14.xsite_dim_id,
> f14.xsystem_cfg_dim_id,
> f14.xaffiliate_dim_id,
> f14.customer_id,
> f14.pf_dts_id,
> f14.episode_id,
> f14.sessionid,
> f14.bytes_received,
> f14.bytes_transmitted,
> f14.total_played_time_sec,
> segdim.xsegment_dim_id as episode_level_segid
> from
> bigtab_stats_fact_tmp14 f14,
> xsegment_dim segdim
> where
> f14.customer_id = segdim.customer_srcid
> and f14.show_id = segdim.show_srcid
> and f14.season_id = segdim.season_srcid
> and f14.episode_id = segdim.episode_srcid
> and segdim.segment_srcid is NULL;
>
>
>
>
>
>
> On May 16, 2008, at 12:31 AM, kevin kempter wrote:
>
>> Hi List;
>>
>> I have a table with 9,961,914 rows in it (see the describe of
>> bigtab_stats_fact_tmp14 below)
>>
>> I also have a table with 7,785 rows in it (see the describe of
>> xsegment_dim below)
>>
>> I'm running the join shown below and it takes > 10 hours and
>> eventually runs out of disk space on a 1.4TB file system
>>
>> I've included below a describe of both tables, the join and an
>> explain plan, any help / suggestions would be much appreciated !
>>
>> I need to get this beast to run as quickly as possible (without
>> filling up my file system)
>>
>>
>> Thanks in advance...
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> select
>> f14.xpublisher_dim_id,
>> f14.xtime_dim_id,
>> f14.xlocation_dim_id,
>> f14.xreferrer_dim_id,
>> f14.xsite_dim_id,
>> f14.xsystem_cfg_dim_id,
>> f14.xaffiliate_dim_id,
>> f14.customer_id,
>> pf_dts_id,
>> episode_id,
>> sessionid,
>> bytes_received,
>> bytes_transmitted,
>> total_played_time_sec,
>> segdim.xsegment_dim_id as episode_level_segid
>> from
>> bigtab_stats_fact_tmp14 f14,
>> xsegment_dim segdim
>> where
>> f14.customer_id = segdim.customer_srcid
>> and f14.show_id = segdim.show_srcid
>> and f14.season_id = segdim.season_srcid
>> and f14.episode_id = segdim.episode_srcid
>> and segdim.segment_srcid is NULL;
>>
>>
>>
>>
>>
>>
>> QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Merge Join (cost=1757001.74..73569676.49 rows=3191677219 width=118)
>> Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND
>> (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid =
>> f14.show_id) AND (segdim.season_srcid = f14.season_id))
>> -> Sort (cost=1570.35..1579.46 rows=3643 width=40)
>> Sort Key: segdim.episode_srcid, segdim.customer_srcid,
>> segdim.show_srcid, segdim.season_srcid
>> -> Seq Scan on xsegment_dim segdim (cost=0.00..1354.85 rows=3643
>> width=40)
>> Filter: (segment_srcid IS NULL)
>> -> Sort (cost=1755323.26..1780227.95 rows=9961874 width=126)
>> Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
>> -> Seq Scan on bigtab_stats_fact_tmp14 f14 (cost=0.00..597355.74
>> rows=9961874 width=126)
>> (9 rows)
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> # \d bigtab_stats_fact_tmp14
>> Table "public.bigtab_stats_fact_tmp14"
>> Column | Type | Modifiers
>> --------------------------+-----------------------------+-----------
>> pf_dts_id | bigint |
>> pf_device_id | bigint |
>> segment_id | bigint |
>> cdn_id | bigint |
>> collector_id | bigint |
>> digital_envoy_id | bigint |
>> maxmind_id | bigint |
>> quova_id | bigint |
>> website_id | bigint |
>> referrer_id | bigint |
>> affiliate_id | bigint |
>> custom_info_id | bigint |
>> start_dt | timestamp without time zone |
>> total_played_time_sec | numeric(18,5) |
>> bytes_received | bigint |
>> bytes_transmitted | bigint |
>> stall_count | integer |
>> stall_duration_sec | numeric(18,5) |
>> hiccup_count | integer |
>> hiccup_duration_sec | numeric(18,5) |
>> watched_duration_sec | numeric(18,5) |
>> rewatched_duration_sec | numeric(18,5) |
>> requested_start_position | numeric(18,5) |
>> requested_stop_position | numeric(18,5) |
>> post_position | numeric(18,5) |
>> is_vod | numeric(1,0) |
>> sessionid | bigint |
>> create_dt | timestamp without time zone |
>> segment_type_id | bigint |
>> customer_id | bigint |
>> content_publisher_id | bigint |
>> content_owner_id | bigint |
>> episode_id | bigint |
>> duration_sec | numeric(18,5) |
>> device_id | bigint |
>> os_id | bigint |
>> browser_id | bigint |
>> cpu_id | bigint |
>> xsystem_cfg_dim_id | bigint |
>> xreferrer_dim_id | bigint |
>> xaffiliate_dim_id | bigint |
>> xsite_dim_id | bigint |
>> xpublisher_dim_id | bigint |
>> season_id | bigint |
>> show_id | bigint |
>> xsegment_dim_id | bigint |
>> location_id | bigint |
>> zipcode | character varying(20) |
>> xlocation_dim_id | bigint |
>> location_srcid | bigint |
>> timezone | real |
>> xtime_dim_id | bigint |
>> Indexes:
>> "bigtab_stats_fact_tmp14_idx1" btree (customer_id)
>> "bigtab_stats_fact_tmp14_idx2" btree (show_id)
>> "bigtab_stats_fact_tmp14_idx3" btree (season_id)
>> "bigtab_stats_fact_tmp14_idx4" btree (episode_id)
>>
>>
>>
>>
>>
>>
>> # \d xsegment_dim
>> Table "public.xsegment_dim"
>> Column | Type
>> | Modifiers
>> ----------------------+-----------------------------
>> +-------------------------------------------------------------
>> xsegment_dim_id | bigint | not null default
>> nextval('xsegment_dim_seq'::regclass)
>> customer_srcid | bigint | not null
>> show_srcid | bigint | not null
>> show_name | character varying(500) | not null
>> season_srcid | bigint | not null
>> season_name | character varying(500) | not null
>> episode_srcid | bigint | not null
>> episode_name | character varying(500) | not null
>> segment_type_id | integer |
>> segment_type | character varying(500) |
>> segment_srcid | bigint |
>> segment_name | character varying(500) |
>> effective_dt | timestamp without time zone | not null
>> default now()
>> inactive_dt | timestamp without time zone |
>> last_update_dt | timestamp without time zone | not null
>> default now()
>> Indexes:
>> "xsegment_dim_pk" PRIMARY KEY, btree (xsegment_dim_id)
>> "seg1" btree (customer_srcid)
>> "seg2" btree (show_srcid)
>> "seg3" btree (season_srcid)
>> "seg4" btree (episode_srcid)
>> "seg5" btree (segment_srcid)
>> "xsegment_dim_ix1" btree (customer_srcid)
>>
>>
>>
>>
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org
>> )
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Claus Guttesen 2008-05-16 07:15:14 Re: Join runs for > 10 hours and then fills up >1.3TB of disk space
Previous Message kevin kempter 2008-05-16 06:58:23 Re: Join runs for > 10 hours and then fills up >1.3TB of disk space