Re: From Simple to Complex

From: Alessandro Gagliardi <alessandro(at)path(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: From Simple to Complex
Date: 2012-01-31 22:53:46
Message-ID: CAAB3BBJwKV2UpzHFfXXMxLAfOx4QiCpJ3r1QE0BpQeLSeZ7PjQ@mail.gmail.com
Lists: pgsql-performance

I changed the query a bit, so that the results would not change over
the course of the day:

SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
JOIN emotions USING (moment_id)
WHERE moments.inserted BETWEEN 'yesterday' AND 'today' AND
moments.tableoid = pg_class.oid
GROUP BY relname, emotion ORDER BY relname, emotion;
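
(For readability, that is equivalent to writing all of the joins
explicitly; the planner should produce the same plan either way:

SELECT relname, emotion, COUNT(feedback_id)
  FROM moments
  JOIN emotions USING (moment_id)
  JOIN pg_class ON moments.tableoid = pg_class.oid
 WHERE moments.inserted BETWEEN 'yesterday' AND 'today'
 GROUP BY relname, emotion
 ORDER BY relname, emotion;
)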

Adding the indices means that I am now doing index scans instead of
seq scans, but that doesn't seem to have helped with speed. Here are
the new EXPLAIN ANALYZE results:

"Sort (cost=174432.85..174433.58 rows=1460 width=94) (actual
time=73440.079..73440.088 rows=25 loops=1)"
" Sort Key: pg_class.relname, emotions.emotion"
" Sort Method: quicksort Memory: 20kB"
" -> HashAggregate (cost=174412.39..174417.50 rows=1460 width=94)
(actual time=73437.905..73437.940 rows=25 loops=1)"
" -> Merge Join (cost=27888.98..172032.86 rows=1586355
width=94) (actual time=65563.027..72763.848 rows=245917 loops=1)"
" Merge Cond: (emotions.moment_id = public.moments.moment_id)"
" -> Index Scan using emotions_moment_id_idx on emotions
(cost=0.00..135759.78 rows=4077358 width=55) (actual
time=1.283..43894.799 rows=3841095 loops=1)"
" -> Sort (cost=27888.98..28083.07 rows=388184
width=89) (actual time=16556.348..17384.537 rows=521025 loops=1)"
" Sort Key: public.moments.moment_id"
" Sort Method: quicksort Memory: 60865kB"
" -> Hash Join (cost=9.90..20681.81 rows=388184
width=89) (actual time=2.612..4309.131 rows=396594 loops=1)"
" Hash Cond: (public.moments.tableoid = pg_class.oid)"
" -> Append (cost=0.00..19216.22
rows=388184 width=29) (actual time=2.066..2851.885 rows=396594
loops=1)"
" -> Seq Scan on moments
(cost=0.00..0.00 rows=1 width=104) (actual time=0.002..0.002 rows=0
loops=1)"
" Filter: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
thoughts_inserted_idx on thoughts moments (cost=0.00..6146.96
rows=136903 width=29) (actual time=2.063..606.584 rows=130884
loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
photos_inserted_idx on photos moments (cost=0.00..4975.46 rows=109900
width=29) (actual time=1.542..836.063 rows=128286 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
music_inserted_idx on music moments (cost=0.00..3102.69 rows=40775
width=29) (actual time=0.756..308.031 rows=41176 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
people_inserted_idx on people moments (cost=0.00..4.07 rows=1
width=29) (actual time=0.015..0.015 rows=0 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
places_inserted_idx on places moments (cost=0.00..4125.65 rows=96348
width=29) (actual time=0.066..263.853 rows=92756 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Bitmap Heap Scan on videos
moments (cost=29.56..835.20 rows=3660 width=29) (actual
time=3.122..87.889 rows=3492 loops=1)"
" Recheck Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Bitmap Index Scan on
videos_inserted_idx (cost=0.00..29.37 rows=3660 width=0) (actual
time=0.696..0.696 rows=3492 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Seq Scan on facebook_people
moments (cost=0.00..1.04 rows=1 width=104) (actual time=0.040..0.040
rows=0 loops=1)"
" Filter: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
address_people_inserted_idx on address_people moments
(cost=0.00..4.06 rows=1 width=29) (actual time=0.017..0.017 rows=0
loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
path_people_inserted_idx on path_people moments (cost=0.00..17.03
rows=593 width=29) (actual time=1.758..1.758 rows=0 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
sleep_inserted_idx on sleep moments (cost=0.00..4.06 rows=1 width=29)
(actual time=0.012..0.012 rows=0 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Hash (cost=8.88..8.88 rows=292
width=68) (actual time=0.520..0.520 rows=334 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 29kB"
" -> Seq Scan on pg_class
(cost=0.00..8.88 rows=292 width=68) (actual time=0.007..0.257 rows=334
loops=1)"
"Total runtime: 73511.072 ms"

Please let me know if there is any way to make this more efficient.

Thank you,
-Alessandro

On Tue, Jan 31, 2012 at 2:10 PM, Alessandro Gagliardi
<alessandro(at)path(dot)com> wrote:
>
> Looks like I missed a key sentence in http://www.postgresql.org/docs/9.0/static/ddl-inherit.html which states: "A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children."
> I should have realized that, since I exploited that "limitation" in three of my tables. I'm gradually adding those indices now (the sort of thing I'm adding is sketched below); I'll report on what kind of difference they make...
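>
> Along these lines (a sketch; the names match the indices that show up in the new plan above):
>
> CREATE INDEX thoughts_inserted_idx ON thoughts USING btree (inserted);
> CREATE INDEX photos_inserted_idx ON photos USING btree (inserted);
> -- ...and likewise for music, people, places, videos, address_people, path_people, and sleep
> CREATE INDEX emotions_moment_id_idx ON emotions USING btree (moment_id);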
>
> On Tue, Jan 31, 2012 at 1:22 PM, Alessandro Gagliardi <alessandro(at)path(dot)com> wrote:
>>
>> My slow query today is somewhat more complex than yesterday's, but I'm
>> hopeful it can be improved. Here's the query:
>>
>> SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
>> JOIN emotions USING (moment_id)
>> WHERE moments.inserted > 'today' AND moments.tableoid = pg_class.oid
>> GROUP BY relname, emotion ORDER BY relname, emotion;
>>
>> As you'll see below, moments is inherited by a number of other tables
>> and the purpose of relname is to see which one. Meanwhile, emotions
>> inherits feedback.
>>
>> Here's the Full Table and Index Schema:
>>
>> CREATE TABLE moments
>> (
>>   moment_id character(24) NOT NULL DEFAULT to_char(now(), 'JHH24MISSUS'::text),
>>   block_id character(24) NOT NULL,
>>   inserted timestamp without time zone NOT NULL DEFAULT now(),
>>   CONSTRAINT moments_pkey PRIMARY KEY (moment_id )
>> )
>> WITH (
>>   OIDS=FALSE
>> );
>>
>> CREATE INDEX moments_block_id_idx
>>   ON moments
>>   USING btree
>>   (block_id );
>>
>> CREATE INDEX moments_inserted_idx
>>   ON moments
>>   USING btree
>>   (inserted );
>>
>> CREATE TABLE feedback
>> (
>>  feedback_id character(24) NOT NULL,
>>  user_id character(24) NOT NULL,
>>  moment_id character(24) NOT NULL,
>>  created timestamp without time zone,
>>  inserted timestamp without time zone NOT NULL DEFAULT now(),
>>  lnglat point,
>>  CONSTRAINT feedback_pkey PRIMARY KEY (feedback_id )
>> )
>> WITH (
>>  OIDS=FALSE
>> );
>>
>> CREATE INDEX feedback_lnglat_idx
>>  ON feedback
>>  USING gist
>>  (lnglat );
>>
>> CREATE INDEX feedback_moment_id_idx
>>  ON feedback
>>  USING btree
>>  (moment_id );
>>
>> CREATE TABLE emotions
>> (
>> -- Inherited from table feedback:  feedback_id character(24) NOT NULL,
>> -- Inherited from table feedback:  user_id character(24) NOT NULL,
>> -- Inherited from table feedback:  moment_id character(24) NOT NULL,
>> -- Inherited from table feedback:  created timestamp without time zone,
>> -- Inherited from table feedback:  inserted timestamp without time zone NOT NULL DEFAULT now(),
>>  emotion character varying NOT NULL,
>> -- Inherited from table feedback:  lnglat point,
>>  CONSTRAINT emotions_pkey PRIMARY KEY (feedback_id )
>> )
>> INHERITS (feedback)
>> WITH (
>>  OIDS=FALSE
>> );
>>
>> CREATE INDEX emotions_emotion_idx
>>  ON emotions
>>  USING btree
>>  (emotion );
>>
>> Here's the results from EXPLAIN ANALYZE:
>>
>> "Sort  (cost=309717.70..309718.43 rows=1460 width=94) (actual
>> time=60462.534..60462.544 rows=25 loops=1)"
>> "  Sort Key: pg_class.relname, emotions.emotion"
>> "  Sort Method:  quicksort  Memory: 20kB"
>> "  ->  HashAggregate  (cost=309697.24..309702.35 rows=1460 width=94)
>> (actual time=60462.457..60462.476 rows=25 loops=1)"
>> "        ->  Hash Join  (cost=133154.62..308963.70 rows=489024
>> width=94) (actual time=26910.488..60031.589 rows=194642 loops=1)"
>> "              Hash Cond: (public.moments.tableoid = pg_class.oid)"
>> "              ->  Hash Join  (cost=133144.72..307119.96 rows=489024
>> width=34) (actual time=26909.984..59434.137 rows=194642 loops=1)"
>> "                    Hash Cond: (public.moments.moment_id = emotions.moment_id)"
>> "                    ->  Append  (cost=0.00..114981.64 rows=119665
>> width=29) (actual time=883.153..21696.939 rows=357565 loops=1)"
>> "                          ->  Seq Scan on moments  (cost=0.00..0.00
>> rows=1 width=104) (actual time=0.000..0.000 rows=0 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on thoughts moments
>> (cost=0.00..38856.88 rows=44388 width=29) (actual
>> time=883.150..9040.959 rows=115436 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on photos moments
>> (cost=0.00..29635.78 rows=194 width=29) (actual
>> time=5329.700..5827.447 rows=116420 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on music moments
>> (cost=0.00..9371.88 rows=19070 width=29) (actual time=354.147..383.266
>> rows=37248 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on people moments
>> (cost=0.00..5945.26 rows=27 width=29) (actual time=185.393..185.393
>> rows=0 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on places moments
>> (cost=0.00..24551.03 rows=54961 width=29) (actual
>> time=5224.044..5324.517 rows=85564 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on videos moments
>> (cost=0.00..981.31 rows=734 width=29) (actual time=21.075..28.735
>> rows=2897 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on facebook_people moments
>> (cost=0.00..10.84 rows=80 width=104) (actual time=0.001..0.001 rows=0
>> loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on address_people moments
>> (cost=0.00..10.84 rows=80 width=104) (actual time=0.005..0.005 rows=0
>> loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on path_people moments
>> (cost=0.00..5606.79 rows=30 width=29) (actual time=211.166..211.166
>> rows=0 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on sleep moments
>> (cost=0.00..11.05 rows=100 width=104) (actual time=0.002..0.002 rows=0
>> loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                    ->  Hash  (cost=79292.49..79292.49 rows=4059496
>> width=55) (actual time=25757.998..25757.998 rows=4058642 loops=1)"
>> "                          Buckets: 262144  Batches: 4  Memory Usage: 75211kB"
>> "                          ->  Seq Scan on emotions
>> (cost=0.00..79292.49 rows=4059496 width=55) (actual
>> time=0.012..15969.981 rows=4058642 loops=1)"
>> "              ->  Hash  (cost=8.88..8.88 rows=292 width=68) (actual
>> time=0.487..0.487 rows=319 loops=1)"
>> "                    Buckets: 1024  Batches: 1  Memory Usage: 28kB"
>> "                    ->  Seq Scan on pg_class  (cost=0.00..8.88
>> rows=292 width=68) (actual time=0.013..0.234 rows=319 loops=1)"
>> "Total runtime: 60601.612 ms"
>>
>> Postgres version: still 9.0.5
>>
>> History: N/A (This is the first time I've run this query.)
>>
>> Hardware: 1.7 GB Cache and other things you'd expect from a Ronin
>> instance of a Heroku Postgres database.
>>
>> Maintenance Setup: Whatever Heroku does. As before, vacuum should not
>> be relevant, as there are no deletes or even updates (just inserts and
>> selects).
>>
>> WAL Configuration: I still don't know. Heroku hosts the database on
>> Amazon's servers, so maybe that answers the question?
>>
>> GUC Settings: As per yesterday's discussion, I reduced
>> random_page_cost to 2. Other than that, it's all default.
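>>
>> (For the record, I believe I set that with something like the
>> following, where "mydb" is a placeholder for the actual Heroku
>> database name; new connections then pick up the setting:
>>
>> ALTER DATABASE mydb SET random_page_cost = 2;
>> )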
>>
>> Bonus question: If that was too simple, here's something even more
>> complex I'd like to do: I have another table that inherits feedback
>> called "comments". Ideally, rather than an "emotion" column coming
>> out, I would like to have a "feedback_type" column that would be
>> either the value in the emotion column of the emotions table, or
>> "comment" if it's from the comments table. I'm thinking I'm going to
>> have to simply put that together on the client, but if I can do that
>> in a single query (that doesn't take an hour to run) that would be
>> super cool. But that's definitely secondary compared to getting the
>> above query to run faster.
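>>
>> Here's a sketch of the kind of thing I have in mind (untested,
>> written against the same tables as above):
>>
>> SELECT relname, emotion AS feedback_type, COUNT(feedback_id)
>>   FROM moments
>>   JOIN emotions USING (moment_id)
>>   JOIN pg_class ON moments.tableoid = pg_class.oid
>>  WHERE moments.inserted > 'today'
>>  GROUP BY relname, emotion
>> UNION ALL
>> SELECT relname, 'comment', COUNT(feedback_id)
>>   FROM moments
>>   JOIN comments USING (moment_id)
>>   JOIN pg_class ON moments.tableoid = pg_class.oid
>>  WHERE moments.inserted > 'today'
>>  GROUP BY relname
>>  ORDER BY relname, feedback_type;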
>>
>> Thank you very much for any help!
>> -Alessandro Gagliardi
>
>
