Re: Large DB

From: "Mooney, Ryan" <ryan(dot)mooney(at)pnl(dot)gov>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Large DB
Date: 2004-03-31 18:08:26
Message-ID: B295F3D09B0D3840BEA3B46C51FC389C1F5C07@pnlmse28.pnl.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


Thanks for the excellent feedback (all)!

Good point on the excess bytes/row, not sure how to explain that. There
have never been any deletes or updates on this table and all inserts
just simple inserts (no transactions or anything funky) so there
shouldn't be many aborted transactions (I could see the daemons that do
the inserts dying part way through a few times, but nothing to explain
the variance).

I haven't run ANALYZE on this table in a while. After about 50-60M rows
it didn't seem to change the query plan at all and since there were
never any deletes/updates it seemed like it wasn't making much/any
difference (should have been no pages to reclaim). That may be an
invalid assumption though.

I'll try the other suggestions over the next couple of days and see how
it goes. Thanks again.

Here is an explain on the query:

=> explain select point, avg(pvalue) as avg from tp3 where host in
('m563', 'm562', 'm561', 'm560', 'm559', 'm558', 'm557', 'm538', 'm755',
'm754', 'm753', 'm752', 'm751', 'm750', 'm749', 'm748') and starttime
between '2004-03-27 07:37:43' and '2004-03-30 07:40:08' group by point;

HashAggregate (cost=96.90..96.90 rows=1 width=25)
-> Index Scan using tp3_host_starttime, tp3_host_starttime,
tp3_host_starttime, tp3_host_starttime, tp3_host_starttime,
tp3_host_starttime, tp3_host_starttime, tp3_host_starttime,
tp3_host_starttime, tp3_host_starttime, tp3_host_starttime,
tp3_host_starttime, tp3_host_starttime, tp3_host_starttime,
tp3_host_starttime, tp3_host_starttime on tp3 (cost=0.00..96.90 rows=1
width=25)
Index Cond: (((host = 'm563'::bpchar) AND (starttime >=
'2004-03-27 07:37:43'::timestamp without time zone) AND (starttime <=
'2004-03-30 07:40:08'::timestamp without time zone)) OR ((host =
'm562'::bpchar) AND (starttime >= '2004-03-27 07:37:43'::timestamp
without time zone) AND (starttime <= '2004-03-30 07:40:08'::timestamp
without time zone)) OR ((host = 'm561'::bpchar) AND (starttime >=
'2004-03-27 07:37:43'::timestamp without time zone) AND (starttime <=
'2004-03-30 07:40:08'::timestamp without time zone)) OR ((host =
'm560'::bpchar) AND (starttime >= '2004-03-27 07:37:43'::timestamp
without time zone) AND (starttime <= '2004-03-30 07:40:08'::timestamp
without time zone)) OR ((host = 'm559'::bpchar) AND (starttime >=
'2004-03-27 07:37:43'::timestamp without time zone) AND (starttime <=
'2004-03-30 07:40:08'::timestamp without time zone)) OR ((host =
'm558'::bpchar) AND (starttime >= '2004-03-27 07:37:43'::timestamp
without time zone) AND (starttime <= '2004-03-30 07:40:08'::timestamp
without time zone)) OR ((host = 'm557'::bpchar) AND (starttime >=
'2004-03-27 07:37:43'::timestamp without time zone) AND (starttime <=
'2004-03-30 07:40:08'::timestamp without time zone)) OR ((host =
'm538'::bpchar) AND (starttime >= '2004-03-27 07:37:43'::timestamp
without time zone) AND (starttime <= '2004-03-30 07:40:08'::timestamp
without time zone)) OR ((host = 'm755'::bpchar) AND (starttime >=
'2004-03-27 07:37:43'::timestamp without time zone) AND (starttime <=
'2004-03-30 07:40:08'::timestamp without time zone)) OR ((host =
'm754'::bpchar) AND (starttime >= '2004-03-27 07:37:43'::timestamp
without time zone) AND (starttime <= '2004-03-30 07:40:08'::timestamp
without time zone)) OR ((host = 'm753'::bpchar) AND (starttime >=
'2004-03-27 07:37:43'::timestamp without time zone) AND (starttime <=
'2004-03-30 07:40:08'::timestamp without time zone)) OR ((host =
'm752'::bpchar) AND (starttime >= '2004-03-27 07:37:43'::timestamp
without time zone) AND (starttime <= '2004-03-30 07:40:08'::timestamp
without time zone)) OR ((host = 'm751'::bpchar) AND (starttime >=
'2004-03-27 07:37:43'::timestamp without time zone) AND (starttime <=
'2004-03-30 07:40:08'::timestamp without time zone)) OR ((host =
'm750'::bpchar) AND (starttime >= '2004-03-27 07:37:43'::timestamp
without time zone) AND (starttime <= '2004-03-30 07:40:08'::timestamp
without time zone)) OR ((host = 'm749'::bpchar) AND (starttime >=
'2004-03-27 07:37:43'::timestamp without time zone) AND (starttime <=
'2004-03-30 07:40:08'::timestamp without time zone)) OR ((host =
'm748'::bpchar) AND (starttime >= '2004-03-27 07:37:43'::timestamp
without time zone) AND (starttime <= '2004-03-30 07:40:08'::timestamp
without time zone)))
Filter: (((host = 'm563'::bpchar) OR (host = 'm562'::bpchar) OR
(host = 'm561'::bpchar) OR (host = 'm560'::bpchar) OR (host =
'm559'::bpchar) OR (host = 'm558'::bpchar) OR (host = 'm557'::bpchar) OR
(host = 'm538'::bpchar) OR (host = 'm755'::bpchar) OR (host =
'm754'::bpchar) OR (host = 'm753'::bpchar) OR (host = 'm752'::bpchar) OR
(host = 'm751'::bpchar) OR (host = 'm750'::bpchar) OR (host =
'm749'::bpchar) OR (host = 'm748'::bpchar)) AND (starttime >=
'2004-03-27 07:37:43'::timestamp without time zone) AND (starttime <=
'2004-03-30 07:40:08'::timestamp without time zone))
(4 rows)

> -----Original Message-----
> From: Manfred Koizar [mailto:mkoi-pg(at)aon(dot)at]
> Sent: Wednesday, March 31, 2004 1:18 AM
> To: Mooney, Ryan
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Large DB
>
>
> On Tue, 30 Mar 2004 17:48:14 -0800, "Mooney, Ryan"
> <ryan(dot)mooney(at)pnl(dot)gov>
> wrote:
> >I have a single table that just went over 234GB in size with about
> >290M+ rows.
>
> That would mean ~ 800 bytes/row which, given your schema, is
> hard to believe unless there are lots of dead tuples lying around.
>
> >queries use the indexes fairly well, although I suspect that
> the order
> >of host/starttime is suboptimal (fewer hosts than starttime, and the
> >table is naturally in starttime order). I'm going to try adding an
> >index on just starttime (and later just host) and see if I
> can tune the
> >queries on that more.
>
> Yes, if you are ready to switch OS for a 10% performance
> gain, getting your indices right should be no question.
>
> > I never delete rows from the table, only do
> >inserts (up to around 11,000/minute mostly in one big burst every
> >minute, this is anticipated to go up some over time).
>
> How often do you ANALYSE?
>
> Have there been DELETEs or UPDATEs or aborted transactions in
> the past? Did you VACUUM or VACUUM FULL since then?
>
> > I'm only doing sub 15MB from the disk
> >array (from iostat) and I know it can do in the 40-60MB
> range when we
> >tested the raw speed,
>
> Sounds plausible for nonsequential I/O.
>
> >However the two indexes are also - large (which may be part of the
> >problem, which is why I'm trying just starttime for an
> index; They are
> >currently in the 140-150G range).
>
> This would be extreme index bloat which is only possible
> after massive DELETEs/UPDATEs.
>
> >stats=> explain select count(*) from tp3;
> > -> Seq Scan on tp3 (cost=0.00..6906493.16
> rows=290602016 width=0)
>
> The planner thinks that the table size is 4M pages, 32GB.
> The average tuple size of ~110 bytes (including tuple header)
> suits your schema quite nicely.
>
> > Table "public.tp3"
> > Column | Type | Modifiers
> >-------------+-----------------------------+-----------
> > host | character(4) |
> > point | character varying(64) |
> > type | character(1) |
> > cooked | character(1) |
> > starttime | timestamp without time zone |
> > intervallen | interval |
> > arrivetime | timestamp without time zone |
> > pvalue | numeric |
> >Indexes:
> > "tp3_host_starttime" btree (host, starttime, cooked)
> > "tp3_point_starttime" btree (point, starttime, cooked)
>
> In my experience any reduction in average tuple size results
> directly in a proportional increase of throughput for large
> tables. So here are some random thoughts:
>
> You said there are only a few hosts. So moving the hosts
> into a separate table with an integer primary key would save
> 4 bytes per row.
>
> Datatype "char" (with quotes) needs only 1 byte, char(1)
> needs 5 bytes, both before padding. Changing type and cooked
> from char(1) to "char" would save 12 bytes.
>
> And if you want to push it, you change hostid to smallint and
> rearrange the fields, saving 4 more padding bytes:
> hostid | smallint
> type | "char"
> cooked | "char"
>
> What about point? If there is a known small number of
> different values, move it into its own table.
>
> I'm not sure about the storage needs of numeric, might be at
> least 8 bytes. Consider using bigint. Someone please correct
> me if I'm wrong.
>
> Did you CREATE TABLE tp3 (...) WITHOUT OIDS?
>
> >Sample data mining query:
> >----------------------------
> >select point, avg(pvalue) as avg from tp3 where host in ('node',
> >'node',
> >....) and starttime between 'timestamp' and 'timestamp'
> group by point
>
> Show us EXPLAIN ANALYSE, please.
>
> >shared_buffers = 60800
>
> Looks a bit large to me. But if your tests have shown it to
> be the best value, it should be ok.
>
> >sort_mem = 1286720 # min 64, size in KB
>
> This is more than 1GB, I think this is too high.
>
> >fsync=false # Play fast and loose - whee
>
> How much did this help?
>
> >effective_cache_size = 160000
>
> Try more, say 320000 or even 400000.
>
> Servus
> Manfred
>

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Karl O. Pinc 2004-03-31 18:27:47 Re: Documentation clairification?, CHECK constraints
Previous Message Dann Corbit 2004-03-31 18:06:58 Re: Wich hardware suits best for large full-text indexed databases

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2004-03-31 18:25:45 Re: [HACKERS] Update on PITR
Previous Message elrik 2004-03-31 16:58:20 Re: what do postgresql with view ?