Re: shared_buffers/effective_cache_size on 96GB server

From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Strahinja Kustudić <strahinjak(at)nordeus(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: shared_buffers/effective_cache_size on 96GB server
Date: 2012-10-10 14:54:19
Message-ID: 50758C1B.7010805@optionshouse.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 10/10/2012 09:49 AM, Strahinja Kustudić wrote:

> I will change those, but I don't think this is that big of an issue if
> most of the IO is done by Postgres, since Postgres has it's own
> mechanism to tell the OS to sync the data to disk. For example when it's
> writing a wal file, or when it's writing a check point, those do not get
> cached.

You'd be surprised. Greg Smith did a bunch of work a couple years back
that supported these changes. Most DBAs with heavily utilized systems
could even see this in action by turning on checkpoint logging, and
there's an occasional period where the sync time lags into the minutes
due to a synchronous IO switch.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

From pgsql-performance-owner(at)postgresql(dot)org Wed Oct 10 14:59:31 2012
Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29])
by malur.postgresql.org with esmtp (Exim 4.72)
(envelope-from <oleg(at)sai(dot)msu(dot)su>)
id 1TLxkx-00055q-Ep
for pgsql-performance(at)postgresql(dot)org; Wed, 10 Oct 2012 14:59:31 +0000
Received: from sn.sai.msu.ru ([93.180.26.215])
by magus.postgresql.org with esmtp (Exim 4.72)
(envelope-from <oleg(at)sai(dot)msu(dot)su>)
id 1TLxku-0004iE-II
for pgsql-performance(at)postgresql(dot)org; Wed, 10 Oct 2012 14:59:30 +0000
Received: from sn.sai.msu.ru (localhost [127.0.0.1])
by sn.sai.msu.ru (8.14.1/8.12.8) with ESMTP id q9AExItM001766;
Wed, 10 Oct 2012 18:59:18 +0400
Received: from localhost (megera(at)localhost)
by sn.sai.msu.ru (8.14.1/8.12.8/Submit) with ESMTP id q9AExIGU001763;
Wed, 10 Oct 2012 18:59:18 +0400
X-Authentication-Warning: sn.sai.msu.ru: megera owned process doing -bs
Date: Wed, 10 Oct 2012 18:59:18 +0400 (MSK)
From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
X-X-Sender: megera(at)sn(dot)sai(dot)msu(dot)ru
To: =?iso-8859-1?Q?Franois_Beausoleil?= <francois(at)teksol(dot)info>
cc: Shane Hathaway <shane(at)hathawaymix(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Ways to speed up ts_rank
In-Reply-To: <546C964C-5BE7-464D-9EC6-5FC816AEE65E(at)teksol(dot)info>
Message-ID: <Pine(dot)LNX(dot)4(dot)64(dot)1210101854460(dot)1008(at)sn(dot)sai(dot)msu(dot)ru>
References: <5074996C(dot)5020809(at)hathawaymix(dot)org> <546C964C-5BE7-464D-9EC6-5FC816AEE65E(at)teksol(dot)info>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Pg-Spam-Score: -1.9 (-)
X-Archive-Number: 201210/116
X-Sequence-Number: 48075

We'll present in Prague some improvements in FTS. Unfortunately, we have
only several minutes during lighting talk. In short, we improved GIN to
store additional information, coordinates for fts, for example and return
ordered by rank search results, which gave us performance better than
sphynx. It's just a prototype, but we already got median at 8 msec for
6 mln classifieds.

We didn't tested for long documents yet.

Regards,
Oleg

On Wed, 10 Oct 2012, Fran?ois Beausoleil wrote:

>
> Le 2012-10-09 ? 17:38, Shane Hathaway a ?crit :
>
>> Hello,
>>
>> The database has a text index of around 200,000 documents. Investigation revealed that text queries are slow only when using ts_rank or ts_rank_cd. Without a ts_rank function, any query is answered within 200ms or so; with ts_rank function, queries take up to 30 seconds. Deeper investigation using gprof showed that the problem is probably not ts_rank or ts_rank_cd, but the fact that those functions retrieve thousands of TOASTed tsvectors.
>
> Is the query perhaps doing something like this:
>
> SELECT ...
> FROM table
> WHERE tsvectorcol @@ plainto_tsquery('...')
> ORDER BY ts_rank(...)
>
> If so, ts_rank() is run for every document. What you should do instead is:
>
> SELECT *
> FROM (
> SELECT ...
> FROM table
> WHERE tsvectorcol @@ plainto_tsquery('...')) AS t1
> ORDER BY ts_rank(...)
>
> Notice the ts_rank() is on the outer query, which means it'll only run on the subset of documents which match the query. This is explicitly mentioned in the docs:
>
> """Ranking can be expensive since it requires consulting the tsvector of each matching document, which can be I/O bound and therefore slow. Unfortunately, it is almost impossible to avoid since practical queries often result in large numbers of matches."""
>
> (last paragraph of) http://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-RANKING
>
> Hope that helps!
> Fran?ois Beausoleil
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Bruce Momjian 2012-10-10 15:46:03 Re: Hyperthreading (was: Two identical systems, radically different performance)
Previous Message Strahinja Kustudić 2012-10-10 14:49:47 Re: shared_buffers/effective_cache_size on 96GB server