mail list traffic

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: mail list traffic
Date: 2008-11-23 18:04:37
Message-ID: 391ffd2c-0ef2-4edb-bf3f-44b0bec7ccd6@mm
Lists: pgsql-general

Gregory Stark wrote:

> I would be curious to see the average lifespan of threads over time.

I happen to have the mail archives stored in a database, so I've
expressed this in SQL and below are some results for hackers and
general, 2007-2008. count is the number of distinct threads whose
oldest message is in the specified month. A thread is started as
soon as a message has an In-Reply-To field pointing to an
existing Message-Id.
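
The definition above can be sketched in a few lines of Python. This is
only an illustration, not the SQL that produced the results: the sample
messages and the names thread_spans and summarize are hypothetical, and
a thread's span is assumed to be the time between its oldest and its
newest message.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median

# Hypothetical sample data: (message_id, in_reply_to, sent_date)
messages = [
    ("a@x", None, datetime(2008, 1, 5, 10, 0)),
    ("b@x", "a@x", datetime(2008, 1, 6, 10, 0)),
    ("c@x", "b@x", datetime(2008, 1, 8, 10, 0)),
    ("d@x", None, datetime(2008, 1, 20, 9, 0)),
    ("e@x", "d@x", datetime(2008, 1, 20, 21, 0)),
    ("g@x", None, datetime(2008, 1, 10, 0, 0)),
    ("h@x", "g@x", datetime(2008, 1, 10, 6, 0)),
    ("f@x", None, datetime(2008, 2, 1, 0, 0)),  # never answered: no thread
]

def thread_spans(messages):
    """Group messages into threads and return {month: [span, ...]}."""
    known = {mid for mid, _, _ in messages}
    # Only In-Reply-To values that point to an existing Message-Id count.
    parent = {mid: ref for mid, ref, _ in messages if ref in known}

    def root(mid):
        seen = set()  # guard against malformed reply loops
        while mid in parent and mid not in seen:
            seen.add(mid)
            mid = parent[mid]
        return mid

    dates = defaultdict(list)
    for mid, _, sent in messages:
        dates[root(mid)].append(sent)

    by_month = defaultdict(list)
    for sent_dates in dates.values():
        if len(sent_dates) > 1:  # at least one reply, so it is a thread
            first, last = min(sent_dates), max(sent_dates)
            by_month[first.strftime("%Y-%m")].append(last - first)
    return by_month

def summarize(by_month):
    """Return (month, avg span, median span, count) rows sorted by month."""
    rows = []
    for month in sorted(by_month):
        secs = [span.total_seconds() for span in by_month[month]]
        rows.append((month,
                     timedelta(seconds=sum(secs) / len(secs)),
                     timedelta(seconds=median(secs)),
                     len(secs)))
    return rows

for row in summarize(thread_spans(messages)):
    print(*row, sep=" | ")
```

Each thread is attributed to the month of its oldest message, which
matches how count is defined above.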

Results for pgsql-hackers:

  month  |     avg span     |   median span   | count
---------+------------------+-----------------+-------
 2007-01 | 7 days 10:00:00  | 1 day 04:18:00  |   211
 2007-02 | 7 days 10:00:00  | 1 day 00:23:48  |   186
 2007-03 | 16 days 30:00:00 | 1 day 05:45:37  |   171
 2007-04 | 13 days 26:00:00 | 19:07:00        |   142
 2007-05 | 19 days 30:00:00 | 1 day 04:46:36  |   122
 2007-06 | 15 days 19:00:00 | 23:38:13        |   111
 2007-07 | 19 days 25:00:00 | 21:04:04        |   106
 2007-08 | 13 days 30:00:00 | 20:26:39        |   133
 2007-09 | 21 days 32:00:00 | 1 day 16:43:10  |   121
 2007-10 | 13 days 19:00:00 | 17:23:24        |   148
 2007-11 | 16 days 15:00:00 | 16:23:00        |   140
 2007-12 | 17 days 16:00:00 | 1 day 07:28:05  |    81
 2008-01 | 13 days 12:00:00 | 23:02:33        |   127
 2008-02 | 9 days 11:00:00  | 12:44:28        |   130
 2008-03 | 10 days 14:00:00 | 22:57:18        |   140
 2008-04 | 10 days 14:00:00 | 1 day 00:32:34  |   132
 2008-05 | 13 days 09:00:00 | 1 day 20:57:57  |   113
 2008-06 | 7 days 27:00:00  | 1 day 05:42:46  |   102
 2008-07 | 13 days 26:00:00 | 2 days 07:43:34 |   133
 2008-08 | 9 days 33:00:00  | 1 day 07:47:09  |   121
 2008-09 | 7 days 25:00:00  | 1 day 19:00:50  |   125
 2008-10 | 6 days 14:00:00  | 1 day 10:31:01  |   178

Results for pgsql-general:

  month  |    avg span     | median span | count
---------+-----------------+-------------+-------
 2007-01 | 1 day 25:00:00  | 10:57:11    |   329
 2007-02 | 2 days 28:00:00 | 10:50:38    |   295
 2007-03 | 3 days 08:00:00 | 14:54:08    |   310
 2007-04 | 6 days 18:00:00 | 17:40:55    |   244
 2007-05 | 3 days 22:00:00 | 16:43:54    |   287
 2007-06 | 2 days 13:00:00 | 11:26:46    |   297
 2007-07 | 2 days 19:00:00 | 11:59:40    |   263
 2007-08 | 3 days 14:00:00 | 16:35:16    |   335
 2007-09 | 3 days 14:00:00 | 13:23:09    |   245
 2007-10 | 2 days 16:00:00 | 08:46:09    |   302
 2007-11 | 3 days 07:00:00 | 08:28:06    |   294
 2007-12 | 2 days 31:00:00 | 10:25:14    |   255
 2008-01 | 2 days 14:00:00 | 13:23:12    |   248
 2008-02 | 2 days 14:00:00 | 10:02:16    |   257
 2008-03 | 1 day 25:00:00  | 13:20:06    |   245
 2008-04 | 1 day 30:00:00  | 08:26:06    |   238
 2008-05 | 3 days 22:00:00 | 18:58:27    |   211
 2008-06 | 2 days 24:00:00 | 14:46:02    |   191
 2008-07 | 1 day 29:00:00  | 10:37:17    |   221
 2008-08 | 1 day 22:00:00  | 14:14:45    |   205
 2008-09 | 1 day 24:00:00  | 14:26:26    |   202
 2008-10 | 1 day 19:00:00  | 12:32:56    |   219

"median span" is the median computed with the pl/R median function
applied to the intervals expressed as a number of seconds, then cast
back to intervals for display. I believe the median is a good way to
mitigate the contribution of messages with wrong dates and of posters
who reply to very old messages. And the median span appears to differ
a lot from the average span.
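
As a rough illustration of the seconds round trip and of why the
median resists outliers, here is a small Python sketch. It stands in
for the pl/R function rather than reproducing it; the helper names and
the sample spans are made up for the example.

```python
from datetime import timedelta
from statistics import mean, median

def interval_median(spans):
    """Median of intervals: convert to seconds, take the median, convert back."""
    return timedelta(seconds=median([s.total_seconds() for s in spans]))

def interval_mean(spans):
    """Arithmetic mean of intervals, computed through seconds as well."""
    return timedelta(seconds=mean([s.total_seconds() for s in spans]))

# Hypothetical thread spans; the last one mimics a reply to a very old
# message or a message carrying a wrong date.
spans = [
    timedelta(hours=2),
    timedelta(hours=5),
    timedelta(hours=9),
    timedelta(days=400),
]

print("median:", interval_median(spans))
print("mean:  ", interval_mean(spans))
```

With these made-up values, the single 400-day span pulls the mean up
to more than 100 days while the median stays at a few hours.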

If people feel like playing with the database to build other queries,
feel free to bug me off-list about it. I can arrange to make a dump
available or share the scripts to build it yourself from the
mailbox archives.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org
