incomplete headers: archives.postgresql.org

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: pgsql-www(at)postgresql(dot)org
Subject: incomplete headers: archives.postgresql.org
Date: 2004-01-13 17:52:52
Message-ID: Pine.GSO.4.58.0401132035460.9616@ra.sai.msu.su
Lists: pgsql-www

Hi there,

Crawling archives.postgresql.org is a pain because the response headers carry
no Last-Modified information, so a crawler has to download every message
again. For example:

megera(at)mira:~$ curl -I http://archives.postgresql.org/pgsql-hackers/2004-01/msg00282.php
HTTP/1.1 200 OK
Date: Tue, 13 Jan 2004 17:38:26 GMT
Server: Apache/1.3.28 (Unix) PHP/4.3.3RC1
X-Powered-By: PHP/4.3.3RC1
Content-Type: text/html
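
For illustration, here is a minimal sketch of what a crawler could do if the server did send a validator: remember the Last-Modified value and send it back as If-Modified-Since, so that an unchanged page costs only a 304 response. The function names are hypothetical, and the sketch assumes the server answers conditional requests, which the response above does not yet allow.

```python
import urllib.error
import urllib.request


def conditional_request(url, stored_last_modified=None):
    """Build a GET request that asks for the page only if it has changed."""
    req = urllib.request.Request(url, method="GET")
    if stored_last_modified:
        # Echo back the Last-Modified value saved from the previous crawl.
        req.add_header("If-Modified-Since", stored_last_modified)
    return req


def fetch_if_changed(url, stored_last_modified=None):
    """Return (body, last_modified); body is None when the cached copy is still good."""
    req = conditional_request(url, stored_last_modified)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, stored_last_modified
        raise
```

Without a Last-Modified (or ETag) header there is nothing to store, so every crawl falls back to a full download.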

Is it possible to add at least a 'Last-Modified' header, so a crawler can
tell whether a page needs to be downloaded again? It would save bandwidth
and crawling time. I think the best way is to set the 'Last-Modified' header
to the date in the message's 'Date:' field. Of course, there should be
protection against 'bad clocks', so the default could be the arrival time.
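
The rule above could be sketched roughly as follows. This is only an illustration, not the archive's actual code: the function name is hypothetical, and the "bad clock" test (a missing time zone, an unparsable date, or a date later than the arrival time) is an assumption about what should count as untrustworthy.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_value(date_header, arrival):
    """Return an HTTP-date string for the Last-Modified header.

    date_header: raw value of the message's 'Date:' field (may be None or junk).
    arrival: timezone-aware datetime at which the archive received the message.
    """
    arrival = arrival.astimezone(timezone.utc)
    chosen = arrival  # default: arrival time
    if date_header:
        try:
            sent = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            sent = None  # unparsable Date: field
        # Assumption: trust the sender's clock only if it names a time zone
        # and does not claim to be later than the arrival time.
        if sent is not None and sent.tzinfo is not None:
            sent = sent.astimezone(timezone.utc)
            if sent <= arrival:
                chosen = sent
    return format_datetime(chosen, usegmt=True)
```

A message whose 'Date:' field lies in the future, or cannot be parsed at all, simply gets its arrival time.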

Also, it could be useful to add an 'Expires' header.
I think the headers should be added only to pages with individual messages, not
to the indexes, because index pages really do change.
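
As a rough sketch of that split, the function below returns caching headers for an individual message page and nothing for an index page. The function name and the 30-day lifetime are assumptions chosen for illustration; neither comes from the site.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumed lifetime for a message page; pick whatever suits the site.
MESSAGE_LIFETIME = timedelta(days=30)


def cache_headers(is_message_page, last_modified, now):
    """Return extra response headers; both datetimes must be timezone-aware."""
    if not is_message_page:
        # Index pages keep changing, so they get no caching headers here.
        return {}
    expires = now + MESSAGE_LIFETIME
    return {
        "Last-Modified": format_datetime(last_modified.astimezone(timezone.utc), usegmt=True),
        "Expires": format_datetime(expires.astimezone(timezone.utc), usegmt=True),
    }
```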

I don't think it's very difficult, and it would help both the site and its users.

BTW, I use cacheability to check whether a page can be cached:
http://www.sai.msu.su/admin/cacheability/?query=http%3A%2F%2Farchives.postgresql.org%2Fpgsql-hackers%2F2004-01%2Fmsg00282.php&descend=on

http://archives.postgresql.org/pgsql-hackers/2004-01/msg00282.php
Expires -
Cache-Control -
Last-Modified -
ETag -
Content-Length - (actual size: 13277)
Server Apache/1.3.28 (Unix) PHP/4.3.3RC1

This object will be considered stale, because it doesn't have any freshness
information assigned. It doesn't have a validator present. It doesn't have a Content-Length header present, so it can't be used in a HTTP/1.0 persistent connection.
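
The three complaints in that report can be approximated with a few header checks. This is a simplified sketch, not how the cacheability tool itself works: the function name is hypothetical, and it only tests whether the headers are present, not whether their values are usable.

```python
def cacheability_notes(headers):
    """Return a list of problems found in a mapping of response headers."""
    names = {name.lower() for name in headers}
    notes = []
    if not names & {"expires", "cache-control"}:
        notes.append("no freshness information")
    if not names & {"last-modified", "etag"}:
        notes.append("no validator")
    if "content-length" not in names:
        notes.append("no Content-Length")
    return notes


# Headers from the curl -I output shown earlier in this message.
response_headers = {
    "Date": "Tue, 13 Jan 2004 17:38:26 GMT",
    "Server": "Apache/1.3.28 (Unix) PHP/4.3.3RC1",
    "X-Powered-By": "PHP/4.3.3RC1",
    "Content-Type": "text/html",
}
print(cacheability_notes(response_headers))
```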

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
