Re: robots.txt on git.postgresql.org

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: robots.txt on git.postgresql.org
Date: 2013-07-09 15:50:52
Message-ID: 51DC315C.4080806@dunslane.net
Lists: pgsql-hackers


On 07/09/2013 11:24 AM, Greg Stark wrote:
> I note that git.postgresql.org's robots.txt refuses permission to crawl
> the git repository:
>
> http://git.postgresql.org/robots.txt
>
> User-agent: *
> Disallow: /
>
>
> I'm curious what motivates this. It's certainly useful to be able to
> search for commits. I frequently type git commit hashes into Google to
> find the commit in other projects. I think I've even done it in
> Postgres before and not had a problem. Maybe Google brought up github
> or something else.
>
> Fwiw the reason I noticed this is because I searched for "postgresql
> git log" and the first hit was for "see the commit that fixed the
> issue, with all the gory details" which linked to
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a6e0cd7b76c04acc8c8f868a3bcd0f9ff13e16c8
>
> This was indexed despite the robots.txt because it was linked to from
> elsewhere (hence the interesting link title). There are ways to ask
> Google not to index pages if that's really what we're after, but I
> don't see why we would be.
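The effect of the quoted rules can be checked with Python's standard-library robots.txt parser; a minimal sketch (the commit URL is the one quoted above, fed in only as a test case):

```python
from urllib.robotparser import RobotFileParser

# The rules currently served at http://git.postgresql.org/robots.txt
rules = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# "Disallow: /" under "User-agent: *" blocks every path for every crawler.
print(rp.can_fetch(
    "Googlebot",
    "http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff"))
# -> False
```

So a well-behaved crawler will skip the whole gitweb tree; the commit page above got indexed anyway only because Google followed an external link to it.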

It's certainly not universal. For example, the only reason I found
buildfarm client commit d533edea5441115d40ffcd02bd97e64c4d5814d9, for
which the repo is housed at GitHub, is that Google has indexed the
buildfarm commits mailing list on pgfoundry. Do we have a robots.txt on
the postgres mailing list archives site?

cheers

andrew
