From: | Matt Price <matt(dot)price(at)utoronto(dot)ca> |
---|---|
To: | pgsql-novice(at)postgresql(dot)org |
Subject: | Re: web archiving |
Date: | 2002-07-11 17:52:42 |
Message-ID: | 1026409963.17825.72.camel@anarres |
Lists: | pgsql-general pgsql-novice |
Hi Philip, et al,
Well, wget is nice, and htdig and mnogosearch both seem great, but I want to
be able to enter extra data about the web pages (author names, comments,
subject/keyword entries...) so that the database starts to resemble a
bibliographic database. That is, I want other people to be able to take
advantage of the work that I and other data-entry slaves do when we enter
the URLs.
Does that seem silly?
matt
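
To make the idea concrete, here is a rough sketch of the kind of bibliographic archive described above: one table for the page plus its hand-entered data, one for keywords. Everything in it is an assumption for illustration only: the table and column names (page, keyword, author, comments) and the helper functions are made up, and Python's built-in sqlite3 module stands in for PostgreSQL so the example runs without a server. Fetching the HTML (with wget, curl, or similar) is left out; the page source is simply passed in as a string.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per archived page, plus a keyword table
# so several subject entries can be attached to each page.
SCHEMA = """
CREATE TABLE page (
    id         INTEGER PRIMARY KEY,
    url        TEXT NOT NULL UNIQUE,
    author     TEXT,
    comments   TEXT,
    fetched_at TEXT NOT NULL,
    html       TEXT NOT NULL
);
CREATE TABLE keyword (
    page_id INTEGER NOT NULL REFERENCES page(id),
    word    TEXT NOT NULL,
    PRIMARY KEY (page_id, word)
);
"""


def open_archive(path=":memory:"):
    """Open (or create) the archive; sqlite3 is a stand-in for PostgreSQL."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def archive_page(conn, url, html, author=None, comments=None, keywords=()):
    """Store already-fetched HTML together with the hand-entered data."""
    fetched_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO page (url, author, comments, fetched_at, html) "
        "VALUES (?, ?, ?, ?, ?)",
        (url, author, comments, fetched_at, html),
    )
    page_id = cur.lastrowid
    # Lowercase and de-duplicate keywords so lookups are case-insensitive.
    unique_words = {word.lower() for word in keywords}
    conn.executemany(
        "INSERT INTO keyword (page_id, word) VALUES (?, ?)",
        [(page_id, word) for word in unique_words],
    )
    conn.commit()
    return page_id


def find_by_keyword(conn, word):
    """Return (url, author) pairs for every page tagged with the keyword."""
    rows = conn.execute(
        "SELECT p.url, p.author FROM page p "
        "JOIN keyword k ON k.page_id = p.id "
        "WHERE k.word = ? ORDER BY p.url",
        (word.lower(),),
    )
    return rows.fetchall()
```

With something like this, one person's data entry (author, comments, keywords) becomes searchable by everyone else, which is the part that wget plus a plain full-text indexer does not cover on its own.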
On Wed, 2002-07-10 at 18:21, Philip Hallstrom wrote:
> Not to discourage you from using postgresql or writing it yourself, but
> you might want to take a look at wget (for downloading the web pages) and
> mnogosearch or htdig for searching them.
>
> mnogosearch supports postgresql and has a PHP interface so you can have fun
> with that...
>
> On 10 Jul 2002, Matt Price wrote:
>
> > Hi there,
> >
> > I've just moved up from non-free OSes to Debian Linux, and installed
> > postgresql, with the hope of getting started on some projects I've been
> > thinking about. Several of these projects involve web archives. The
> > idea is, a URL is entered with a bunch of bibliographic-type data in
> > other fields (keywords, author, date, etc). The HTML (and hopefully,
> > accompanying images, CSS, etc) is then grabbed using curl, and archived
> > in a postgresql database. A web or other GUI interface then provides
> > fully-searchable access to the archive for later use.
> >
> > So my question: does anyone know of a similar tool which already
> > exists? I'm a complete novice at database programming (and at php, too,
> > which is what I figured I'd use as the scripting language, though I'd
> > consider learning perl or java if folks think that's a much better
> > idea), and I'd rather work with some pre-existing code than start from
> > the ground up. Any suggestions? Is this the right list to be asking
> > this question on?
> >
> > Thanks loads,
> > Matt