theArchivist 2.1x (b10) review
theArchivist is a web crawler for downloading and archiving web content, rewriting absolute links as it goes. Configurable "stopping rules" make sure you don't try to archive the whole web!
Starting with a particular URL, it retrieves the web page, scans it for links, and then attempts to retrieve all files linked to the page.
This behavior repeats for each file retrieved and continues until one of several stop criteria is reached.
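The retrieve, scan, and repeat cycle described above can be sketched roughly as follows. This is an illustrative Python sketch, not theArchivist's actual code: the names crawl, fetch, max_depth, max_files, and legal_servers are assumptions made for the example, and the three stop criteria shown are only plausible guesses at what the "stopping rules" might cover.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the values of href and src attributes found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch, max_depth=2, max_files=100, legal_servers=None):
    """Breadth-first crawl that returns a dict of {url: content}.

    fetch(url) is a caller-supplied function (hypothetical hook) that
    returns the page text, or None if the file could not be retrieved.
    """
    archive = {}
    queue = deque([(start_url, 0)])
    seen = {start_url}
    # Stop criterion 1: a cap on the number of files archived.
    while queue and len(archive) < max_files:
        url, depth = queue.popleft()
        body = fetch(url)
        if body is None:
            continue
        archive[url] = body
        # Stop criterion 2: do not follow links past the maximum depth.
        if depth >= max_depth:
            continue
        parser = LinkCollector()
        parser.feed(body)
        for link in parser.links:
            target = urljoin(url, link)  # resolve relative links
            host = urlparse(target).netloc
            # Stop criterion 3: stay on the allowed servers, if any are given.
            if legal_servers is not None and host not in legal_servers:
                continue
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return archive
```

Passing fetch in as a parameter keeps the sketch independent of any particular network library; a dictionary's get method is enough to try it out on canned pages.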
If desired, the application will rewrite absolute URLs relative to the download hierarchy, producing a completely self-sufficient archive.
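To illustrate what rewriting an absolute URL relative to the download hierarchy might look like, here is a small Python sketch. It assumes, purely for the example, that each file is saved under a host/path layout and that directory URLs are stored as index.html; the helper names local_path and relative_link are hypothetical, and theArchivist's real layout may differ.

```python
import posixpath
from urllib.parse import urlparse


def local_path(url):
    """Map a URL to a path inside the archive (assumed layout: host/path)."""
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # assumed file name for directory URLs
    return parts.netloc + path


def relative_link(page_url, target_url):
    """Return the link from the saved copy of page_url to the saved target."""
    start = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(local_path(target_url), start)


print(relative_link("http://a.example/docs/page.html",
                    "http://a.example/img/logo.png"))
# prints ../img/logo.png
```

Because every rewritten link is expressed relative to the page that contains it, the archive folder can be moved or opened from disk without any link pointing back at the original server.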
What's New:
Corrected parsing of JavaScript function references.
Added PHP and ASP to the recognized "html" file extensions.
Fixed a bug that truncated crawls done without the "legal servers" restriction.