Thursday, January 22, 2009

Backing up a website to CD

If you want to archive off a dynamic site (for example, you've finished migrating from an old CMS to a new one and are ready to shut down the old site), you can create a physical backup of all the generated static content using a tool like wget. wget can crawl the entire website recursively, saving a copy of each HTML file and its resources (images, CSS, JavaScript) to a local directory that you can then archive off to CD, DVD, etc. wget comes with just about every Unix distro available, and you can get your hands on it for Windows by installing Cygwin (be sure to install the wget package in the "web" category). Once you've got wget installed and working, you can mirror a site using the following:


wget -erobots=off --wait 1 --html-extension --page-requisites --mirror --convert-links http://www.gnu.org


The trick is forcing wget to ignore the robots.txt rules file at the root of most sites, which is what the -erobots=off option does. By default, wget respects those rules, and they often tell crawlers to skip a good portion of the page resources needed for a complete site backup, like the images used in the page design.
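
If you'd rather end up with a single file to burn to disc, something like the rough sketch below might help. It wraps the same wget command in a function and then packs the mirrored directory with tar. The function names (mirror_site, archive_mirror), the ./site-backup paths, and the use of --directory-prefix to choose the output directory are my own choices for illustration, not part of the original command.

```shell
#!/bin/sh
# Sketch: mirror a site with wget, then pack the result into one archive.
# Function names and example paths are illustrative assumptions.

# mirror_site URL DEST_DIR
# Runs the same wget flags as above, saving everything under DEST_DIR.
mirror_site() {
    wget -e robots=off --wait 1 --html-extension --page-requisites \
         --mirror --convert-links --directory-prefix="$2" "$1"
}

# archive_mirror DEST_DIR ARCHIVE
# Packs the contents of DEST_DIR into a gzip-compressed tarball.
# Keep ARCHIVE outside DEST_DIR so tar does not try to archive itself.
archive_mirror() {
    tar -czf "$2" -C "$1" .
}

# Example usage (uncomment to run; needs network access):
# mirror_site http://www.gnu.org ./site-backup
# archive_mirror ./site-backup ./site-backup.tar.gz
```

The resulting .tar.gz can then be written to CD or DVD with whatever burning tool you normally use.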

More info about wget can be found at http://wget.addictivecode.org/FrequentlyAskedQuestions
