Downloading an entire website or part of it
A useful way to fetch a whole website, or just part of one, from the command line or from within your own code.
wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains example.com \
--no-parent \
http://www.example.com/examples/
This wget command downloads everything under www.example.com/examples/ in its entirety, following hyperlinks recursively.
What it all means:
- --recursive
  - follow links and download the pages they point to, recursively (by default up to five levels deep).
- --domains example.com
  - don’t follow links outside example.com.
- --no-parent
  - never ascend to the parent directory, so only pages under examples/ are fetched.
- --page-requisites
  - get all the elements that compose the page (images, CSS and so on).
- --html-extension
  - save HTML files with the .html extension, appending it where it is missing (newer wget versions call this option --adjust-extension).
- --convert-links
  - convert links so that they work locally, off-line.
- --restrict-file-names=windows
  - escape characters in filenames so that they will work in Windows as well.
- --no-clobber
  - don’t overwrite any existing files (used in case the download is interrupted and resumed). Some wget versions warn that this option is ignored when --convert-links is also given.
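If you mirror more than one site, the options above can be wrapped in a small shell function so that the domain and start URL are not hard-coded. This is only a sketch: the function name mirror_site and the DRY_RUN variable are made up for illustration and are not part of wget. The wget options themselves are exactly the ones from the command above.

```shell
# mirror_site DOMAIN URL
# Hypothetical wrapper around the wget command shown above.
# Set DRY_RUN=1 to print the command instead of running it.
mirror_site() {
    domain=$1
    url=$2

    # Rebuild the positional parameters as the full wget command line,
    # so each argument stays correctly quoted when it is executed.
    set -- wget \
        --recursive \
        --no-clobber \
        --page-requisites \
        --html-extension \
        --convert-links \
        --restrict-file-names=windows \
        --domains "$domain" \
        --no-parent \
        "$url"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"   # show what would be run
    else
        "$@"                 # actually start the download
    fi
}
```

Calling mirror_site example.com http://www.example.com/examples/ then behaves like the original command, and setting DRY_RUN=1 first lets you check the generated command line before anything is downloaded.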
Wget is available for most operating systems: it comes pre-installed on most Linux distributions and can be downloaded as a binary for Windows too.
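Before relying on wget in a script, it can be worth checking that it is actually installed. The snippet below is a minimal sketch using the standard shell builtin command -v. The helper name have_cmd is an assumption made for this example.

```shell
# have_cmd NAME: succeed if NAME resolves to a command the shell can run.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd wget; then
    echo "wget found at: $(command -v wget)"
else
    echo "wget not found; install it before running the download" >&2
fi
```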