How To Clone A Website By Kali Linux

What is HTTrack

HTTrack is a free and open-source Web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3.

HTTrack allows users to download World Wide Web sites from the Internet to a local computer.By default, HTTrack arranges the downloaded site by the original site's relative link-structure. The downloaded (or "mirrored") website can be browsed by opening a page of the site in a browser.

HTTrack can also update an existing mirrored site and resume interrupted downloads. HTTrack is configurable by options and by filters (include/exclude), and has an integrated help system. There is a basic command line version and two GUI versions (WinHTTrack and WebHTTrack); the former can be part of scripts and cron jobs.

HTTrack uses a Web crawler to download a website. Some parts of the website may not be downloaded by default due to the robots exclusion protocol unless disabled during the program. HTTrack can follow links that are generated with basic JavaScript and inside Applets or Flash, but not complex links (generated using functions or expressions) or server-side image maps.

HTTrack is just the tool for doing that.

HTTrack takes any website and makes a copy to your hard drive. This can be useful for searching for data on the website offline such as email addresses, information useful for social engineering, hidden password files (believe me, I have found a few), intellectual property, or maybe replicating a login page for a Evil Twin site to capture login credentials.

Unfortunately, HTTrack is not included in Kali, so we will need to download and install it. Fortunately, though, it is included in the Kali repository, so all we need to do is open the software repository and download and install it.

HTTrack comes in both a Windows and a Linux version. For those of you who refuse to take off the training wheels, you can download and install HTTrack for Windows on its website.

Step 1 Download & Install HTTrack



From Kali, we need to navigate to "System Tools" and then "Add/Remove Software," like in the screenshot below.


That will open a screen like the one below. Notice the window in the upper left-hand corner next to the "Find" button. Enter "httrack" there and it will find the packages you need to install HTTrack.

You can also install it by typing the following in a terminal.

kali > apt-get install httrack


Step 2 Use HTTrack

Now that we have installed HTTrack, let's start by looking at the help file for HTTrack. When you downloaded and installed HTTrack, it placed it in the /usr/bin directory, so it should be accessible from any directory in Kali as /usr/bin is in the PATH variable. Let's type:

kali > httrack --help


I've highlighted the key syntax line in the screenshot above. The basic syntax is the following, where -O stands for "output." This switch tells HTTrack where to send the website to.

kali > httrack <the URL of the site> [any options] URL Filter -O <location to send copy to>


Using HTTrack is fairly simple. We need only point it at the website we want to copy and then direct the output (-O) to a directory on our hard drive where we want to store the website. One caution here, though. Some sites are HUGE. If you tried to copy Facebook to your hard drive, I can guarantee you that you do not have enough drive space, so start small.


Step 3 Test HTTrack

In an earlier tutorial on hacking MySQL databases behind websites (MySQL is the most widely used database backend behind websites), we used a website that we could hack with impunity called webscantest.com. Let's try to make a copy of that site to our hard drive.

kali > httrack http://www.webscantest.com -O /tmp/webscantest

As you can see, we successfully made a copy of all the pages of this site on our hard drive.


Step 4 Explore the Site Copy

Now that we have captured and copied the entire site to our hard drive, let's take a look at it.

We can open the IceWeasel browser (or any browser) and view the contents of our copied site to the location on our hard drive. Since we copied the web site to /tmp/webscantest, we simply point our browser there and can view all the content of the website! If we point it to /tmp/webscantest/www.webscantest.com/login.html, we can see that we have an exact copy of the login page!


Hmmm...what could we possibly use that for???

Step 5 Copy Our Favorite Web Site

Now, let's try HTTrack on our favorite website, wonderhowto.com. Let's try to make a copy of a forum post I wrote last week about the CryptoLocker hack. First, let's open that page right here and copy the address into Kali after the HTTrack command and then the location where you want send the copy to.

kali > httrack https://example.com/ -O /tmp/example

You can send the copied website to any location, but I sent mine to /tmp/example. and store an exact copy of it on your hard drive. Notice it also tells us that it is 208 bytes.

If you are trying to find information about a particular company for social engineering or trying to spoof a website or login, HTTrack is an excellent tool for both tasks.