Top Six Engines for Intelligence Gathering
Introduction
Intelligence gathering is perhaps the most underestimated phase of penetration testing, despite its extreme importance. It requires patience and diligence, and the penetration tester needs to connect the dots between seemingly unrelated pieces of information to form a complete picture of the target; such a detailed picture enables you to plot your attack strategy and initiate the later phases of exploitation. The idea is very similar to a military operation, where no attack is launched before the target has been properly observed: who they are, what they do, how they behave, what their strengths and weaknesses are, and so on. Any successful attack depends on the accuracy of the information gathered.
In cyber security, the intelligence, or information, that we need to gather can fall under one of two categories: technical and personal. Technical information is related to the systems (software and hardware) within the scope of the attack, while personal information is related to the people who own or use such systems. Technical information provides the foundation for purely technical attacks and exploitation, while personal information facilitates a lot of the social engineering attacks. It is generally assumed that such information is harmless in itself; that is, it is not a vulnerability to have such information exposed. However, the danger comes from effectively utilizing that information during the attack phase.
There are different ways to gather intelligence. The two primary ways are passive and active. The passive method involves checking independent databases and third-party sources to gather information about the target; the target is not aware of such action and does not feel the presence of the attacker. The active method, on the other hand, involves probing the target to solicit responses and then analyzing those responses for information. Here the target can notice that they are being profiled, so it is the responsibility of the attacker to make their probes as stealthy as possible.
The following are six third-party engines, or resources, that can be checked to passively gather information, technical or personal, about your target. These should be your “friends” in any penetration testing engagement. You should know how to make the most of them to gain detailed intelligence before you commence your attack plan.
1

Google – Search Engine
www.google.com
Google Search has become the most popular search engine on the web. However, aside from the normal use that almost everybody is familiar with, Google Search provides what are called advanced operators, which refine searches further and help find information that is not directly reachable through a normal search query.
These operators enable a hacker, or a pentester, to find files of a certain type containing sensitive information (like usernames, passwords, credit card details, etc.), vulnerable web directories, vulnerable web services, or even unprotected webcams. The following table summarizes the most important advanced operators:
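As a rough illustration of how such operators combine in practice, the sketch below assembles a query string from operator/value pairs and turns it into a search URL. The helper names `build_dork` and `search_url` are hypothetical, `example.com` is a placeholder target, and only the widely documented `site:`, `filetype:`, `intitle:`, and `inurl:` operators are assumed.

```python
from urllib.parse import urlencode


def build_dork(terms="", **operators):
    """Join operator:value pairs and optional free text into one query string."""
    parts = []
    for name, value in operators.items():
        # Values containing spaces must be quoted to stay attached to the operator.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    if terms:
        parts.append(terms)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Placeholder target: spreadsheets on example.com that mention "password".
query = build_dork("password", site="example.com", filetype="xls")
print(query)
print(search_url(query))
```

The query is only built here, never sent; pasting the printed string into the search box has the same effect as opening the URL.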
2

Shodan – IoT Search Engine
www.shodan.io
Shodan is a search engine for IoT, that is, for Internet-connected devices. It mostly finds devices that run web servers (HTTP/HTTPS on ports 80, 8080, 443, or 8443). It can also find devices listening on other ports and services, such as Telnet, SSH, SNMP, and many others. It was created by John Matherly in 2009 with the vision of searching for devices connected to the Internet.
Most often, IoT web servers provide interfaces and consoles for remotely controlling and managing those devices. The security risk arises when there is no proper access control or authentication system in place, which gives anyone the ability to monitor and control the device. Shodan can find devices like routers, switches, webcams, traffic lights, SCADA systems, heating systems, refrigerators, etc. Unlike Google Search, Shodan requires you to register a free account to get access to many of its features.
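Shodan searches can also be narrowed with filters such as `port:` and `country:`. The sketch below composes such a query and builds a request URL for Shodan's REST search endpoint. The helper names are hypothetical, `YOUR_API_KEY` is a placeholder, and the endpoint path is taken from Shodan's public API documentation as an assumption to check against the current docs; some filters may require a registered account or paid plan.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against Shodan's current API documentation.
SHODAN_SEARCH = "https://api.shodan.io/shodan/host/search"


def shodan_query(text="", **filters):
    """Combine free text with filter:value pairs, e.g. port:8080 country:US."""
    parts = [text] if text else []
    for name, value in filters.items():
        value = str(value)
        # Quote multi-word values so they stay bound to their filter.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def shodan_search_url(api_key, query):
    """Return the search URL; the caller decides whether and how to fetch it."""
    return SHODAN_SEARCH + "?" + urlencode({"key": api_key, "query": query})


query = shodan_query("webcam", port=8080, country="US")
print(query)
print(shodan_search_url("YOUR_API_KEY", query))
```

Nothing is sent over the network; the same query string can be typed directly into the search box on the website.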
Under “Featured Categories,” Shodan provides three classes of IoT:
- Industrial Control Systems
“In a nutshell, Industrial control systems (ICS) are computers that control the world around you. They’re responsible for managing the air conditioning in your office, the turbines at a power plant, the lighting at the theatre or the robots at a factory.”
- Databases
These are database technologies, both relational and non-relational, such as MySQL, PostgreSQL, MongoDB, Riak, Elastic, etc.
- Video Games
Shodan can find systems hosting online video games like Minecraft, Counter-Strike: Global Offensive, Starbound, ARK: Survival Evolved, etc.
3
Pipl – People Search
www.pipl.com
Pipl describes itself as the world’s largest people search engine. According to Pipl Inc., “Pipl is the place to find the person behind the email address, social username or phone number.” It pulls information about a person from social media sources, such as Facebook, LinkedIn, Twitter, and Google Plus, and also from e-commerce sites like Amazon.
When you access the site, you are presented with a textbox where you can enter the name of the person you would like to gather information about:
4

RobTex – DNS Lookup Engine
www.robtex.com
RobTex is an all-in-one DNS lookup engine. It is free, yet very comprehensive; it can pull publicly available DNS and Whois information like IP addresses, domain names, host names, Autonomous Systems, routers, DNS records, IP geolocation, etc. If you log in with an account, you get access to the site’s history and are presented with a graph of the network topology of your queried domain. On the home page, you can see a search box where you can enter the hostname of your target as follows:
RobTex will return a huge report containing DNS analysis of different records (NS, MX, A, etc.), geographical locations of different servers, SEO data (if any), domain names that share the same NS, MX, and IP address, subdomains and hostnames, and finally a graph with a visual representation like the following:
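For comparison, the most basic of these lookups, resolving a hostname to its IP addresses, can be sketched with Python's standard library. This is not how RobTex works internally; `resolve_addresses` is a hypothetical helper. Note that resolving a name yourself sends queries through your own resolver, which may reach the target's authoritative name servers, so it is less passive than reading a third-party report.

```python
import socket


def resolve_addresses(hostname):
    """Return the sorted, de-duplicated IP addresses that a hostname resolves to."""
    # Restricting to TCP avoids one duplicate entry per socket type.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Usage (requires network access; example.com is a placeholder):
#   resolve_addresses("example.com")
```

A literal IP address is returned unchanged, which makes the helper easy to check without network access.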
5

BuiltWith – Web Technology Mining
www.builtwith.com
Founded in 2007 by Gary Brewer, BuiltWith is an Australia-based company that provides different Internet services; in particular, it provides information and analysis on the web technologies used by websites all over the world. You can query their database to find out which underlying technologies a certain website is built with.
The technologies that BuiltWith reports fall into different categories, such as Content Management Systems, JavaScript Libraries, Mobile Technologies, Content Delivery Networks, Widgets, Encoding technologies, Document technologies, technologies providing Aggregation Functionality, etc.
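One common way to infer such technologies is to look for telltale strings in a page's HTML. The sketch below is only an illustration of that general idea, not BuiltWith's actual method; the `SIGNATURES` table and `detect_technologies` helper are hypothetical, and the patterns are deliberately simplistic.

```python
import re

# Hypothetical signature table: technology name -> pattern seen in page source.
SIGNATURES = {
    "WordPress": re.compile(r"/wp-content/|/wp-includes/", re.I),
    "jQuery": re.compile(r"jquery[\w.-]*\.js", re.I),
    "Bootstrap": re.compile(r"bootstrap[\w.-]*\.(?:css|js)", re.I),
    "Google Analytics": re.compile(r"google-analytics\.com|googletagmanager\.com", re.I),
}


def detect_technologies(html):
    """Return the sorted names of all signatures that match the HTML source."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern.search(html))


# Made-up page fragment for illustration.
sample = (
    '<script src="/wp-includes/js/jquery/jquery.min.js"></script>'
    '<link rel="stylesheet" href="/css/bootstrap.min.css">'
)
print(detect_technologies(sample))
```

String matching like this produces false positives and misses anything not in the table, which is part of why a maintained database is more useful in practice.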
When you go to their website, you are presented with a query box as follows:
After submitting your query, you will get a detailed report of all the technologies that the website is built with. Here is a section of the results returned for the www.semurity.com website:
6

Netcraft – Web Analyzer
www.netcraft.com
Netcraft provides information and analysis of web servers. It has been exploring the web since 1995. The UK-based company Netcraft Ltd. was founded in 1987 by Mike Prettejohn. It offers different web security services, such as web application security testing and PCI security scanning. It also offers a powerful and sophisticated anti-phishing browser toolbar that is free of charge. The toolbar works on Firefox, Chrome, and IE.
What makes Netcraft attractive to hackers and penetration testers is its Internet data mining engine. The engine monitors the Internet for web services and collects information about web servers, operating systems, hosting companies, ISPs, and sometimes even uptimes. Netcraft keeps all of that information in its own databases and keeps track of all changes. That is, a user can see the history of the underlying technologies of a certain website.
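Much of this kind of information is visible in ordinary HTTP response headers. The sketch below shows the sort of hints that can be read from them; it is an illustration under assumptions, not Netcraft's actual technique. The `fingerprint_headers` helper is hypothetical and the sample header values are made up.

```python
import re


def fingerprint_headers(headers):
    """Extract technology hints from HTTP response headers (names are case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    hints = {}
    for field in ("server", "x-powered-by", "via"):
        if field in lowered:
            hints[field] = lowered[field]
    # Some servers add an OS hint in parentheses, e.g. "Apache/2.4.41 (Ubuntu)".
    match = re.search(r"\(([^)]+)\)", lowered.get("server", ""))
    if match:
        hints["os_hint"] = match.group(1)
    return hints


# Made-up response headers for illustration.
sample_headers = {"Server": "Apache/2.4.41 (Ubuntu)", "X-Powered-By": "PHP/7.4.3"}
print(fingerprint_headers(sample_headers))
```

Headers can be removed or spoofed by the site operator, so treat them as hints rather than facts.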
A simple way to use the power of Netcraft is to go to the textbox under the title “What’s that site running?” and enter the website address as shown in the following image:
After submitting your query, you will get results showing different pieces of information about the website, such as IP address, nameserver, DNS admin, hosting company, hosting country, and so on. You will also see information about the operating system and web server: