A lightning-fast web crawler that extracts URLs and endpoints from a target

Photon

Photon is a lightning-fast web crawler that extracts URLs, files, intel, and endpoints from a target.

Main Features

Data Extraction

By default, Photon extracts the following data while crawling:

  • URLs (in-scope & out-of-scope)
  • URLs with parameters (example.com/gallery.php?id=2)
  • Intel (emails, social media accounts, Amazon buckets, etc.)
  • Files (pdf, png, xml, etc.)
  • JavaScript files and the endpoints present in them
  • Strings matching a custom regex pattern
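As a rough illustration of the first two categories, the sketch below sorts a page's links into in-scope, out-of-scope, and parameterized URLs using only the Python standard library. It is not Photon's actual code; the function name `classify_links` and the rule that "in scope" means "same host as the start URL" are assumptions made for this example.

```python
from urllib.parse import urljoin, urlparse


def classify_links(base_url, hrefs):
    """Sort links into in-scope, out-of-scope, and parameterized sets.

    Assumption for this sketch: a link is in scope when its host
    matches the host of base_url exactly.
    """
    base_host = urlparse(base_url).netloc
    in_scope, out_of_scope, with_params = set(), set(), set()
    for href in hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if parsed.netloc == base_host:
            in_scope.add(url)
            if parsed.query:  # e.g. gallery.php?id=2
                with_params.add(url)
        else:
            out_of_scope.add(url)
    return in_scope, out_of_scope, with_params


in_scope, out_of_scope, with_params = classify_links(
    "http://example.com/",
    ["/gallery.php?id=2", "about.html", "https://other.org/x", "mailto:a@b.c"],
)
```

Here `with_params` ends up holding only `http://example.com/gallery.php?id=2`, while the `mailto:` link is ignored entirely.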

The extracted information is saved in an organized manner.

Photon also allows custom data extraction with regex patterns.
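The idea of pattern-based extraction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Photon's implementation: the `extract` helper, the simplified email pattern, and the "ticket ID" pattern are all hypothetical.

```python
import re

# Simplified email pattern; real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract(text, patterns):
    """Return {name: sorted unique matches} for each named pattern."""
    return {name: sorted(set(rx.findall(text))) for name, rx in patterns.items()}


page = "Contact admin@example.com or sales@example.com. Refs: BUG-1024, BUG-2048"
patterns = {
    "emails": EMAIL_RE,
    # Hypothetical user-supplied pattern: three capitals, a dash, four digits.
    "custom": re.compile(r"[A-Z]{3}-\d{4}"),
}
results = extract(page, patterns)
# results["emails"] -> ['admin@example.com', 'sales@example.com']
# results["custom"] -> ['BUG-1024', 'BUG-2048']
```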

Intelligent Multithreading

Here's a secret: most of the tools floating around the internet aren't properly multi-threaded, even when they are supposed to be. They either hand a list of items to the threads, which results in multiple threads accessing the same item, or they simply add a thread lock and end up rendering multi-threading useless.
But Photon is different, or should I say "genius"? Take a look at this and decide for yourself.
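To make the two pitfalls concrete, here is a minimal sketch of one way to avoid both. It is not Photon's actual implementation; `crawl`, `fetch_links`, and the in-memory `graph` are hypothetical names used only for this example. Each URL in a crawl level is handed to exactly one worker, and the shared `seen` set is updated only by the main thread, so no lock is held while pages are being fetched.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(seed, fetch_links, max_depth=2, workers=4):
    """Breadth-first crawl in which each URL is processed by exactly one thread."""
    seen = {seed}
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth):
            if not frontier:
                break
            # map() gives each URL to a single worker; no item is shared.
            results = pool.map(fetch_links, frontier)
            next_frontier = []
            # Results are merged in the main thread, so `seen` needs no lock.
            for links in results:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return seen


# Stand-in for real HTTP fetching: a tiny in-memory link graph.
graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": ["e"]}


def fetch_links(url):
    return graph.get(url, [])


found = crawl("a", fetch_links, max_depth=2)
```

With `max_depth=2`, `found` contains `a`, `b`, `c`, and `d`; page `e` is only reached at depth 3. The trade-off of this level-by-level design is that fast workers wait for the slowest page in each level, which a queue-based design would avoid.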

Ninja Mode

In Ninja Mode, three online services are used to make requests to the target on your behalf.
So you now have four clients making requests to the same server simultaneously, which gives you a speed boost, minimizes the risk of connection resets, and increases the delay between requests from any single client.
Here's a comparison generated by Quark where the lines represent threads:

[Image: Ninja Mode demo]
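The load-spreading idea behind this mode can be sketched as a simple round-robin dispatcher. This is an assumption-laden illustration rather than Photon's code: the source does not name the three services, so `direct`, `relay`, and the `service-N` labels are hypothetical stand-ins that only tag a request instead of touching the network.

```python
from itertools import cycle


def make_dispatcher(clients):
    """Return a function that sends each URL through the next client in turn."""
    rotation = cycle(clients)

    def dispatch(url):
        client = next(rotation)
        return client(url)

    return dispatch


# Stand-in clients: they label the request rather than perform it.
def direct(url):
    return ("direct", url)


def relay(name):
    return lambda url: (name, url)


dispatch = make_dispatcher(
    [direct, relay("service-1"), relay("service-2"), relay("service-3")]
)
assignments = [dispatch(f"http://example.com/page{i}")[0] for i in range(8)]
```

Over eight requests, each of the four clients handles two, so the target sees a quarter of the traffic from any one of them.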

Plugins

Photon's capabilities can be further extended through plugins.
Plugins available:

  • dnsdumpster: Generates an image containing the DNS data of the target domain.

Plugins in active development:

  • Quark: A plugin that plots a graph, making it easier to inspect the relationships between different webpages.
  • dnsdumpster: A new version of the plugin, which will save the DNS data in a nicely formatted HTML file, is in development.
