ThePhish

ThePhish is an automated phishing email analysis tool based on TheHive, Cortex and MISP. It is a web application written in Python 3 and based on Flask that automates the entire analysis process starting from the extraction of the observables from the header and the body of an email to the elaboration of a verdict which is final in most cases. In addition, it allows the analyst to intervene in the analysis process and obtain further details on the email being analyzed if necessary. In order to interact with TheHive and Cortex, it uses TheHive4py and Cortex4py, which are the Python API clients that allow using the REST APIs made available by TheHive and Cortex respectively.

Overview

At a high level, ThePhish works as follows:

  1. An attacker starts a phishing campaign and sends a phishing email to a user.
  2. A user who receives such an email can send that email as an attachment to the mailbox used by ThePhish.
  3. The analyst interacts with ThePhish and selects the email to analyze.
  4. ThePhish extracts all the observables from the email and creates a case on TheHive. The observables are analyzed thanks to Cortex and its analyzers.
  5. ThePhish calculates a verdict based on the verdicts of the analyzers.
  6. If the verdict is final, the case is closed and the user is notified. In addition, if it is a malicious email, the case is exported to MISP.
  7. If the verdict is not final, the analyst’s intervention is required. The analyst must review the case on TheHive, along with the results given by the various analyzers, to formulate a verdict. The analyst can then send the notification to the user, optionally export the case to MISP and close the case.
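
Steps 5 to 7 can be pictured with a small sketch. The aggregation rule below (take the most severe level reported by any analyzer) is an assumption made for illustration only; the text above does not specify how ThePhish actually combines the analyzer verdicts, and the names `LEVELS`, `aggregate_verdict` and `is_final` are hypothetical.

```python
from typing import Iterable

# Severity order assumed for this sketch, from least to most severe.
LEVELS = ["info", "safe", "suspicious", "malicious"]


def aggregate_verdict(analyzer_levels: Iterable[str]) -> str:
    """Return the most severe level reported by any analyzer (hypothetical rule)."""
    worst = "safe"  # assumption: with no negative signal the email counts as safe
    for level in analyzer_levels:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        if LEVELS.index(level) > LEVELS.index(worst):
            worst = level
    return worst


def is_final(verdict: str) -> bool:
    """A "suspicious" verdict is the one that requires the analyst's review."""
    return verdict in ("safe", "malicious")
```

With this rule, a single "malicious" report is enough to make the overall verdict "malicious", while a "suspicious" result leaves the case open for the analyst.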

ThePhish example usage

This example aims to demonstrate how a user can send an email to ThePhish for it to be analyzed and how an analyst can actually analyze that email using ThePhish.

A user sends an email to ThePhish

A user can send an email to the email address that ThePhish uses to fetch the emails to analyze. The email has to be forwarded as an attachment in EML format to prevent contamination of the email header. In this example, the mail client used is Mozilla Thunderbird and the email address used is a Gmail address.
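
To see why forwarding as an attachment preserves the header, the following sketch recovers the attached email from the forwarding message using only Python's standard `email` package. It assumes the mail client attaches the original as a `message/rfc822` part; the function name `extract_attached_email` is hypothetical and is not taken from ThePhish.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Optional


def extract_attached_email(raw_wrapper: bytes) -> Optional[EmailMessage]:
    """Return the first message/rfc822 attachment of the wrapper email, if any."""
    wrapper = message_from_bytes(raw_wrapper, policy=policy.default)
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the embedded message with its original headers intact.
            return part.get_payload(0)
    return None
```

The headers of the returned message are those of the original email, not those of the message the user sent to ThePhish.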

The analyst analyzes the email

The analyst navigates to the web page of ThePhish and clicks on the “List emails” button to obtain the list of emails to analyze.

When the analyst clicks on the “Analyze” button related to the selected email, the analysis is started and its progress is shown on the web interface.

In the meantime, ThePhish extracts the observables (URLs, domains, IP addresses, email addresses, attachments and hashes of those attachments) from the email and then interacts with TheHive to create the case.
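
As a rough illustration of this extraction step, the sketch below collects the observable types listed above from a raw email using the standard library. The regular expressions are deliberately simple, the choice of SHA-256 for attachment hashes is an assumption, and `extract_observables` is a hypothetical name, so this should not be read as ThePhish's own extraction logic.

```python
import hashlib
import re
from email import message_from_bytes, policy
from urllib.parse import urlsplit

# Deliberately simple patterns, chosen for this sketch only.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_observables(raw_email: bytes) -> dict:
    """Collect candidate observables from the header and body of an email."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    found = {kind: set() for kind in ("url", "domain", "ip", "mail", "hash")}

    header_text = "\n".join(f"{name}: {value}" for name, value in msg.items())
    body_text = ""
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.is_attachment():
            data = part.get_payload(decode=True) or b""
            found["hash"].add(hashlib.sha256(data).hexdigest())
        elif part.get_content_maintype() == "text":
            body_text += part.get_content() + "\n"

    text = header_text + "\n" + body_text
    # Strip punctuation that commonly trails a URL at the end of a sentence.
    found["url"].update(u.rstrip(".,;)") for u in URL_RE.findall(body_text))
    found["ip"].update(IPV4_RE.findall(text))
    found["mail"].update(MAIL_RE.findall(text))
    for url in found["url"]:
        host = urlsplit(url).hostname
        if host and not IPV4_RE.fullmatch(host):
            found["domain"].add(host)
    return found
```

Each set in the returned dictionary would then correspond to a group of observables to add to the case.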

Three tasks are created inside the case.

Then, ThePhish starts adding the extracted observables to the case.

At this point, the Mailer responder notifies the user via email that the analysis has started.

The description of the first task allows the Mailer responder to send the notification via email.

After the first task is closed, the second task is started and the analyzers are started on the observables. The analysis progress is shown on the web interface while the analyzers are started.

The analysis progress can also be viewed on TheHive, thanks to its live stream.

Once all the analyzers have terminated their execution, the second task is closed and the third one is started, then ThePhish calculates the verdict. Since the verdict is “malicious”, all the observables that are found to be malicious are marked as IoC. In this case only one observable is marked as IoC.

The case is then exported to MISP as an event, with a single attribute represented by the observable mentioned above.

Then, ThePhish sends the verdict via email to the user thanks to the Mailer responder.

Finally, both the task and the case are closed. The description of the third task allows the Mailer responder to send the verdict via email. Moreover, the case has been closed after five minutes and resolved as “True Positive” with “No Impact”, which means that the attack has been detected before it could do any damage.

Once the case is closed, the verdict is available for the analyst on the web interface together with the entire log of the analysis progress.

At this point the analyst can go back and analyze another email. The above-depicted case was related to a phishing email, but a similar workflow can be observed when the analyzed email is classified as “safe”. Indeed, the case is closed and the verdict is sent via email to the user.

Then, the verdict is also displayed to the analyst on the web interface.

On the other hand, when an email is classified as “suspicious”, the verdict is only displayed to the analyst on the web interface.

At this point, the analyst needs to use the buttons on the left-hand side of the page to reach TheHive, Cortex and MISP for further analysis. Because the analysis is not yet complete, the user has only been notified that the analysis of the forwarded email has started. The last task and the case remain open until the analyst reaches a final verdict and closes them.

The analyst can view the reports of all the analyzers on TheHive and Cortex and, if these are not enough, can also download the EML file of the email and analyze it manually.

When the analysis is finished, the analyst can write the body of the email for the user in the description of the last task, start the Mailer responder, export the case to MISP by clicking on the “Export” button if the verdict is “malicious”, and then close the case.

Implementation

ThePhish is a web application written in Python 3. The web server is implemented using Flask, while the front-end part of the application, which is a dynamic page written in HTML, CSS and JavaScript, is implemented using Bootstrap. Apart from the web server module, the back-end consists of three Python modules that encapsulate the application logic and a Python class that supports logging over the WebSocket protocol. If you want to see a graphical representation of the application logic, click here. Moreover, the aforementioned modules use several configuration files that serve various purposes.

When the analyst navigates to the base URL of the application, the web page of ThePhish is loaded and a bi-directional connection is established with the server. This is done by using the Socket.IO JavaScript library in the web page that enables real-time, bi-directional and event-based communication between the browser and the server. This connection is established with a WebSocket connection whenever possible and will use HTTP long polling as a fallback. For this to work, the server application uses the Flask-SocketIO Python library, which provides a Socket.IO integration for Flask applications. This connection is then used by ThePhish to display the progress of the analysis on the web interface.
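
One way to feed analysis progress into such a connection is a custom `logging.Handler` that forwards every log record to an emit function. The sketch below is an assumption about how this could be wired up, not ThePhish's actual class: `SocketLogHandler`, the `send` parameter and the `"log"` event name are all hypothetical. In a Flask-SocketIO application, `send` could be the `emit` method of the `SocketIO` instance.

```python
import logging
from typing import Callable


class SocketLogHandler(logging.Handler):
    """Forward each formatted log record to a Socket.IO-style emit callable."""

    def __init__(self, send: Callable[[str, dict], None], event: str = "log"):
        super().__init__()
        self._send = send    # e.g. socketio.emit in a Flask-SocketIO app
        self._event = event  # hypothetical event name the page listens for

    def emit(self, record: logging.LogRecord) -> None:
        try:
            payload = {"level": record.levelname, "message": self.format(record)}
            self._send(self._event, payload)
        except Exception:
            # Let the logging framework report the failure instead of raising.
            self.handleError(record)
```

Attaching the handler to a logger with `logger.addHandler(SocketLogHandler(send))` makes every `logger.info(...)` call in the back-end produce one event for the browser.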

Every time the analyst performs an action on the web interface, an AJAX request is sent to the server, that is, an asynchronous HTTP request that exchanges data with the server in the background and updates the page without reloading it. This allows the analyst both to view the list of emails to analyze and to start the analysis.

ThePhish interacts with TheHive and Cortex thanks to TheHive4py and Cortex4py. Moreover, it interacts with an IMAP server to retrieve the emails to analyze.
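
The IMAP side can be sketched with Python's standard `imaplib`. This is a minimal illustration under stated assumptions: searching for `UNSEEN` messages, opening the mailbox read-only and the function names `connect` and `fetch_unseen` are choices made for this example and are not taken from ThePhish.

```python
import imaplib
from typing import List, Tuple


def connect(host: str, user: str, app_password: str) -> imaplib.IMAP4_SSL:
    """Open an IMAP-over-TLS connection and log in (for Gmail, with an app password)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, app_password)
    return conn


def fetch_unseen(conn, mailbox: str = "INBOX") -> List[Tuple[bytes, bytes]]:
    """Return (message number, raw RFC 822 bytes) for each unseen message."""
    conn.select(mailbox, readonly=True)  # read-only: messages stay unseen
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        raise RuntimeError("IMAP search failed")
    messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        # A successful fetch returns a list whose first item is (envelope, raw bytes).
        if status == "OK" and parts and isinstance(parts[0], tuple):
            messages.append((num, parts[0][1]))
    return messages
```

The raw bytes returned for each message can be parsed with the `email` package to locate the attached EML file.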

Installation

Install it using Docker and Docker Compose

Since installing and configuring TheHive, Cortex and MISP from scratch for a production environment is not straightforward, TheHive Project provides Docker images and Docker Compose templates to facilitate the installation procedure. The provided templates are deliberately simple and do not expose the full configuration options of each Docker image.

If you only want to try ThePhish or you want to have it up and running as fast as possible, you can use the provided Docker Template in the docker folder, which is a modified version of one of the Docker Templates provided by TheHive Project that also allows creating a ThePhish container. To install ThePhish using Docker and Docker Compose, please refer to this guide. I strongly recommend that you install it this way at least the first time you use it so that you can learn the basics and how to configure it with a minimal configuration that should work on the first try. Indeed, the previously linked guide also provides a step-by-step procedure to configure the TheHive, Cortex and MISP instances.

Install it from scratch

This guide refers to the sole installation of ThePhish, which requires:

  • An up-and-running instance of TheHive
  • An up-and-running instance of Cortex
  • An up-and-running instance of MISP
  • An email address that users can use to send emails to ThePhish
  • A Linux-based OS with Python 3.8+ installed

To install, configure and integrate the TheHive, Cortex and MISP instances, please refer to their official documentation.

It is advisable that the email address from which ThePhish fetches the emails to analyze be a Gmail address, since it is the one with which ThePhish has been tested the most. Preferably, the account should be newly created and used solely by ThePhish. ThePhish requires an app password to connect to the mailbox and fetch the emails, so one must be activated for the account.

Once TheHive, Cortex and MISP are configured and listening at a certain URL and the email address is ready to use, you can install and configure ThePhish.

  1. Clone the repository

    $ git clone https://github.com/emalderson/ThePhish.git
    
  2. Create a Python virtual environment and activate it (it is good practice but it is not required)

    $ cd ThePhish/app
    $ sudo apt install python3-venv
    $ python3 -m venv venv
    $ source venv/bin/activate
    
  3. Install the requirements

    $ pip install -r requirements.txt
    
  4. Add the run_responder() function to the file api.py of TheHive4py

    In order to send emails to the user, ThePhish uses the Mailer responder. Since ThePhish uses TheHive4py to interact with TheHive, a function that allows running a responder by its ID is needed. Unfortunately, this function is not part of TheHive4py yet, but a pull request has been made to add it to TheHive4py (#219). While waiting for it to be added, it must be manually added using the following command for ThePhish to work properly (replace the version of Python in the command if you use a different version of Python):

    $ (cat << _EOF_
    
    
        def run_responder(self, responder_id, object_type, object_id):
            req = self.url + "/api/connector/cortex/action"
            try:
                data = json.dumps({ "responderId": responder_id, "objectType": object_type, "objectId": object_id})
                return requests.post(req, headers={"Content-Type": "application/json"}, data=data, proxies=self.proxies, auth=self.auth, verify=self.cert)
            except requests.exceptions.RequestException as e:
                raise TheHiveException("Responder run error: {}".format(e))
    _EOF_
    ) | tee -a venv/lib/python3.8/site-packages/thehive4py/api.py > /dev/null