PyAnchor

Dead links are a liability for any website with a large amount of content. Aside from their negative impact on SEO, they frustrate every user who clicks on one.

PyAnchor is primarily for checking the HTTP response on all links on a page. You can integrate it into your development workflow so that users never see a 404 in the first place.

Install

PyAnchor requires Python 3.6 or above.

MacOS / Linux:

$ python3 -m pip install pyanchor

Windows:

> python -m pip install pyanchor

Using the CLI

The CLI can be invoked with the pyanchor command. A URL must be provided.

Basic example for a single page:

> pyanchor https://mysite.com/

Note: all provided URLs must include a valid HTTP scheme.

To check every link on a website rather than a single page, provide the URL of a sitemap.xml file
and add the --sitemap flag.

Example:

> pyanchor https://mysite.com/sitemap.xml --sitemap
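
For readers unfamiliar with the format, a sitemap.xml file is a list of page URLs wrapped in `<loc>` elements. The sketch below shows one way such a file could be turned into a list of pages to check. It uses only the standard library, the `sitemap_urls` helper is hypothetical, and it is not a description of how PyAnchor itself processes a sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the page URLs listed in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# A small, made-up sitemap used only for illustration.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mysite.com/</loc></url>
  <url><loc>https://mysite.com/about/</loc></url>
</urlset>"""

for url in sitemap_urls(sample):
    print(url)
```

Each URL returned this way could then be checked as a single page.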

By default, successful requests are not printed to the terminal. To see all URLs with a 200
response, add the --verbose flag.

> pyanchor https://mysite.com/sitemap.xml --sitemap --verbose

But wait, there's more...

To integrate PyAnchor into your application, you can import the LinkResults class. LinkResults
requires a URL.

Example:

>>> from pyanchor.link_checker import LinkResults
>>> r = LinkResults("https://mysite.com/")
>>> r.results
{200: ["https://mysite.com/about/", "https://mysite.com/contact/"], 500: ["https://mysite.com/doh!/"]}

The results attribute is a dictionary. Each key is a response code that was returned, and each
value is the list of URLs that returned that code.
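
Because the results are keyed by status code, picking out the failing links takes only a few lines. The sketch below works on a plain dictionary shaped like the example output, so it runs without a network connection. The `failing_links` helper is hypothetical, and treating every code of 400 or above as a failure is an assumption you may want to adjust.

```python
def failing_links(results):
    """Keep only the entries whose status code is 400 or above (assumed cutoff)."""
    return {
        status: urls
        for status, urls in results.items()
        if status >= 400
    }


# Sample data copied from the LinkResults example.
results = {
    200: ["https://mysite.com/about/", "https://mysite.com/contact/"],
    500: ["https://mysite.com/doh!/"],
}

failures = failing_links(results)
for status, urls in sorted(failures.items()):
    for url in urls:
        print(f"{status}  {url}")  # prints "500  https://mysite.com/doh!/"
```

In a build or CI script, a non-empty `failures` dictionary could be used to exit with a non-zero status so that dead links fail the build.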

PyAnchor also provides the LinkAnalysis class, which checks the links on a given URL for unsafe and obsolete attributes.

To check for obsolete attributes use the obsolete_attrs property:

>>> from pyanchor.link_checker import LinkAnalysis
>>> r = LinkAnalysis("https://mysite.com/")
>>> r.obsolete_attrs
{'/about/link-1': ['charset', 'rev'], '/about/link-2': ['name']}
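
To make the shape of that output concrete, here is a rough standard-library sketch that builds a similar dictionary from a snippet of HTML. It is not PyAnchor's implementation: the `ObsoleteAttrFinder` class, the `find_obsolete_attrs` helper, and the set of attribute names treated as obsolete are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed set of obsolete <a> attributes; PyAnchor's own list may differ.
OBSOLETE_ANCHOR_ATTRS = {"charset", "coords", "name", "rev", "shape"}


class ObsoleteAttrFinder(HTMLParser):
    """Collect obsolete attributes per link, keyed by href (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        obsolete = sorted(name for name in attr_map if name in OBSOLETE_ANCHOR_ATTRS)
        if href and obsolete:
            self.found[href] = obsolete


def find_obsolete_attrs(html):
    finder = ObsoleteAttrFinder()
    finder.feed(html)
    return finder.found


# Made-up markup that mirrors the links in the example output.
sample = """
<a href="/about/link-1" charset="utf-8" rev="made">Link 1</a>
<a href="/about/link-2" name="anchor">Link 2</a>
<a href="/about/link-3">Link 3</a>
"""

print(find_obsolete_attrs(sample))
# {'/about/link-1': ['charset', 'rev'], '/about/link-2': ['name']}
```

Links without any obsolete attribute, such as Link 3 here, are simply left out of the result.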

Likewise, you can check for unsafe links with unsafe_attrs:

>>> from pyanchor.link_checker import LinkAnalysis
>>> r = LinkAnalysis("https://mysite.com/")
>>> r.unsafe_attrs
{<a href="/about/link-4" target="_blank">Link 4</a>: True, <a href="/about/link-5" rel="noreferrer noopener" target="_blank">Link 5</a>: False}

Any link that uses the target attribute without rel="noopener" returns True, meaning the link is unsafe. Links with the appropriate attributes therefore return False.
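
The rule described here can be written as a small predicate. The following is a minimal sketch of that rule only, applied to a plain dictionary of attributes for one link. The `is_unsafe` function is hypothetical and is not taken from PyAnchor's source.

```python
def is_unsafe(attrs):
    """Return True when a link sets target but lacks rel="noopener".

    attrs is a dict mapping attribute names to values for one <a> tag.
    """
    if "target" not in attrs:
        return False
    # rel holds space-separated tokens, e.g. "noreferrer noopener".
    rel_tokens = (attrs.get("rel") or "").lower().split()
    return "noopener" not in rel_tokens


# Attribute sets matching Link 4 and Link 5 from the example.
link_4 = {"href": "/about/link-4", "target": "_blank"}
link_5 = {"href": "/about/link-5", "rel": "noreferrer noopener", "target": "_blank"}

print(is_unsafe(link_4))  # True
print(is_unsafe(link_5))  # False
```

Splitting rel into tokens avoids a false match on a value that merely contains the text "noopener" inside a longer word.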

GitHub

https://github.com/EndlessTrax/pyanchor