A rapid search grepping mechanism that examines HTML elements by type and permits focused
HTML content / article extractor, web scraping library in Python
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
A library for extracting embedded metadata from HTML markup
A library for converting HTML into PDFs using ReportLab
The most feature-rich and easy-to-use library for processing XML and HTML in Python
jq for Python programmers. Process JSON and HTML on the command line with familiar syntax.
Module for automatic summarization of text documents and HTML pages.
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information.
MarkupSafe implements a text object that escapes characters so it is safe to use in HTML and XML. Characters that have special meanings are replaced so that they display as the actual characters.
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
pywebview is a lightweight cross-platform wrapper around a webview component that lets you display HTML content in its own native GUI window.
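
The escaping idea in the MarkupSafe entry above (special characters replaced so that they display as the actual characters) can be sketched with the standard library alone. This is a minimal illustration that uses `html.escape` as a stand-in; it is not MarkupSafe's own API, and the helper name `render_comment` is hypothetical.

```python
from html import escape


def render_comment(user_text: str) -> str:
    """Wrap untrusted text in a paragraph, escaping HTML-special characters.

    Hypothetical helper for illustration; html.escape stands in for the
    escaping behaviour described in the MarkupSafe entry.
    """
    # escape() replaces &, < and > (and quotes, by default) with
    # character references, so the browser shows them literally.
    return "<p>" + escape(user_text) + "</p>"


rendered = render_comment("<script>alert(1)</script> & more")
print(rendered)
```

The point of the design is that untrusted text is escaped at the moment it is placed into markup, so a `<script>` tag in user input is displayed as text rather than executed.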