Searches a document for hash tags. Supports multiple natural languages. Works in various contexts.

Feb 15, 2022

ht-getter

Searches a document for hash tags. Supports multiple natural languages. Works in various contexts.

This package uses a non-regex approach and supports both halfwidth and fullwidth alphanumeric characters as well as various writing systems.
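To give a feel for what a non-regex approach can look like, here is a minimal sketch of a character-by-character scanner. It is not ht-getter's actual implementation: the function name find_hash_tags_sketch and the rule "a tag is '#' followed by letters, digits, or underscores" are assumptions made for illustration. It relies on Python's str.isalnum(), which is Unicode-aware and so accepts halfwidth and fullwidth alphanumerics as well as characters such as kanji and hangul.

```python
def find_hash_tags_sketch(text):
    """Hypothetical scanner for illustration; not ht-getter's own code."""
    tags = []
    i = 0
    n = len(text)
    while i < n:
        if text[i] == "#":
            j = i + 1
            # Assumed rule: a tag body is a run of letters, digits, or "_".
            # str.isalnum() is True for fullwidth letters, kanji, hangul, etc.
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            if j > i + 1:
                # At least one body character followed the "#".
                tags.append(text[i:j])
                i = j
                continue
        i += 1
    return tags


print(find_hash_tags_sketch("Try #python and #日本語 or #ｆｕｌｌ, not # alone"))
```

A check this simple has limits: combining marks, which some scripts use for vowel signs, are not alphanumeric according to str.isalnum(), so a real implementation would need more careful character classification.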

Installation

pip install ht-getter

Function

def get_hash_tags(source, mode = "strings")

Arguments:

source -> The source text to be searched. Must be passed as a str type.

mode -> Specifies the format of the results. The default value is "strings".

mode = "strings" -> The results are returned as a list of strings.

mode = "indices" -> The results are returned as a list of lists, each holding the start and end indices of one hash tag.
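Index pairs are useful when you want to highlight or replace tags in place. The sketch below shows one way to turn such pairs back into substrings. The helper name slice_by_indices and the sample pairs are hypothetical, and the description above does not say whether the end index is inclusive or exclusive, so the sketch assumes it is exclusive, as in Python slicing. If it turns out to be inclusive, slice with end + 1 instead.

```python
def slice_by_indices(source, index_pairs):
    """Hypothetical helper: recover substrings from [start, end] pairs.

    Assumes the end index is exclusive, as in Python slicing.
    """
    return [source[start:end] for start, end in index_pairs]


text = "Read #python news"
pairs = [[5, 12]]  # hand-written sample in the assumed "indices" shape
print(slice_by_indices(text, pairs))
```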

Code Sample

from ht_getter.getter import get_hash_tags

source_text = '''This simple package helps you find #hash_tags in various types of #documents#. It also works with other languages like #日本語 or #한국어.
It supports #ｆｕｌｌｗｉｄｔｈ #alpha-numeric characters. You can get a #list of the #hash_tags or a list of their #indices in the #####source_text.
'''

hash_tags = get_hash_tags(source_text)
hash_tag_indices = get_hash_tags(source_text, mode = "indices")

print(hash_tags)
print(hash_tag_indices)

Things to Keep in Mind:

This package can be used in various contexts (social media, news articles, etc.).
This package looks for substrings that have the structure of a hash tag but does not check that the substring is a valid hash tag on any platform.

