This program finds all the links on a site that match a specific regex pattern.
what it will do
Any site contains a lot of URLs, and you may need the files behind some of them. This program finds every <a> tag on the page, then lists the href of each tag. You can use a regex to pick out the specific link(s) you want.
The URLs you are looking for usually share some distinctive characters, so the regex pattern is tried against every URL that was found. If a URL matches, it is returned; if it does not, the program moves on to the next URL in the list.
If you do not give a pattern, the program prints every link on the site. The default pattern is:
Finally, pattern matching is case-insensitive.
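The steps described above can be sketched with the Python standard library alone. This is a minimal illustration, not the actual code of href.py: the names HrefCollector and find_links are hypothetical, the use of re.search is an assumption, and "no pattern" is modelled as None instead of guessing the program's default pattern.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def find_links(html, pattern=None):
    """Return the hrefs in html that match pattern, ignoring case.

    With pattern=None every link is returned (an assumption standing in
    for the program's real default pattern).
    """
    collector = HrefCollector()
    collector.feed(html)
    if pattern is None:
        return list(collector.hrefs)
    regex = re.compile(pattern, re.IGNORECASE)
    return [href for href in collector.hrefs if regex.search(href)]


sample = '<a href="/songs/one.MP3">one</a> <a href="/about.html">about</a>'
print(find_links(sample, r".*mp3.*"))  # ['/songs/one.MP3']
```

Because the pattern is compiled with re.IGNORECASE, the lowercase pattern still matches the uppercase .MP3 link in the sample.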
how to use
--url 'the URL of the site' --pattern 'regex pattern' --load-headers 'path to the headers file'
href.py --url 'URL' --pattern 'RegexPattern' --load-headers ./headers
href.py --url 'https://guitarmusic.ir/hayedeh-songs/' --pattern '.*mp3.*'
- all the switches have a short form (for example -u and -p)
- use pipe
Sometimes you need to pipe or redirect the program's output.
Some sites repeat their links, for example to preview a video or a song before you download it, so you can pipe the result to the uniq command to avoid duplicate links.
To save the links in a text file, redirect the output to a file:
href.py -u "URL" -p "pattern" > links.txt
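One thing to keep in mind is that uniq only drops repeated lines that are adjacent, and sorting first changes the order of the links. If the original order matters, the same clean-up can be done in a few lines of Python. This is only a sketch: unique_in_order is a hypothetical helper, not part of href.py, and the links are made-up sample data.

```python
def unique_in_order(lines):
    """Yield each non-empty link once, keeping first-seen order."""
    seen = set()
    for line in lines:
        link = line.strip()
        if link and link not in seen:
            seen.add(link)
            yield link


# Made-up output with a repeated, non-adjacent link.
links = [
    "https://example.com/a.mp3\n",
    "https://example.com/b.mp3\n",
    "https://example.com/a.mp3\n",
]
print(list(unique_in_order(links)))
```

The same function also accepts sys.stdin as its argument, so it could sit at the end of a pipe in place of uniq.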
- run easy
To run the program without changing to the source directory or typing the full path every time, you can link it to ~/.local/bin/href. Do it with this command, run from the source directory:
ln -s "$PWD/href.py" ~/.local/bin/href
And do not forget to make it executable (chmod +x href.py).
If you get an HTTP error status code, try sending the site's own request headers, or use the default headers file in this directory, with the --load-headers switch.
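To illustrate why custom headers can help, here is a rough sketch of reading a headers file and attaching it to a request. The file format is an assumption (one "Name: value" pair per line); the real format expected by --load-headers may differ, and parse_headers is a hypothetical helper. No network request is sent here, the Request object is only built.

```python
import urllib.request


def parse_headers(text):
    """Parse 'Name: value' lines into a dict (assumed file format)."""
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


# Sample content standing in for a headers file.
sample_headers = "User-Agent: Mozilla/5.0\nAccept-Language: en-US,en;q=0.5\n"
headers = parse_headers(sample_headers)

# Attach the headers to a request; urllib.request.urlopen(request)
# would then send them to the server.
request = urllib.request.Request("https://example.com/", headers=headers)
print(headers)
```

Some servers reject requests that do not look like they come from a browser, which is why sending a browser-like User-Agent often turns an error response into a normal one.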