A GitHub Actions workflow plus Python code that extracts URLs of code repositories and puts them into a standard form, starting with GitHub



First minimum viable product goal

The first minimum viable product goal is to harvest all the GitHub repository URLs such that the form returned is + “username” + “repository name”, and then add them to the “repos” key in an existing JSON file in a form like this: , which is summarized below:

    "memberOrgs": [
    "orgs": [
    "repos": [

2nd intermediate product goal

  • Fires from a GitHub Action

3rd intermediate product goal

Eventual product goal

  • Works for public, internal, and private GitHub URLs
  • Works for GitHub, GitLab, Bitbucket, and other code repository URLs & APIs
  • Keeps track of harvest date, source file name, source file URL & code platform & domain in an intermediate file.
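
Supporting several platforms mostly comes down to recognising the host of each URL. The following is a minimal sketch, not the project's implementation. It assumes a hand-maintained host table (`KNOWN_PLATFORMS`, a hypothetical name) and treats the first two path segments as owner and repository, which does not hold for GitLab subgroups or self-hosted instances.

```python
from urllib.parse import urlparse

# Hypothetical host table; private or self-hosted domains would be added here.
KNOWN_PLATFORMS = {
    "github.com": "GitHub",
    "gitlab.com": "GitLab",
    "bitbucket.org": "Bitbucket",
}


def classify(url):
    """Return the platform, domain, owner, and repository for a repository URL."""
    parsed = urlparse(url)
    domain = (parsed.hostname or "").lower()
    if domain.startswith("www."):
        domain = domain[4:]
    # Assumption: the first two non-empty path segments are owner and repository.
    parts = [segment for segment in parsed.path.split("/") if segment]
    return {
        "platform": KNOWN_PLATFORMS.get(domain, "unknown"),
        "domain": domain,
        "owner": parts[0] if len(parts) > 0 else None,
        "repository": parts[1] if len(parts) > 1 else None,
    }


if __name__ == "__main__":
    print(classify("https://gitlab.com/example-group/example-project"))
```

Hosts missing from the table are reported as "unknown" rather than guessed, so they can be reviewed by hand.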

Related Projects

This is referenced on an issue here:

Potential Useful Bits

The regular expression (https:\/\/\/)\w+(\/)\w+ seems like a good starting point for extracting GitHub URLs.
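
The pattern as printed above has no host between the slashes, so it cannot be run as is. The sketch below shows one possible variant under two assumptions that are not in the original: the host is github.com, and `\w` is widened to `[\w.-]` because user and repository names can contain hyphens and dots. The names `GITHUB_REPO_RE` and `extract_slugs` are hypothetical.

```python
import re

# Assumption: github.com host; [\w.-] also accepts hyphens and dots in names.
GITHUB_REPO_RE = re.compile(r"https://github\.com/([\w.-]+)/([\w.-]+)")


def extract_slugs(text):
    """Return "username/repository" for every GitHub repository link in text."""
    slugs = []
    for owner, repo in GITHUB_REPO_RE.findall(text):
        repo = repo.rstrip(".")  # drop a sentence-ending period caught by the class
        if repo:
            slugs.append(f"{owner}/{repo}")
    return slugs


if __name__ == "__main__":
    sample = "See https://github.com/octocat/Hello-World for an example."
    print(extract_slugs(sample))
```

Links to a bare user or organization page (only one path segment) do not match, since the pattern requires both a username and a repository name.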

Tentative GitHub Actions structure:

  • download README file
  • replace old README file with new
  • extract all links matching a regular expression
  • sort & take out duplicates
  • make into JSON with domain, URL, org or username, repository name, source file name, source file link, and harvest date
  • pull out the org or username & repository name from above and put them into the appropriate key of the JSON file if not already present in either the “orgs” or “repos” key.
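
The extract, de-duplicate, record, and merge steps above could be sketched as follows. This is an illustration under stated assumptions rather than the project's code: it handles github.com links only, assumes the “orgs” and “repos” keys hold lists of plain strings, and treats a repository as already covered when its owner appears in “orgs”. The function names, field names, and sample file name are hypothetical.

```python
import json
import re
from datetime import date

# Assumption: github.com only, with hyphens and dots allowed in names.
LINK_RE = re.compile(r"https://(github\.com)/([\w.-]+)/([\w.-]+)")


def harvest(text, source_file_name, source_file_link, harvest_date=None):
    """Build one record per unique repository link found in text."""
    harvest_date = harvest_date or date.today().isoformat()
    # A set removes duplicates; sorting keeps the output order stable.
    unique = sorted({(d, o, r.rstrip(".")) for d, o, r in LINK_RE.findall(text)})
    return [
        {
            "domain": domain,
            "url": f"https://{domain}/{owner}/{repo}",
            "owner": owner,
            "repository": repo,
            "source_file_name": source_file_name,
            "source_file_link": source_file_link,
            "harvest_date": harvest_date,
        }
        for domain, owner, repo in unique
    ]


def merge_into_repos(data, records):
    """Append "owner/repository" to data["repos"] unless already covered."""
    orgs = {org.lower() for org in data.get("orgs", [])}
    repos = list(data.get("repos", []))
    seen = {repo.lower() for repo in repos}
    for rec in records:
        slug = f'{rec["owner"]}/{rec["repository"]}'
        # Assumption: an owner listed under "orgs" already covers its repositories.
        if rec["owner"].lower() in orgs or slug.lower() in seen:
            continue
        repos.append(slug)
        seen.add(slug.lower())
    return {**data, "repos": repos}


if __name__ == "__main__":
    readme = "Tools: https://github.com/octocat/Hello-World"
    records = harvest(readme, "README.md", "https://example.org/README.md")
    target = {"memberOrgs": [], "orgs": [], "repos": []}
    print(json.dumps(merge_into_repos(target, records), indent=4))
```

Keeping the intermediate records separate from the merge step means the harvest date and source file details survive even when a repository is already present in the target JSON.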

How to Integrate into ??????


  1. Put all of the code here into the repository:
  2. Call the code here from

If calling the code….

  • (1) add the script to read the README to as the first step
  • (2) set to be called by GitHub Actions
  • (3) when triggered, the GitHub Action runs the entirety of the GitHub Actions workflow in this repo, including calling the Python scripts as its first step.
  • (4) later steps include setting up the environment and calling all the Python scripts that the bash script calls. The code would need to be called either by a GitHub Action (on pull request, push, manual run, or cron job) or by a trigger after the call to refresh the

