PyPIContents

PyPIContents is an application that generates a Module Index from the Python Package Index (PyPI) as well as from various versions of the Python Standard Library.

PyPIContents generates a configurable index in JSON format that serves as a database for applications like pipsalabim. It can be configured to process only a range of packages (selected by initial letter) and to stop when memory, time or log size limits are exceeded. It aims to be for the Python Package Index what the Contents file is for a Debian-based package repository.

This repository stores the application in the master branch and a Module Index in the contents branch, which is updated daily by a Travis cron job. Read below for more information on how to use each of them.

Getting started

Installation

The pypicontents program is written in Python and hosted on PyPI, so you can use pip to install the stable version:

$ pip install --upgrade pypicontents

If you want to install the development version (not recommended), you can install it directly from GitHub like this:

$ pip install --upgrade https://github.com/CollageLabs/pypicontents/archive/master.tar.gz

Using the application

PyPIContents is divided into several commands.

pypicontents pypi

This command generates a JSON module index with information from PyPI. Read below for more information on how to use it:

$ pypicontents pypi --help

usage: pypicontents pypi [options]

General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.

Pypi Options:
  -l <level>, --loglevel <level>
                        Logger verbosity level (default: INFO). Must be one
                        of: DEBUG, INFO, WARNING, ERROR or CRITICAL.
  -f <path>, --logfile <path>
                        A path pointing to a file to be used to store logs.
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the JSON Module Index (required).
  -R <letter/number>, --letter-range <letter/number>
                        An expression representing an alphanumeric range to be
                        used to filter packages from PyPI (default: 0-z). You
                        can use a single alphanumeric character like "0" to
                        process only packages beginning with "0". Use commas
                        to build a list or dashes to build an interval.
  -L <size>, --limit-log-size <size>
                        Stop processing if log size exceeds <size> (default:
                        3M).
  -M <size>, --limit-mem <size>
                        Stop processing if process memory exceeds <size>
                        (default: 2G).
  -T <sec>, --limit-time <sec>
                        Stop processing if process time exceeds <sec>
                        (default: 2100).
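To illustrate what a --letter-range expression selects, here is a minimal sketch of how such an expression could be expanded and matched against package names. It is hypothetical and not taken from pypicontents: the helper names parse_letter_range and in_range, the digits-then-letters ordering (suggested by the default "0-z") and the case-insensitive matching are all assumptions.

```python
import string

# Assumption: characters are ordered digits first, then lowercase letters,
# so the default range "0-z" covers every alphanumeric initial.
ALPHABET = string.digits + string.ascii_lowercase


def parse_letter_range(expr):
    """Expand "0-z", "a", "a,c,e" or "a-c,x" into a set of initial characters.

    Hypothetical helper; an unknown character raises ValueError.
    """
    chars = set()
    for part in expr.lower().split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            # Interval: every character between start and end, inclusive
            start, end = part.split('-', 1)
            i, j = ALPHABET.index(start), ALPHABET.index(end)
            chars.update(ALPHABET[i:j + 1])
        else:
            # Single character: added as-is
            chars.add(part)
    return chars


def in_range(package_name, chars):
    """Return True if the package's first character is in the selected set."""
    return package_name[:1].lower() in chars


selected = parse_letter_range('a-c,x')
print(sorted(selected))  # ['a', 'b', 'c', 'x']
print(in_range('Django', selected), in_range('boto3', selected))  # False True
```

Splitting the work by initial letter this way lets several runs, each with its own time and memory limits, cover PyPI in parallel.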

pypicontents stdlib

This command generates a JSON Module Index from the Python Standard Library. Read below for more information on how to use it:

$ pypicontents stdlib --help

usage: pypicontents stdlib [options]

General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.

Stdlib Options:
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the JSON Module Index (required).
  -p <version>, --pyver <version>
                        Python version to be used for the Standard Library
                        (default: 2.7).
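As a rough idea of what a standard library index contains, the sketch below lists the public top-level modules of the interpreter that runs it. This is an assumption-based illustration, not necessarily how pypicontents gathers this data: it only covers the running Python (unlike --pyver), and the lib-dynload directory is specific to CPython on Unix.

```python
import os
import pkgutil
import sys
import sysconfig


def stdlib_modules():
    """Return sorted public top-level stdlib module names for this interpreter."""
    stdlib_dir = sysconfig.get_paths()['stdlib']
    # Modules compiled into the interpreter binary, e.g. sys
    names = set(sys.builtin_module_names)
    # Pure-Python modules/packages in the stdlib directory, plus C extensions
    # in lib-dynload (missing directories are silently skipped)
    search_dirs = [stdlib_dir, os.path.join(stdlib_dir, 'lib-dynload')]
    for info in pkgutil.iter_modules(search_dirs):
        names.add(info.name)
    # Drop underscore-prefixed names such as _io; a real index might keep them
    return sorted(name for name in names if not name.startswith('_'))


modules = stdlib_modules()
print(len(modules), modules[:5])
```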

pypicontents stats

This command gathers statistics from the logs generated by the pypi command. Read below for more information on how to use it:

$ pypicontents stats --help

usage: pypicontents stats [options]

General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.

Stats Options:
  -i <path>, --inputdir <path>
                        A path pointing to a directory containing JSON files
                        generated by the pypi command (required).
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the statistics (required).
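The kind of summary the stats command produces can be approximated with a short script. The sketch below is hypothetical: it assumes the input directory holds *.json files in the pypi.json layout described under "About the Module Index", and the function name index_stats and the counted fields are illustrative choices, not the command's actual output.

```python
import glob
import json
import os
import tempfile


def index_stats(inputdir):
    """Count packages and modules across every *.json index in inputdir."""
    packages = with_modules = modules = 0
    for path in sorted(glob.glob(os.path.join(inputdir, '*.json'))):
        with open(path) as fh:
            index = json.load(fh)
        for data in index.values():
            packages += 1
            mods = data.get('modules', [])
            modules += len(mods)
            if mods:
                with_modules += 1
    return {'packages': packages,
            'packages_with_modules': with_modules,
            'modules': modules}


# Usage with two small sample files written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        'a.json': {'pkg_a': {'version': ['1.0'], 'modules': ['a1', 'a2'], 'cmdline': []}},
        'b.json': {'pkg_b': {'version': ['2.0'], 'modules': [], 'cmdline': []}},
    }
    for name, content in samples.items():
        with open(os.path.join(tmp, name), 'w') as fh:
            json.dump(content, fh)
    stats = index_stats(tmp)

print(stats)  # {'packages': 2, 'packages_with_modules': 1, 'modules': 2}
```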

pypicontents errors

This command summarizes errors found in the logs generated by the pypi command. Read below for more information on how to use it:

$ pypicontents errors --help

usage: pypicontents errors [options]

General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.

Errors Options:
  -i <path>, --inputdir <path>
                        A path pointing to a directory containing JSON files
                        generated by the pypi command (required).
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the errors (required).

pypicontents merge

This command searches for JSON files generated by the pypi or stdlib commands and combines them into one. Read below for more information on how to use it:

$ pypicontents merge --help

usage: pypicontents merge [options]

General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.

Merge Options:
  -i <path>, --inputdir <path>
                        A path pointing to a directory containing JSON files
                        generated by pypi or stdlib commands (required).
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the merged JSON files (required).
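Conceptually, merging amounts to combining several dictionaries keyed by package name. The sketch below shows one way to do it under stated assumptions: the helper merge_indexes is hypothetical, files are read in sorted filename order, and a later file wins when the same package appears twice. The real command may resolve conflicts differently. Keep the output file outside the input directory so it is not picked up on a second run.

```python
import glob
import json
import os
import tempfile


def merge_indexes(inputdir, outputfile):
    """Combine every *.json index in inputdir into a single JSON file."""
    merged = {}
    for path in sorted(glob.glob(os.path.join(inputdir, '*.json'))):
        with open(path) as fh:
            # Later files overwrite earlier entries with the same package name
            merged.update(json.load(fh))
    with open(outputfile, 'w') as fh:
        json.dump(merged, fh, indent=4, sort_keys=True)
    return merged


# Usage: two partial indexes (e.g. from different letter ranges) merged into one
with tempfile.TemporaryDirectory() as tmp:
    indir = os.path.join(tmp, 'parts')
    os.mkdir(indir)
    parts = {
        'a.json': {'pkg_a': {'version': ['1.0'], 'modules': ['a1'], 'cmdline': []}},
        'b.json': {'pkg_b': {'version': ['2.0'], 'modules': ['b1'], 'cmdline': []}},
    }
    for name, content in parts.items():
        with open(os.path.join(indir, name), 'w') as fh:
            json.dump(content, fh)
    outfile = os.path.join(tmp, 'merged.json')
    merged = merge_indexes(indir, outfile)
    with open(outfile) as fh:
        on_disk = json.load(fh)

print(sorted(merged))  # ['pkg_a', 'pkg_b']
```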

About the Module Index

In the pypi.json file (located in the contents branch) you will find a dictionary with all the packages registered at the main PyPI instance, each one with the following information:

{
    "pkg_a": {
        "version": [
            "X.Y.Z"
        ],
        "modules": [
            "module_1",
            "module_2",
            "..."
        ],
        "cmdline": [
            "path_1",
            "path_2",
            "..."
        ]
    },
    "pkg_b": {
         "...": "..."
    },
    "...": {},
    "...": {}
}

This index is generated on Travis by executing the setup.py file of each package with setup() monkeypatched, which lets us record the arguments that were passed to it. Check out pypicontents/api/process.py for more details.
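The general monkeypatching idea can be sketched in a few lines. This is not the code in pypicontents/api/process.py: it is a hypothetical illustration that temporarily places a stub setuptools module in sys.modules so that setup() records its keyword arguments instead of building anything. The helper name capture_setup_args and the crude find_packages stand-in are assumptions, and setup.py files that import from distutils are not handled.

```python
import os
import runpy
import sys
import tempfile
import types

_MISSING = object()


def capture_setup_args(setup_py_path):
    """Run a setup.py and return the keyword arguments it passes to setup()."""
    captured = {}

    def fake_setup(**kwargs):
        # Record the arguments instead of building or installing anything
        captured.update(kwargs)

    stub = types.ModuleType('setuptools')
    stub.setup = fake_setup
    stub.find_packages = lambda *args, **kwargs: []  # crude stand-in
    saved = sys.modules.get('setuptools', _MISSING)
    sys.modules['setuptools'] = stub
    try:
        runpy.run_path(setup_py_path, run_name='__main__')
    finally:
        # Restore whatever was there before, even if setup.py raised
        if saved is _MISSING:
            sys.modules.pop('setuptools', None)
        else:
            sys.modules['setuptools'] = saved
    return captured


# Usage with a throwaway setup.py
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'setup.py')
    with open(path, 'w') as fh:
        fh.write("from setuptools import setup\n"
                 "setup(name='demo', version='1.0', py_modules=['demo_mod'])\n")
    args = capture_setup_args(path)

print(args['name'], args['py_modules'])  # demo ['demo_mod']
```

Keys such as py_modules and packages are the kind of information that ends up in the "modules" list of the index. Executing arbitrary setup.py files runs untrusted code, which is one reason to do it in an isolated CI environment.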

Use cases

  • Find which package (or packages) provide a Python module. This is useful for determining a project's requirements.txt or install_requires.

    import json
    from pprint import pprint

    try:
        from urllib.request import urlopen  # Python 3
    except ImportError:
        from urllib2 import urlopen  # Python 2

    pypic = 'https://raw.githubusercontent.com/CollageLabs/pypicontents/contents/pypi.json'

    # Download and decode the Module Index from the contents branch
    f = urlopen(pypic)
    pypicontents = json.loads(f.read().decode('utf-8'))

    def find_package(contents, module):
        """Yield {package: modules} for every package that provides `module`."""
        for pkg, data in contents.items():
            if module in data.get('modules', []):
                yield {pkg: data['modules']}

    # Which package(s) contain the 'django' module?
    pprint(list(find_package(pypicontents, 'django')))
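find_package scans every package on each call. If you need to look up many modules, for example every import in a project, a reasonable alternative is to invert the index once into a module-to-packages mapping. The sketch below assumes the pypi.json layout described above; build_module_index is a hypothetical helper and the sample data is made up.

```python
from collections import defaultdict


def build_module_index(contents):
    """Map each module name to the sorted list of packages that ship it."""
    by_module = defaultdict(set)
    for pkg, data in contents.items():
        for mod in data.get('modules', []):
            by_module[mod].add(pkg)
    return {mod: sorted(pkgs) for mod, pkgs in by_module.items()}


sample = {
    'pkg_a': {'version': ['1.0'], 'modules': ['shared', 'a_only'], 'cmdline': []},
    'pkg_b': {'version': ['2.0'], 'modules': ['shared'], 'cmdline': []},
}
index = build_module_index(sample)
print(index['shared'])  # ['pkg_a', 'pkg_b']
print(index.get('missing', []))  # []
```

Building the mapping costs one pass over the whole index; after that each lookup is a single dictionary access.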