PyPIContents
PyPIContents is an application that generates a Module Index from the Python Package Index (PyPI) and also from various versions of the Python Standard Library.
PyPIContents generates a configurable index written in JSON format that serves as a database for applications like pipsalabim. It can be configured to process only a range of packages (by initial letter) and to enforce limits on memory, time or log size. It aims to do for the Python Package Index what the Contents file does for a Debian-based package repository.
This repository stores the application in the master branch. It also stores a Module Index in the contents branch, which is updated daily by a Travis cron job. Read below for more information on how to use one or the other.
Getting started
Installation
The pypicontents program is written in Python and hosted on PyPI, so you can use pip to install the stable version:
$ pip install --upgrade pypicontents
If you want to install the development version (not recommended), you can install it directly from GitHub like this:
$ pip install --upgrade https://github.com/CollageLabs/pypicontents/archive/master.tar.gz
Using the application
PyPIContents is divided into several commands.
pypicontents pypi
This command generates a JSON module index with information from PyPI. Read below for more information on how to use it:
$ pypicontents pypi --help
usage: pypicontents pypi [options]

General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.

Pypi Options:
  -l <level>, --loglevel <level>
                        Logger verbosity level (default: INFO). Must be one
                        of: DEBUG, INFO, WARNING, ERROR or CRITICAL.
  -f <path>, --logfile <path>
                        A path pointing to a file to be used to store logs.
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the JSON Module Index (required).
  -R <letter/number>, --letter-range <letter/number>
                        An expression representing an alphanumeric range to be
                        used to filter packages from PyPI (default: 0-z). You
                        can use a single alphanumeric character like "0" to
                        process only packages beginning with "0". You can use
                        commas use as a list o dashes to use as an interval.
  -L <size>, --limit-log-size <size>
                        Stop processing if log size exceeds <size> (default:
                        3M).
  -M <size>, --limit-mem <size>
                        Stop processing if process memory exceeds <size>
                        (default: 2G).
  -T <sec>, --limit-time <sec>
                        Stop processing if process time exceeds <sec>
                        (default: 2100).
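To make the --letter-range syntax more concrete, here is a minimal sketch of how such an expression could be expanded into individual characters. It assumes the range is ordered as digits followed by lowercase letters; the names ALPHABET, _position and expand_letter_range are hypothetical, and the parser that pypicontents actually uses may behave differently.

```python
import string

# Assumed ordering for ranges: digits first, then lowercase letters.
ALPHABET = string.digits + string.ascii_lowercase


def _position(char):
    """Return the index of a single alphanumeric character in ALPHABET."""
    if len(char) != 1 or char not in ALPHABET:
        raise ValueError('not a single alphanumeric character: %r' % char)
    return ALPHABET.index(char)


def expand_letter_range(expression):
    """Expand an expression such as "0-z" or "a,c-e" into a list of characters."""
    selected = set()
    for part in expression.lower().split(','):
        part = part.strip()
        if '-' in part:
            # A dash denotes an inclusive interval between two characters.
            start, end = part.split('-', 1)
            low, high = _position(start), _position(end)
            if low > high:
                raise ValueError('descending interval: %r' % part)
            selected.update(ALPHABET[low:high + 1])
        else:
            # A bare character selects only itself.
            selected.add(ALPHABET[_position(part)])
    return sorted(selected, key=ALPHABET.index)


print(expand_letter_range('a,c-e'))
```

Under these assumptions, the default expression "0-z" expands to all 36 characters, and "0" selects only packages whose names begin with "0".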
pypicontents stdlib
This command generates a JSON Module Index from the Python Standard Library. Read below for more information on how to use it:
$ pypicontents stdlib --help
usage: pypicontents stdlib [options]

General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.

Stdlib Options:
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the JSON Module Index (required).
  -p <version>, --pyver <version>
                        Python version to be used for the Standard Library
                        (default: 2.7).
pypicontents stats
This command gathers statistics from the logs generated by the pypi command. Read below for more information on how to use it:
$ pypicontents stats --help
usage: pypicontents stats [options]

General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.

Stats Options:
  -i <path>, --inputdir <path>
                        A path pointing to a directory containing JSON files
                        generated by the pypi command (required).
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the statistics (required).
pypicontents errors
This command summarizes errors found in the logs generated by the pypi command. Read below for more information on how to use it:
$ pypicontents errors --help
usage: pypicontents errors [options]

General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.

Errors Options:
  -i <path>, --inputdir <path>
                        A path pointing to a directory containing JSON files
                        generated by the pypi command (required).
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the errors (required).
pypicontents merge
This command searches for JSON files generated by the pypi or stdlib commands and combines them into one. Read below for more information on how to use it:
$ pypicontents merge --help
usage: pypicontents merge [options]

General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.

Merge Options:
  -i <path>, --inputdir <path>
                        A path pointing to a directory containing JSON files
                        generated by pypi or stdlib commands (required).
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the merged JSON files (required).
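As a rough illustration of what combining several index files involves, the sketch below merges every JSON file in a directory into one dictionary. It is not the merge command's implementation: the function name merge_json_dir is hypothetical, and it assumes that each file holds a dictionary keyed by package name and that entries from later files (in sorted filename order) override earlier ones.

```python
import glob
import json
import os
import tempfile


def merge_json_dir(inputdir):
    """Merge all *.json files in inputdir into a single dictionary."""
    merged = {}
    for path in sorted(glob.glob(os.path.join(inputdir, '*.json'))):
        with open(path) as handle:
            # Assumption: later files win when a package appears twice.
            merged.update(json.load(handle))
    return merged


# Small demonstration using two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    samples = [
        ('a.json', {'pkg_a': {'modules': ['module_1']}}),
        ('b.json', {'pkg_b': {'modules': ['module_2']}}),
    ]
    for name, content in samples:
        with open(os.path.join(tmp, name), 'w') as handle:
            json.dump(content, handle)
    merged = merge_json_dir(tmp)

print(sorted(merged))
```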
About the Module Index
In the pypi.json file (located in the contents branch) you will find a dictionary with all the packages registered at the main PyPI instance, each one with the following information:
{
    "pkg_a": {
        "version": [
            "X.Y.Z"
        ],
        "modules": [
            "module_1",
            "module_2",
            "..."
        ],
        "cmdline": [
            "path_1",
            "path_2",
            "..."
        ]
    },
    "pkg_b": {
        "...": "..."
    },
    "...": {},
    "...": {}
}
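Because the index is keyed by package, answering "which package provides this module?" means scanning every entry. For repeated lookups, one option is to invert the structure once. The sketch below is only an illustration on a tiny hand-written sample that follows the layout above; build_reverse_index and the sample package names are hypothetical.

```python
from collections import defaultdict


def build_reverse_index(contents):
    """Map each module name to the sorted list of packages that ship it."""
    index = defaultdict(list)
    for pkg, data in contents.items():
        # Tolerate entries without a "modules" key.
        for mod in data.get('modules', []):
            index[mod].append(pkg)
    return {mod: sorted(pkgs) for mod, pkgs in index.items()}


# Hand-written sample in the same shape as pypi.json.
sample = {
    'pkg_a': {'version': ['1.0'], 'modules': ['module_1', 'module_2'], 'cmdline': []},
    'pkg_b': {'version': ['2.0'], 'modules': ['module_2'], 'cmdline': ['path_1']},
}

reverse = build_reverse_index(sample)
print(reverse['module_2'])
```

Building the reverse index costs one pass over the data, after which each lookup is a single dictionary access.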
This index is generated on Travis by executing the setup.py file of each package under a monkeypatch that lets us read the parameters passed to setup(). Check out pypicontents/api/process.py for more info.
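The general idea of intercepting setup() can be sketched as follows. This is not the code in pypicontents/api/process.py; it is a simplified illustration that temporarily replaces the setuptools module with a stub whose setup() records its keyword arguments. The name capture_setup_kwargs is hypothetical, and a real implementation would need to handle distutils, other imports and sandboxing, since running an arbitrary setup.py executes untrusted code.

```python
import sys
import types


def capture_setup_kwargs(setup_source):
    """Run setup.py source against a stub setuptools and return the setup() kwargs."""
    captured = {}

    def fake_setup(**kwargs):
        # Record the arguments instead of building or installing anything.
        captured.update(kwargs)

    stub = types.ModuleType('setuptools')
    stub.setup = fake_setup
    stub.find_packages = lambda *args, **kwargs: []

    saved = sys.modules.get('setuptools')
    sys.modules['setuptools'] = stub
    try:
        exec(compile(setup_source, 'setup.py', 'exec'), {'__name__': '__main__'})
    finally:
        # Restore the previous state so the stub does not leak.
        if saved is None:
            del sys.modules['setuptools']
        else:
            sys.modules['setuptools'] = saved
    return captured


example = """
from setuptools import setup
setup(name='pkg_a', version='1.0', py_modules=['module_1', 'module_2'])
"""

info = capture_setup_kwargs(example)
print(info['name'], info['py_modules'])
```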
Use cases
- Find which package (or packages) contain a given Python module. This is useful for determining a project's requirements.txt or install_requires.

    import json
    from pprint import pprint
    # Python 3; on Python 2 use "from urllib2 import urlopen" instead.
    from urllib.request import urlopen

    pypic = 'https://raw.githubusercontent.com/CollageLabs/pypicontents/contents/pypi.json'
    # Download and parse the Module Index.
    pypicontents = json.loads(urlopen(pypic).read().decode('utf-8'))

    def find_package(contents, module):
        # Yield every package whose module list contains the given module.
        for pkg, data in contents.items():
            for mod in data['modules']:
                if mod == module:
                    yield {pkg: data['modules']}

    # Which package(s) contain the 'django' module?
    pprint(list(find_package(pypicontents, 'django')))
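To connect this lookup with the requirements use case, the following sketch extracts the top-level modules imported by a piece of source code and lists candidate packages for each one. It runs on a small hand-written sample rather than the real index; top_level_imports and candidate_packages are hypothetical names, and it assumes the "modules" lists contain top-level module names. Standard library modules would still need to be filtered out, for example with an index produced by the stdlib command.

```python
import ast


def top_level_imports(source):
    """Return the set of top-level module names imported by the source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports, which refer to the project itself.
            if node.level == 0 and node.module:
                names.add(node.module.split('.')[0])
    return names


def candidate_packages(contents, source):
    """Map each imported module to the packages in the index that provide it."""
    result = {}
    for name in sorted(top_level_imports(source)):
        result[name] = sorted(
            pkg for pkg, data in contents.items()
            if name in data.get('modules', [])
        )
    return result


# Hand-written sample in the same shape as pypi.json.
sample_index = {
    'pkg_a': {'modules': ['module_1', 'module_2']},
    'pkg_b': {'modules': ['module_2']},
}
sample_source = "import module_1\nfrom module_2.sub import thing\nfrom . import local\n"

print(candidate_packages(sample_index, sample_source))
```

A module that maps to several packages needs a manual choice, and a module that maps to none is either part of the standard library, local to the project, or missing from the index.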