I install a lot of the same packages in different virtualenv environments. Is there a way that I can download a package once and then have pip install from a local cache?

This would reduce download bandwidth and time.

Note that as of pip 6.0 (2014-12-22), pip will cache by default. See pip.pypa.io/en/stable/reference/pip_install.html#caching for details. – Piët Delport Feb 24 '15 at 8:04
up vote 109 down vote accepted

Updated Answer 19-Nov-15

According to the Pip documentation:

Starting with v6.0, pip provides an on by default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.

Therefore, the updated answer is to just use pip with its defaults if you want a download cache.

Original Answer

From the pip news, version 0.1.4:

Added support for an environmental variable $PIP_DOWNLOAD_CACHE which will cache package downloads, so future installations won’t require large downloads. Network access is still required, but just some downloads will be avoided when using this.

To take advantage of this, I've added the following to my ~/.bash_profile:

export PIP_DOWNLOAD_CACHE=$HOME/.pip_download_cache

or, if you are on a Mac:

export PIP_DOWNLOAD_CACHE=$HOME/Library/Caches/pip-downloads
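If the same profile is shared between Linux and Mac machines, the choice between the two paths above can be made at shell startup. A minimal sketch, assuming a POSIX shell; pip_cache_dir_for is a hypothetical helper name, and PIP_DOWNLOAD_CACHE only has an effect on pip versions before 6.0:

```shell
# Hypothetical helper: print the download-cache path for a given kernel name.
# $1 is the kernel name as printed by `uname -s` (e.g. Darwin, Linux).
pip_cache_dir_for() {
    case "$1" in
        Darwin) printf '%s\n' "$HOME/Library/Caches/pip-downloads" ;;
        *)      printf '%s\n' "$HOME/.pip_download_cache" ;;
    esac
}

# Export the variable read by old pip versions (before 6.0).
export PIP_DOWNLOAD_CACHE="$(pip_cache_dir_for "$(uname -s)")"
```

The two paths are the ones from the answer; only the dispatch on `uname -s` is new.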

Notes

  1. If a newer version of a package is detected, it will be downloaded and added to the PIP_DOWNLOAD_CACHE directory. For instance, I now have quite a few Django packages.
  2. This doesn't remove the need for network access, as stated in the pip news, so it's not the answer for creating new virtualenvs on the airplane, but it's still great.
Maybe a better idea is to put it into .bashrc, because .bash_profile is executed only during login. That's up to you, and anyway it's good advice :) – ns-keip May 24 '12 at 9:31
On Macs it is loaded at the beginning of any shell. – saul.shanabrook Jul 13 '12 at 13:16
PIP_DOWNLOAD_CACHE is seriously flawed and I wouldn't recommend using it for things like getting packages out to your deployment machines. It also still relies on pypi.python.org being reachable. Great for a local development cache, but not suitable for heavier uses. – slacy Sep 25 '12 at 18:35
@slacy Could you comment on why it is seriously flawed? If you don't want PyPI to be reachable, that's what --no-index is for; a download cache is surely orthogonal to reaching PyPI or not! – lvh Dec 1 '13 at 11:51
    
@lvh slacy's answer below explains why Pip's download cache is flawed. I've also seen pip install taking longer with cache enabled, bizarrely. pip-accel and basket appear to be better options. – qris Dec 23 '14 at 11:56

In my opinion, pip2pi is a much more elegant and reliable solution for this problem.

From the docs:

pip2pi builds a PyPI-compatible package repository from pip requirements

pip2pi allows you to create your own PyPI index by using two simple commands:

  1. To mirror a package and all of its requirements, use pip2tgz:

    $ cd /tmp/; mkdir packages/
    $ pip2tgz packages/ httpie==0.2
    ...
    $ ls packages/
    Pygments-1.5.tar.gz
    httpie-0.2.0.tar.gz
    requests-0.14.0.tar.gz
    
  2. To build a package index from the previous directory:

    $ ls packages/
    Pygments-1.5.tar.gz
    httpie-0.2.0.tar.gz
    requests-0.14.0.tar.gz
    $ dir2pi packages/
    $ find packages/
    /httpie-0.2.0.tar.gz
    /Pygments-1.5.tar.gz
    /requests-0.14.0.tar.gz
    /simple
    /simple/httpie
    /simple/httpie/httpie-0.2.0.tar.gz
    /simple/Pygments
    /simple/Pygments/Pygments-1.5.tar.gz
    /simple/requests
    /simple/requests/requests-0.14.0.tar.gz
    
  3. To install from the index you built in step 2, you can simply use:

    pip install --index-url=file:///tmp/packages/simple/ httpie==0.2
    

You can even mirror your own index to a remote host with pip2pi.
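The mirror-and-index steps from the list above can be wrapped in one shell function. This is a sketch, assuming pip2pi is installed so that pip2tgz and dir2pi are on the PATH; build_local_index is a hypothetical name, not part of pip2pi:

```shell
# Hypothetical wrapper around the pip2pi commands shown above.
# Usage: build_local_index DIR REQUIREMENT...
build_local_index() {
    dir=$1
    shift
    # Create the mirror directory if it is missing.
    mkdir -p "$dir" &&
    # Download the requirements and their dependencies into DIR.
    pip2tgz "$dir" "$@" &&
    # Generate DIR/simple/, the PyPI-style index.
    dir2pi "$dir"
}
```

For example, build_local_index /tmp/packages httpie==0.2 could be followed by the pip install --index-url command from step 3. DIR should be an absolute path so that it can be reused in the file:// URL.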

    
+1 pip2pi works great!! I don't like relying on network connectivity that much. It fails when you most need it. – Manuel Gutierrez Jun 13 '13 at 16:33
    
This works great; it answers my question stackoverflow.com/questions/18052217/… . Can you answer there as well? – Larry Cai Aug 6 '13 at 0:45
    
@larrycai Done. – K Z Aug 6 '13 at 3:13
    
Maybe it was implied, but it's worth mentioning explicitly: pip2tgz detects if you have already downloaded the package to the designated directory, so if you run the same install line or several install lines that have overlapping dependencies, it will only download each package once. – clacke Mar 21 '14 at 23:33

PIP_DOWNLOAD_CACHE has some serious problems. Most importantly, it encodes the hostname of the download into the cache, so using mirrors becomes impossible.

The better way to manage a cache of pip downloads is to separate the "download the package" step from the "install the package" step. The downloaded files are commonly referred to as "sdist files" (source distributions) and I'm going to store them in a directory $SDIST_CACHE.

The two steps end up being:

pip install --no-install --use-mirrors -I --download=$SDIST_CACHE <package name>

This downloads the package and places it in the directory pointed to by $SDIST_CACHE, without installing it. Then you run:

pip install --find-links=file://$SDIST_CACHE --no-index --index-url=file:///dev/null <package name> 

This installs the package into your virtual environment. Ideally, $SDIST_CACHE would be committed to source control. When deploying to production, you would run only the second pip command to install the packages without downloading them.
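If re-running the download step fetches a package again (as reported in the comments), one option is to check the cache directory first. A minimal sketch, assuming archives in $SDIST_CACHE are named <package name>-<version>.<extension>; sdist_cached is a hypothetical helper, not a pip feature:

```shell
# Hypothetical helper: succeed if $SDIST_CACHE holds an archive for package $1.
# Assumes archive names start with "<package name>-", which is not guaranteed
# (case and "-" vs "_" may differ from the name you pass in).
sdist_cached() {
    for f in "$SDIST_CACHE/$1"-*; do
        # An unmatched glob stays literal, so test that the file really exists.
        [ -e "$f" ] && return 0
    done
    return 1
}
```

The first pip command can then be wrapped in `if ! sdist_cached <package name>; then ...; fi` so it only runs for packages that are not in the cache yet.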

Why is this -I necessary? – Gabriel Jordão Sep 25 '12 at 14:37
    
Gabriel -- It's not downloaded twice, just once in the first step and then installed from local cache in the second. What are you seeing? – slacy Sep 25 '12 at 18:34
    
If I run the first step twice, it'll download it twice, right? At least it happened here. I'll need to know that the first step has been executed for this package at least once before executing it, otherwise it'll download the same file twice. How can I check either if I need to execute it or it has been downloaded before? – Gabriel Jordão Sep 25 '12 at 22:49
    
You probably just want to use pip2pi as the other answer suggests. :) – slacy Sep 26 '12 at 16:49
    
does this download the dependencies as well? – monkut Jul 11 '13 at 3:11

For newer Pip versions:

Newer Pip versions now cache downloads by default. See this documentation:

https://pip.pypa.io/en/stable/reference/pip_install/#caching

For older Pip versions:

Create a configuration file named ~/.pip/pip.conf, and add the following contents:

[global]
download_cache = ~/.cache/pip

On OS X, a better path to choose would be ~/Library/Caches/pip since it follows the convention other OS X programs use.

    
Why wouldn't it? – Michal Chruszcz Oct 29 '13 at 14:37
    
And If I wanted to store them globally for other users of the same PC to access? How would I do that? I figure the config file would have to be placed in /etc or something. – Batandwa Jan 7 '14 at 19:42
    
@batandwa: That might work. If not, you could try this: make sure that all the users have a pip.conf with a download_cache setting that points to the same system-wide directory. – Flimm Jan 7 '14 at 20:29

Starting in version 6.0, pip does its own caching:

  • DEPRECATION pip install --download-cache and pip wheel --download-cache command line flags have been deprecated and the functionality removed. Since pip now automatically configures and uses its internal HTTP cache which supplants the --download-cache the existing options have been made non-functional but will still be accepted until their removal in pip v8.0. For more information please see https://pip.pypa.io/en/latest/reference/pip_install.html#caching

More information from the above link:

Starting with v6.0, pip provides an on by default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.


pip wheel is an excellent option that does what you want with the extra feature of pre-compiling the packages. From the official docs:

Build wheels for a requirement (and all its dependencies):

$ pip wheel --wheel-dir=/tmp/wheelhouse SomePackage

Now your /tmp/wheelhouse directory has all your dependencies precompiled, so you can copy the folder to another server and install everything with this command:

$ pip install --no-index --find-links=/tmp/wheelhouse SomePackage

Note that not all the packages will be completely portable across machines. Some packages will be built specifically for the Python version, OS distribution and/or hardware architecture that you're using. That will be specified in the file name, like -cp27-none-linux_x86_64 for CPython 2.7 on a 64-bit Linux, etc.
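To see which tags a built wheel carries, the file name can be split on dashes. A small sketch, assuming the standard wheel naming scheme {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl; wheel_tags is a hypothetical helper, and the file name in the usage comment is made up for illustration:

```shell
# Hypothetical helper: print the python, ABI and platform tags of a wheel file.
# The last three dash-separated fields of the name are the compatibility tags.
wheel_tags() {
    base=${1##*/}                           # strip any directory part
    base=${base%.whl}                       # strip the extension
    platform=${base##*-}; base=${base%-*}   # last field, then drop it
    abi=${base##*-};      base=${base%-*}   # next-to-last field
    python=${base##*-}                      # third field from the end
    printf '%s %s %s\n' "$python" "$abi" "$platform"
}

# Usage (illustrative file name):
#   wheel_tags /tmp/wheelhouse/SomePackage-1.0-cp27-none-linux_x86_64.whl
#   prints: cp27 none linux_x86_64
```

A platform tag of "any" together with an ABI tag of "none" indicates a pure-Python wheel, which is the portable case.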


Using pip only (my version is 1.2.1), you can also build up a local repository like this:

if ! pip install --find-links="file://$PIP_SDIST_INDEX" --no-index <package>; then
    pip install --download-directory="$PIP_SDIST_INDEX" <package>
    pip install --find-links="file://$PIP_SDIST_INDEX" --no-index <package>
fi

In the first call of pip, the package is looked up in the local repository (only) and installed from there. If that fails, pip retrieves the package from its usual location (e.g. PyPI) and downloads it to PIP_SDIST_INDEX (but does not install anything!). The first call is then repeated to properly install the package from the local index.

(--download-cache creates a local file name which is the complete (escaped) URL, and pip cannot use this as an index with --find-links. --download-cache will use the cached file, if found. We could add this option to the second call of pip, but since the index already functions as a kind of cache, it would not necessarily gain much. It would help if your index were emptied, for instance.)
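The same fallback can be written as a function that takes the package name, so it can be called once per package. A sketch only: install_from_local_index is a hypothetical name, and --download-directory is the flag from this answer's pip 1.2.1; later pip releases moved downloading to the separate pip download command.

```shell
# Hypothetical wrapper around the three pip calls shown above.
# PIP_SDIST_INDEX must point at the local directory used as the index.
install_from_local_index() {
    pkg=$1
    # Try the local index first; fall back to downloading only if that fails.
    if ! pip install --find-links="file://$PIP_SDIST_INDEX" --no-index "$pkg"; then
        # Download into the local index (flag as used with pip 1.2.1) ...
        pip install --download-directory="$PIP_SDIST_INDEX" "$pkg" &&
        # ... then repeat the local-only install.
        pip install --find-links="file://$PIP_SDIST_INDEX" --no-index "$pkg"
    fi
}
```

Unlike the inline version, the retry only runs when the download succeeded, so a network failure is reported once instead of twice.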


A simpler option is basket.

Given a package name, it will download it and all its dependencies to a central location, without any of the drawbacks of the pip cache. This is great for offline use.

You can then use this directory as a source for pip:

pip install --no-index -f file:///path/to/basket package

Or easy_install:

easy_install -f ~/path/to/basket -H None package

You can also use it to update the basket whenever you are online.


There is a new solution to this called pip-accel, a drop-in replacement for pip with caching built in.

The pip-accel program is a wrapper for pip, the Python package manager. It accelerates the usage of pip to initialize Python virtual environments given one or more requirements files. It does so by combining the following two approaches:

  • Source distribution downloads are cached and used to generate a local index of source distribution archives.

  • Binary distributions are used to speed up the process of installing dependencies with binary components (like M2Crypto and LXML). Instead of recompiling these dependencies again for every virtual environment we compile them once and cache the result as a binary *.tar.gz distribution.

Paylogic uses pip-accel to quickly and reliably initialize virtual environments on its farm of continuous integration slaves which are constantly running unit tests (this was one of the original use cases for which pip-accel was developed). We also use it on our build servers.

We've seen around 10x speedup from switching from pip to pip-accel.
