Why use conda (or similar tools)?¶

Note

What is conda?

Package, dependency and environment management for any language—Python, R, Ruby, Lua, Scala, Java, JavaScript, C/ C++, Fortran, and more.

You can use the versions of the tools you want/need¶

On the computing cluster, a limited set of versions of programming language interpreters (python, R, etc.) can be easily installed and maintained by the core facility. For example, the cluster currently supports Python v2.7 (end of life), Python 3.6, and R 3.5 (the latter two of which are woefully out of date).

Likewise, managing large collections of packages that work with these interpreter versions is difficult and stops working once the interpreter gets too outdated. Updating the interpreter is not trivial and also requires reinstalling all the packages.

With conda environments, you set up smaller environments to suit the needs of your projects and bypass the global, outdated environments we have on the cluster.

You can create multiple frozen, project-specific environments¶

Scenario 1: you are writing a paper and working on the methods section. You want to record what versions of programs you used in your analyses, but how do you know if the versions available to you now match what were installed when you did your work?

Scenario 2: you wrote a program or script awhile back and now it will not run with the current version of Python, R, etc. or the current versions of packages.

Instead, you could use conda to create a project-specific environment that contains a fixed set of programs and packages that remain unchanged as you do your work. The versions of the programs you used can be easily queried and reported for reproducibility.

Citations¶

Sandve GK, Nekrutenko A, Taylor J, Hovig E. 2013. Ten simple rules for reproducible computational research. PLoS Computational Biology 9:e1003285. DOI: 10.1371/journal.pcbi.1003285.