Development Environment Configuration for Scientists
If you are hoping to improve your research by improving your computational literacy, it’s absolutely imperative that your computer is set up correctly for the job. In this guide, I am going to run through how to set up your computing environment for programming and data science. In much the same way that you install reference managers and cloud solutions to turn your computer into the best writing device it can be, here we will build your programming toolkit with the essentials.
Before we jump into the nitty-gritty, it’s important to be aware of the limitations of this guide (and any like it).
- Development environment configuration is a mammoth topic to cover.
- You’ll soon learn that development environment configuration is opinionated by its very nature, and what suits you may differ from the exact specifics I lay out here, depending on what you are after.
- On that note, the environment I present below may not be a perfect fit for everyone and is certainly far from perfectly optimized; however, it will get you up and running as quickly as possible.
The focus of this guide is Python and R development (though I do cover more than these).
One of the main criteria I built this environment around was slickness. Using the terminal and programming is much easier when your computing environment feels slick. Moreover, I believe that poor display environments actually inhibit productivity. I’m not entirely sure why this is the case (maybe a line for future research), but it seems intuitive to me.
Another important caveat is that much of this guide is written from the perspective of a Linux user. While it will certainly be useful to Windows users, your life is going to be made a lot easier if you are using a Mac or a Linux-based environment.
This guide will cover the basics of setting up your home directory, choosing an editor, and how to make your environment feel lush!
Choosing an Operating System
I know, I know: if you are reading this guide, you’re probably already set in your ways about which operating system to use. I’m not going to beat this debate to death, but I think it’s important to run through some basics.
In the world of operating systems, the typical scientific user is likely to choose between one of three families: macOS, Windows, and Linux. I use all of these on a regular basis, but Linux will always be my favourite. The primary advantage of using Linux or macOS is that you get great support for command-line tools, in many cases out of the box. Windows still has a lot of growing to do before it can compete with these Unix-based operating systems. A further advantage of Linux distributions, such as Ubuntu, is that they come with a package manager: a tool which automatically installs and configures software for you.
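If you are not sure which package manager your machine already has, a short Python check can look for the usual executables on your PATH. This is a minimal sketch: the command names in `KNOWN_MANAGERS` (apt, dnf, pacman, zypper, brew, winget) are common examples I have picked for illustration, not an exhaustive list, and `find_package_managers` is a hypothetical helper name. Note that Homebrew (`brew`) is something macOS users install themselves; it does not ship with the system.

```python
import shutil

# Illustrative (not exhaustive) map of package-manager commands to the
# systems they usually belong to. Extend it for your own setup.
KNOWN_MANAGERS = {
    "apt": "Debian/Ubuntu",
    "dnf": "Fedora/RHEL",
    "pacman": "Arch Linux",
    "zypper": "openSUSE",
    "brew": "Homebrew (macOS/Linux, installed separately)",
    "winget": "Windows Package Manager",
}


def find_package_managers(candidates=None, path=None):
    """Return (command, description) pairs whose executable can be found.

    `path` is passed straight to shutil.which; None means "search the
    directories listed in the PATH environment variable".
    """
    if candidates is None:
        candidates = KNOWN_MANAGERS
    found = []
    for command, description in candidates.items():
        # shutil.which returns the executable's full path, or None if absent.
        if shutil.which(command, path=path) is not None:
            found.append((command, description))
    return found


if __name__ == "__main__":
    managers = find_package_managers()
    if managers:
        for command, description in managers:
            print(f"{command}: {description}")
    else:
        print("No known package manager found on PATH.")
```

Finding more than one entry is normal; for example, a Mac with Homebrew installed will only report `brew`, while an Ubuntu machine will usually report `apt`.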
If you want a really comfortable data science environment, it is important that you are first comfortable with your operating system. While it is beyond the scope of this post, I think it is well worth spending the time to work out whether a different operating system might work better for you, or whether you need to spend more time learning the one you are currently using. Finally, I firmly believe that to be an effective developer and data scientist, you really need to learn how to use a terminal environment, and there is no better operating system for this than Linux.
Choosing and Installing Editors
You are going to need to install two code editors. Writing code without an editor is like trying to write your paper in WordPad. Some people suggest using IDEs (integrated development environments), but I do not. If you are just starting out with programming, you will not use most of the IDE’s features, and more than likely you will be confused by its interfaces, options, menus, and settings. Instead, you want a good text editor. I’m not going to suggest Vim or Emacs just yet, but I do recommend you pick one of them up at some stage.
For now you want to install the following two editors:
Fonts and ＡＥＳＴＨＥＴＩＣ
- Patched Powerline Fonts
- Go Mono for Powerline
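Once the fonts are installed, it is worth confirming that your system actually sees them. On Linux, fontconfig’s `fc-list : family` prints every installed font family, and the patched fonts carry names ending in “for Powerline” (for example, Go Mono for Powerline). This sketch assumes fontconfig is available, which is standard on most Linux desktops but not on Windows or a stock macOS install; `matching_families` and `installed_powerline_fonts` are hypothetical helper names.

```python
import shutil
import subprocess


def matching_families(fc_list_output, keyword):
    """Return sorted, de-duplicated font families containing `keyword`.

    `fc_list_output` is the text printed by `fc-list : family`: one entry
    per line, where an entry may list several comma-separated family names.
    """
    families = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if keyword.lower() in name.lower():
                families.add(name)
    return sorted(families)


def installed_powerline_fonts():
    """Ask fontconfig for installed families; empty list if fc-list is missing."""
    if shutil.which("fc-list") is None:
        return []
    result = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=False
    )
    return matching_families(result.stdout, "powerline")


if __name__ == "__main__":
    fonts = installed_powerline_fonts()
    print("\n".join(fonts) if fonts else "No Powerline fonts found.")
```

After confirming the fonts are there, remember to select one of them in your terminal emulator’s profile settings; installing a font does not switch your terminal to it.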
Getting your terminal environment comfy is the best first step you can take to improve your computing efficiency. Let’s face it: if you are reading this article, you’re probably not a command-line guru already, and you are probably looking for an environment that offers more than the plain bash command line. In what follows, I will get you started with the following:
- A good terminal emulator
- A visually pleasing appearance
- Auto-completion for commands
- Intelligent suggestions
- Multi-terminal support
- A comfortable command-line editor
In the image below, you can see my terminal environment in all its glory.
The first thing you need is a good terminal emulator: the program you use to actually run a terminal session. On a Mac, that usually means iTerm2 rather than the default Terminal application. On Linux, it generally means Terminator rather than GNOME Terminal.
Personally, as a Linux user, I use two terminal emulators: Guake and Terminator. Guake is a Quake-style drop-down terminal, which means you can bind a key combination (in my case Ctrl+Alt+~) to display a terminal from anywhere in your working environment. In my workflow, I use the instant Guake drop-down for quick and easy commands, while I use Terminator as a more “always there” terminal emulator.
For Windows users, I’d suggest downloading Cmder, which comes with a Cygwin-like set of useful Unix-like commands out of the box and runs on top of the ConEmu console emulator.
That’s pretty much all there is to it: download one of these and you’ll be fine.
Choosing A Shell
When you type commands into the terminal, the program that interprets them is called a shell, and the typical default is bash. However, this is not your only option. Zsh is probably the most popular alternative. It has some substantial benefits over bash, though covering them is beyond the scope of this article.
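Before switching, it helps to know which shell you are running now and whether zsh is already installed. Here is a minimal Python sketch, assuming a Unix-like system where your login shell is recorded in the `SHELL` environment variable (it is usually unset on Windows); `describe_shell` is a hypothetical helper name.

```python
import os
import shutil


def describe_shell(env=None):
    """Summarise the current login shell and whether zsh is available.

    Assumes a Unix-like system, where SHELL normally holds the path of the
    login shell, e.g. /bin/bash or /usr/bin/zsh.
    """
    env = os.environ if env is None else env
    shell_path = env.get("SHELL", "")
    # os.path.basename("/usr/bin/zsh") -> "zsh"; an empty path stays empty.
    shell_name = os.path.basename(shell_path) or "unknown"
    zsh_path = shutil.which("zsh")
    return {
        "login_shell": shell_name,
        "zsh_installed": zsh_path is not None,
        "zsh_path": zsh_path,
    }


if __name__ == "__main__":
    info = describe_shell()
    print(f"Login shell: {info['login_shell']}")
    if info["zsh_installed"]:
        print(f"zsh is installed at {info['zsh_path']}")
    else:
        print("zsh is not installed; install it with your package manager.")
```

If zsh is installed but is not your login shell, `chsh -s "$(which zsh)"` will switch it on most Linux and macOS systems; the change takes effect the next time you log in.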
Oh-my-zsh is a framework devoted to creating an excellent out-of-the-box terminal environment. After a fresh install, you will get a nice-looking terminal, a bunch of autocomplete configurations, and a range of plugins to boost your productivity. There’s not much more to say: if you don’t have a strong preference, just install Oh-my-zsh and you’ll be ready to go.
Beauty is in the eye of the beholder. That said, I have a minimalist perspective. If you have installed Oh-my-zsh, you can set your theme to one of the many that come pre-installed.
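Oh-my-zsh reads the theme name from the `ZSH_THEME` variable in `~/.zshrc`; the stock template sets `ZSH_THEME="robbyrussell"`. Editing that line by hand is the usual approach, but if you want to script it, here is a minimal Python sketch. `set_zsh_theme` is a hypothetical helper, and it assumes a single, unindented `ZSH_THEME=` assignment, which is how the stock template writes it. “agnoster” is one of the bundled themes and is a good match for the Powerline fonts above.

```python
import re

# Matches an uncommented, unindented assignment such as ZSH_THEME="robbyrussell".
# Commented lines ("# ZSH_THEME=...") and ZSH_THEME_RANDOM_CANDIDATES are ignored.
THEME_LINE = re.compile(r"^ZSH_THEME=.*$", re.MULTILINE)


def set_zsh_theme(zshrc_text, theme):
    """Return zshrc_text with ZSH_THEME set to `theme`.

    Replaces the first matching assignment; if there is none, appends one.
    """
    new_line = f'ZSH_THEME="{theme}"'
    if THEME_LINE.search(zshrc_text):
        # A function replacement avoids re treating backslashes specially.
        return THEME_LINE.sub(lambda _match: new_line, zshrc_text, count=1)
    separator = "" if zshrc_text.endswith("\n") or not zshrc_text else "\n"
    return f"{zshrc_text}{separator}{new_line}\n"


if __name__ == "__main__":
    # Demonstrate on a sample string rather than touching your real ~/.zshrc.
    sample = 'export ZSH="$HOME/.oh-my-zsh"\nZSH_THEME="robbyrussell"\n'
    print(set_zsh_theme(sample, "agnoster"), end="")
```

To apply it to your real file, back up `~/.zshrc` first, then read it with `pathlib.Path.home() / ".zshrc"`, pass the text through `set_zsh_theme`, write it back, and open a new terminal.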
There are thousands of configuration guides out there for the following tools:
- tmux: I use Ted Sluis’s tmux configuration, which provides an excellent out-of-the-box experience. On top of that, I’ve enabled mouse support.
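In tmux 2.1 and later, mouse support is a single option in `~/.tmux.conf`: `set -g mouse on`. I don’t know exactly how Ted Sluis’s configuration is laid out, so treat this as a generic sketch rather than a patch for that specific file: a hypothetical `enable_tmux_mouse` helper that switches an existing mouse setting to `on`, or appends the option if none is present.

```python
import re

# Matches an uncommented mouse setting such as "set -g mouse off" or
# "set-option -g mouse on" (tmux 2.1+ syntax). Older options like
# "mouse-select-pane" are not matched because "mouse" must be followed by
# whitespace.
MOUSE_OPTION = re.compile(r"^[ \t]*set(-option)?\s+-g\s+mouse\s+\S+", re.MULTILINE)


def enable_tmux_mouse(conf_text):
    """Return conf_text with `set -g mouse on` in effect."""
    if MOUSE_OPTION.search(conf_text):
        # Rewrite the first existing setting in place.
        return MOUSE_OPTION.sub("set -g mouse on", conf_text, count=1)
    separator = "" if conf_text.endswith("\n") or not conf_text else "\n"
    return f"{conf_text}{separator}set -g mouse on\n"


if __name__ == "__main__":
    # Demonstrate on a sample config rather than editing ~/.tmux.conf directly.
    print(enable_tmux_mouse("set -g history-limit 10000\n"), end="")
```

After changing your real config, running `tmux source-file ~/.tmux.conf` inside tmux reloads it without restarting your sessions.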
Many resources go well beyond the scope of my contributions here. Below, I’ve listed some of the essentials I’ve found along the way, as well as resources I came across while writing this post:
- A post similar to this one, targeted at Mac users in research: http://alejandrosoto.net/blog/2014/01/22/setting-up-my-mac-for-scientific-research/
- A guide to setting up a data science environment, focused on Anaconda: https://kfolds.com/setting-up-a-data-science-environment-5e6fd1cbd572