My experiences with Rcpp

For the seven days up to Tuesday I worked on converting the code of my master's thesis from scripted R (statistics) to compiled C++ using the Rcpp package by Dirk Eddelbuettel. Despite the initial effort needed to set up the system (especially under Windows), I was looking forward to a huge speed-up of my simulation.

Setting up Rcpp

Setting up Rcpp under Windows is more or less straightforward, but there are many small things to take care of, and it took me some time to figure them all out. A good starting point is the Rcpp-FAQ, which lists the software you'll need. Luckily, Duncan Murdoch provides the Rtools package, which bundles nearly everything required. Due to license or size limitations, a few tools are still missing and have to be installed separately. They are described in Appendix D, "The Windows toolset", of the R Installation and Administration manual.

Be careful not to have spaces in the paths of any of the components. In particular, the German version of Windows 7 shows the "Program Files" folder as "Programme", which looks as though it doesn't contain a space. If you click on the address bar of the Explorer, though, you will see that "Programme" is just a link to the "Program Files" folder. That path does contain a space, so installing R there will not work (or at least it didn't work for me).

Converting R-code to C++

Converting my code from R to C++ was easier than I first thought. Using the inline mechanism with Rcpp, you can include C++ code directly as a string in R, have it compiled into a function, and call that function from R. Compiler error messages are forwarded to R and displayed there, which helps a lot when debugging. Some runtime errors are apparently not forwarded, however: if you access an std::vector element with an index that is out of range, R simply crashes without any warning. While converting my simulation, this was the only kind of error that R was not able to report back to me.
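One way to guard against the silent crash described above is to use bounds-checked access. The following is a minimal sketch in plain standard C++, not code from my simulation: the helper names element_at, is_out_of_range and self_check are made up for illustration. It relies on the fact that indexing a std::vector with operator[] performs no range check (a bad index is undefined behaviour), whereas an explicit check can throw std::out_of_range with a readable message.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: bounds-checked read that reports which index failed.
// Plain values[index] performs no check, so a bad index is undefined
// behaviour and typically shows up as a crash rather than an error message.
double element_at(const std::vector<double>& values, std::size_t index) {
    if (index >= values.size()) {
        throw std::out_of_range("index " + std::to_string(index) +
                                " out of range for vector of size " +
                                std::to_string(values.size()));
    }
    return values[index];
}

// Returns true if reading `index` throws std::out_of_range, false otherwise.
bool is_out_of_range(const std::vector<double>& values, std::size_t index) {
    try {
        element_at(values, index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Small self-check with made-up data.
void self_check() {
    const std::vector<double> weights{0.2, 0.3, 0.5};
    assert(element_at(weights, 1) == 0.3);  // valid index is returned as-is
    assert(is_out_of_range(weights, 7));    // invalid index throws instead of crashing
}
```

The standard library's own values.at(index) behaves the same way as element_at, only with a less specific message. As far as I understand, Rcpp turns C++ exceptions into R errors, so a thrown std::out_of_range should appear as an error in the R console rather than taking the session down, but I have not verified this for every way of compiling the code.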

Using Visual Studio with Rcpp

Even though Dirk Eddelbuettel and Romain François answer the question “Can I use Rcpp with Visual Studio” straightforwardly with “Not a chance”, I used Visual Studio quite extensively for my development. It is true that you won’t be able to compile your Rcpp code with it; for that you still need the toolchain from Rtools. But that doesn’t keep you from using Visual Studio for development. My solution looks as follows: I use a file dppClustering.cpp, which I load and compile from R with the include function. In this file, all variables are converted from Rcpp variables into C++ variables. With these I then call my simulation class, which contains the logic.

To develop with VS, I created a new project that takes the place of dppClustering.cpp: it includes the simulation classes and calls their functionality directly. With this set-up I am able to use the complete power of Visual Studio for my development, but I can still compile from within R using Rcpp.
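The split described above can be sketched roughly as follows. This is only an illustration under assumptions: the class name Simulation, the function run_simulation and the thresholding logic are placeholders I made up for this example, not the actual thesis code. The point is that the class depends only on the standard library, so it compiles with any C++ compiler, and the file that talks to R stays a thin layer that converts its arguments and calls one entry point.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the simulation class. It uses only standard
// C++ types, so it can be built in a Visual Studio project as well as by
// the Rtools toolchain.
class Simulation {
public:
    explicit Simulation(std::vector<double> data) : data_(std::move(data)) {}

    // Placeholder logic, NOT the thesis algorithm: label each value 1 if it
    // lies above the mean of the data and 0 otherwise.
    std::vector<int> run() const {
        std::vector<int> labels(data_.size(), 0);
        if (data_.empty()) {
            return labels;
        }
        const double mean =
            std::accumulate(data_.begin(), data_.end(), 0.0) / data_.size();
        for (std::size_t i = 0; i < data_.size(); ++i) {
            labels[i] = data_[i] > mean ? 1 : 0;
        }
        assert(labels.size() == data_.size());  // one label per data point
        return labels;
    }

private:
    std::vector<double> data_;
};

// Entry point shared by both front ends. The file compiled from R would
// convert its R arguments into a std::vector<double> and call this function;
// the Visual Studio project calls it directly with test data.
std::vector<int> run_simulation(const std::vector<double>& data) {
    return Simulation(data).run();
}
```

With this layout, a small main function in the Visual Studio project can call run_simulation with hard-coded test data and be stepped through in the debugger, while the R side only needs the conversion layer on top.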

How about the speed-up?

The runtime difference between the R and the C++ code is just mind-blowing. I averaged the runtime of 40 C++ runs and of 2 R runs and calculated a speed-up factor of over 100 from the two means.
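For reference, the calculation is simply the ratio of the two mean runtimes. A minimal sketch, with function names chosen for this example and no real measurements included:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of runtimes in seconds.
double mean_runtime(const std::vector<double>& seconds) {
    assert(!seconds.empty());
    return std::accumulate(seconds.begin(), seconds.end(), 0.0) / seconds.size();
}

// Speed-up factor: mean runtime of the R runs divided by the mean runtime
// of the C++ runs. A value of 100 means the C++ version is 100 times faster.
double speed_up(const std::vector<double>& r_seconds,
                const std::vector<double>& cpp_seconds) {
    return mean_runtime(r_seconds) / mean_runtime(cpp_seconds);
}
```

Note that the two samples do not need to be the same size, which is why 2 slow R runs can be compared against 40 fast C++ runs. With only 2 runs on the R side, though, the resulting factor is a rough estimate rather than a precise benchmark.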

The combination of fast implementation in R and additional runtime improvements from C++ with Rcpp for the computationally intensive parts of the code makes Rcpp an enormously powerful tool. The week I invested really paid off.

Why you should worry about color

Ever created a graph ad hoc without putting a lot of thought into how to color your data? Then you should probably have a look at this article.

Why you should worry about color

Example of data representation by color

The authors explain how color is composed in general (there are three independent dimensions) and show with many examples which dimension should be used to represent which kind of data, so that the level of detail contained in the data is displayed appropriately.

You’ll definitely remember it when you create your next graph.

Thesis LaTeX template

If you are working as a researcher, you’ve probably wondered at one time or another which LaTeX template would fit your publication best. When I was working on my bachelor's thesis in 2010, I looked around and stumbled across a template by André Miede, the classicthesis template. It provides authors with a “classic, high quality typographic style which is inspired by Robert Bringhurst’s The Elements of Typographic Style”:

Classic Thesis Template

The template is still maintained and gets an update around twice a year. It has a clear structure and a fresh design, which makes it very suitable for publications of any kind.

Clustering Epigenetic data

I just started my master's thesis at the Max Planck Institute for Informatics (Saarbrücken, Germany). Since I’m looking forward to many hours of research in which I will be digging deep into the basin of human knowledge, I started this blog. It’ll help me keep hold of interesting ideas that pop up, which are not necessarily directly connected to my topic. That topic comes from the field of bioinformatics and could be described as:

Clustering epigenetic data by means of Dirichlet Process

Feel free to come by from time to time to see what I’m up to.