The Rllvm Package

The Rllvm package is an R interface to the LLVM library, which provides facilities for creating native code and building compilers for different languages. The aim of this package is to expose those LLVM facilities so that we, as a community, can experiment with building compilers for the R language and speed up its evaluation.

I have long felt that we should build on other platforms that provide their own compilers, e.g. LISP. This package is instead an effort to stay within the R community while providing the foundation for building native compilers rather than interpreted byte-code compilers.

This package is not yet a compiler for R. It merely provides the tools with which one can write a compiler that creates native code. I expect that we can use Luke Tierney's compiler package on top of this to leverage some of its optimizations, then generate the native code, and then apply LLVM's optimization passes. It remains to be seen whether these two optimization approaches are orthogonal or share a great deal in common.

Documentation

There are several examples, adapted from the LLVM tutorials and developed as explorations of that API.

Compiling GPU kernels
Compiling simple scalar arithmetic
    This example comes from the LLVM tutorial (actually the documentation for the previous release). This takes 3 numbers, squares the 3rd and adds the result to the sum of the others. This is not a vectorized function. The code in R closely parallels the C++ code described in that tutorial.
Computing the greatest common divisor (GCD)
    Computing the GCD of two integers. This example comes from the LLVM tutorial (actually the documentation for the previous release). The code in R closely parallels the C++ code described in that tutorial.
Cumulative sum of a vector
    This implements the cumulative sum of a vector in R and via a manually generated native routine. The speedup is a factor of 32.7 on a Mac OS X machine and 26 on a Linux machine.
Implementing x + 1 in R and natively
    This is the example for which Luke Tierney presented timing results in his UseR! 2010 talk, illustrating two approaches to byte-code compilation and Stephen Milborrow’s Ra/jit system. Here, we manually generated native code to implement the R function. The code we created runs 108 times faster than the interpreted R code on a Mac OS X machine, and 72 times faster on a Linux machine. This contrasts with the numbers Luke reported (on a different machine), where the speedup was a factor of 3.4 for the original byte compilation, 20 for Ra, and 29 for the experimental byte compilation system Luke is working on.
2-D random walk
    This is an implementation and comparison of Ross Ihaka's example of a 2-D random walk. Ross progressively illustrates how to improve the naive implementation using profiling and gradually vectorizing the code. The result is a speedup of a factor of 200. By implementing the naive version via Rllvm, we obtain a speedup of a factor of 340.
Generating a routine that calls an existing, external routine
    This also shows how to control how the external symbols are resolved.
Global variables
Details of storing values in variables and elements of arrays
    store.R, store1.R, store2.R
Comparison of timings for 5 different problems
    The problems and approaches are described in examples in the package in explorations/ and tests/. These not only show that we can outperform R's interpreter, but also outperform R's vectorized code by changing the
FAQ
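
For orientation, the computations behind the scalar arithmetic, GCD and cumulative sum examples listed above can be sketched in plain C. This is only an illustration of what such native routines compute, not the package's code, and it does not use the Rllvm API. The function names (sum_plus_square, gcd, cumulative_sum) are hypothetical, the use of double for the numeric examples is an assumption, and the GCD uses the remainder form of Euclid's algorithm, which may differ from the variant in the LLVM tutorial.

```c
#include <stddef.h>

/* Scalar arithmetic: square the third argument and add it to the sum
   of the first two. Not vectorized: scalars in, one scalar out.
   (Hypothetical name; the argument type is an assumption.) */
double sum_plus_square(double a, double b, double c)
{
    return a + b + c * c;
}

/* Greatest common divisor of two integers by Euclid's algorithm
   (remainder form). gcd(x, 0) returns x. */
unsigned int gcd(unsigned int x, unsigned int y)
{
    while (y != 0) {
        unsigned int r = x % y;
        x = y;
        y = r;
    }
    return x;
}

/* Cumulative sum: out[i] = x[0] + ... + x[i] for i in 0..n-1.
   A single pass with a running total, the kind of explicit loop
   that is slow in interpreted R but cheap in native code. */
void cumulative_sum(const double *x, double *out, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += x[i];
        out[i] = total;
    }
}
```

Each routine is a short scalar loop or expression, which is why generating native code for it by hand is feasible and why the interpreter's per-operation overhead dominates the R timings reported above.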

Other Approaches

When I started this work, I was unaware that Byron Ellis had also started on bindings to LLVM back in 2008. See rllvm on r-forge.

Byte-code compilation is another worthwhile approach. See Luke Tierney's articles on this and the existing support in the R engine and his compiler package.

Issues

There are many classes and methods to add to this interface.

License

This package is distributed under the GPL2 License.

Duncan Temple Lang <duncan@wald.ucdavis.edu>

Last modified: Tue Jul 16 11:30:42 PDT 2013