Yes, this is a Perl module. But it is also an R package. This is
because it is a bi-directional interface: it allows R to call Perl and
that very Perl code to call back to R. It allows us to pass R
functions to Perl and use them as callable objects. (Passing Perl
subroutines or methods to R is a little less elegant, but doable.)
Because the code is both a Perl module and an R package, we don't have
a simple choice in which approach to use for installing it. Because of
my background, we use R's approach for package installation. With
hindsight, MakeMaker or other tools like that might have been more
flexible, but we would still have to work within R's package mechanism.
(There are moves afoot to make R's package system a lot more flexible,
good as it already is.)
Anyway, now that we have seen why we use R to install the code, here's
how.
While we can install directly from the .tar.gz file, it may be best to
extract the files from the archive:
tar zxf RSPerl_0.9-0.tar.gz
or whatever the name of the source distribution you downloaded actually is.
If you want to call R from within Perl (the topic of this document and
why you are here), install the package with the --with-in-perl
argument to the configure script. This is most easily done as
R CMD INSTALL --configure-args='--with-in-perl' RSPerl
We used to need the --clean flag, but this is no longer necessary.
Now, --clean does actually clean up all the files that the
configuration and installation created. (Well almost all and any
others will be overwritten if you reinstall!)
Note also that to use the R-in-Perl mechanism one must have built R as
a shared library. (This is not necessary when calling Perl from R.)
You can check whether this has been done by looking for libR in the
directory $R_HOME/lib/. If it is not there, you are advised to clean
the entire R distribution (with make distclean) so as to start from
scratch, and then configure and compile R, passing the
--enable-R-shlib argument to R's configuration script. The following
sequence of commands should work.
cd $R_HOME
make distclean
./configure --enable-R-shlib
make
If you don't have the source distribution at hand and are using a
previously installed binary, go fetch the source from
CRAN.
It is usually quite easy to install as a regular user and you can use
the version directly from where you built it.
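The check for libR described above can be sketched as a small shell
function. This is only an illustration: the function name has_libR is
ours, libR.so is the name used in this document, and libR.dylib is our
assumption for the name on Mac OS X. If $R_HOME is not set, the sketch
asks R itself via R RHOME, provided R is on your PATH.

```shell
# has_libR DIR: succeed if DIR/lib holds a shared R library.
# libR.so is the name used in this document; libR.dylib is an assumption.
has_libR() {
    for lib in "$1/lib/libR.so" "$1/lib/libR.dylib"; do
        if [ -e "$lib" ]; then
            return 0
        fi
    done
    return 1
}

# Use $R_HOME if it is set; otherwise ask R, when R is on the PATH.
r_home=${R_HOME:-}
if [ -z "$r_home" ] && command -v R >/dev/null 2>&1; then
    r_home=$(R RHOME)
fi

if [ -n "$r_home" ] && has_libR "$r_home"; then
    echo "libR found under $r_home/lib"
else
    echo "no libR found under '${r_home:-unknown R_HOME}'; R may need --enable-R-shlib"
fi
```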
Run-time configuration variables
Because there are two systems involved in this interface and we can
run R from within a Perl script or Perl within an R session/script,
there are a lot of different combinations to consider. If we run R
inside Perl, we need to find both the R run-time library (libR.so) and
also the RSPerl package which will get loaded when the R session is
started. We also need to find some additional shared libraries/DLLs in
the RSPerl package. For this, we need to make certain the dynamic
loader can find all these DLLs.
Perl also needs to find the Perl code, i.e. the R.pm, RReferences.pm
and R.so files. We need to set $PERL5LIB to specify their location.
Additionally, we need to know where the R package is located if it is
not installed into $R_HOME/library/. This is done via $R_LIBS. And
if we are running R from within Perl, we also need to tell the R
engine where $R_HOME is. As Michael Dondrup said, that's a lot of
environment variables to set.
Typically we don't have to set them all. If we install the R package
into a personal library, that library is typically where we put lots
of R packages and so it is in our $R_LIBS variable already.
Similarly, if we install the Perl code into a local Perl library, we
will have that specified in our $PERL5LIB environment variable. And
if we are running Perl inside R, $R_HOME is already set when we start
R and finding libR.so is also done for us. So the main variables we
might have to set are $LD_LIBRARY_PATH and, if the Perl code is
installed into the R package area, $PERL5LIB.
We provide two shell scripts to set these variables to the appropriate
values: RSPerl.bsh for sh/bash-style shells and RSPerl.csh for
csh/tcsh-style shells. They are located in the RSPerl/scripts/
directory of the installed package. These are not executable but
rather intended to be sourced into an existing shell to set the
variables for the remainder of that shell session. Use
. RSPerl/scripts/RSPerl.bsh
or
source RSPerl/scripts/RSPerl.csh
You can even add the relevant command to your .bashrc or .cshrc file
so that the variables are set whenever you create a new shell.
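If you would rather set the variables by hand than source the scripts,
the following is a minimal sketch of the idea for sh/bash-style shells.
It is not the content of RSPerl.bsh. The default paths, the RSPERL_DIR
variable and the prepend_path helper are our own assumptions for
illustration; the libs/ and perl/ directories follow the locations
described in this document for a default install and may differ on
your machine.

```shell
# Hypothetical default locations; adjust them to your installation.
R_HOME=${R_HOME:-/usr/local/lib/R}
RSPERL_DIR=${RSPERL_DIR:-$R_HOME/library/RSPerl}

# prepend_path VAR DIR: put DIR at the front of the colon-separated
# list held in VAR, unless DIR is already listed, then export VAR.
prepend_path() {
    eval "_pp_cur=\${$1:-}"
    case ":$_pp_cur:" in
        *":$2:"*) ;;
        *) eval "$1=\"\$2\${_pp_cur:+:\$_pp_cur}\"" ;;
    esac
    export "$1"
}

prepend_path LD_LIBRARY_PATH "$R_HOME/lib"        # where libR.so lives
prepend_path LD_LIBRARY_PATH "$RSPERL_DIR/libs"   # shared libraries of the package
prepend_path PERL5LIB "$RSPERL_DIR/perl"          # R.pm, RReferences.pm and R.so
```

Sourcing this more than once does not duplicate entries, since
prepend_path skips directories that are already listed.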
In version 0.9-0, if you do not specify additional configuration
options controlling where the Perl code is installed (see the section
called “The Installation and controlling where the Perl modules are
installed.”), the $PERL5LIB setting should be correct. In the past,
the scripts assumed that the MakeMaker code would put the files into
site_perl/ rather than the perl version directory. This is
unfortunately dependent on other configuration variables which I
haven't had time to determine. That is why it worked for me on all
the machines I have access to, but not for some other people. So the
scripts were not broken, just not dynamic enough. In version 0.9-0,
they should be correct regardless of MakeMaker's defaults.
Note also that if you are running R from within Perl, you do not need
to set $R_HOME or $R_LIBS. The values are conditionally set within the
Perl code for the R module when you start the R session. You will
probably still need to specify $LD_LIBRARY_PATH, and $PERL5LIB if you
have installed the Perl code outside of the standard Perl location.
But again, these scripts should contain the correct settings.
The Installation and controlling where the Perl modules are installed.
The basic idea is this. The configuration script ends up calling
perl Makefile.PL, which creates (the non-standard) Makefile.perl.
This call to perl Makefile.PL can be given numerous arguments, but we
care about PREFIX and LIB. These are explained in the manual for the
Perl module ExtUtils::MakeMaker.
If you, the installer, do not specify anything about this detail, the
files will be installed under the perl/ directory of the installed R
package, wherever that is (controlled by the -l flag for R CMD
INSTALL, the R_LIBS environment variable, or simply the R library/
directory). It is sensible to do this as the modules are tied to the
package, and also because we cannot simply install them as a regular
Perl package as a common user. That would require write permission to
the Perl site files, and we do not want to need these privileges for
our entire R installation script.
If you want to control the location of the library, then you can use
the --with-perl-lib argument to the configure script. You specify the
name of a directory and that is passed to the call perl Makefile.PL
as LIB=<your value>. If you want to control the PREFIX argument to
perl Makefile.PL, then you can use the --with-perl-prefix argument.
If the user specifies the prefix and not the library, we assume she
knows what she is doing and so we don't pass the LIB= argument. If
the user specifies the library, we only pass that to the perl
Makefile.PL call as the value of LIB=. These two configuration
arguments give you access to setting the additional inputs for perl
Makefile.PL.
If neither PREFIX nor LIB is passed to perl Makefile.PL, then the
MakeMaker code will set up the installation to go into the standard
Perl locations. To get this behaviour, use
R CMD INSTALL --configure-args='--with-perl-lib=' RSPerl
Note that there is no value for the --with-perl-lib argument: it is
the empty string. This is special and says not to pass anything to
perl Makefile.PL.
The R CMD INSTALL script will end up calling
make -f Makefile.perl install
and if you are using the standard Perl location, this will fail unless
you have permission to write there. If you don't, R CMD INSTALL will
continue and you will have to return and install these files manually.
To do this, from within the source distribution in which the code was
compiled (not installed), run the following commands
cd RSPerl/src
make -f Makefile.perl install
A summary of the different inputs for the configuration scripts, where
the Perl code is installed, and what to set $PERL5LIB to is given in
the following table.
So testing this on my old Red Hat machine with a guest login (i.e. no
settings from my account), I need only set $PERL5LIB (and
$LD_LIBRARY_PATH to /usr/local/lib/ to pick up libgcc_s.so.1, on which
libR.so depends because of using GCC_4.0). If you use --with-perl-lib
during the R installation (and then manually run make -f Makefile.perl
install), you need not set any variables. This appears to be special
to my machine, as all the shared libraries are found dynamically at
run time because of the compiler switches. On another (but more
modern) Red Hat box, all I had to set was $LD_LIBRARY_PATH, to
RSPerl/libs/ to find libPerlConverters.so and to /usr/local/lib/R/lib/
to find libR.so.
By putting the module files in the usual Perl location, we have
avoided the need for setting $PERL5LIB. We have also added $R_HOME
and the location of the RSPerl package, which would ordinarily be
specified by $R_LIBS, into the code in R.pm.
Obviously, we cannot put the location of the Perl module into the Perl
code and use that mechanism as we wouldn't know where to find the
module code to run it! So if we install it in a non-standard place,
we assume that there is a good reason and that this is site-, group-
or user-specific.
The following section provides a very brief description of the
available routines. One should look at the example scripts in the
tests/ directory of the installed package to see how to call R from
Perl.
Before doing anything within Perl to call R functions, etc., one needs
to import the `R' module into the Perl script via the command
use R;
This does not start the R session; it just makes the code in the
module available to your script. Additionally, one should follow this
with a command to load the `RReferences' module, which is used to
export R objects to Perl as ``references''. This import is done as
use RReferences;
At this point, the R functionality is available to the script. One
need only initialize the R interpreter and can then make calls to
arbitrary R functions, etc.
The two test scripts test.pl and test1.pl in the tests/ directory of
the installed package provide some simple examples of how to use the
R-from-Perl invocation mechanism.
Basically, there are a few methods/routines that
provide access to R from Perl.
These are
- initR
- call
- callWithNames
- eval
initR() is now a regular Perl subroutine (having been changed from a
native routine) that ends up calling a C routine to start the R
session. startR() is now a simple call to initR(), so either will
work. Both these subroutines take an arbitrary number of strings
which are used as the command line arguments one would pass to R if
invoking it from the shell command line. For example
&R::startR("--silent")
&R::startR("--gui=none", "--vanilla")
&R::initR("--silent")
&R::initR("--gui=none", "--vanilla")
These arguments are available to R expressions via the function
commandArgs().
These subroutines take care of conditionally setting the value of the
environment variable $R_HOME as determined at
configuration/installation time. If $R_HOME is already set, it will
be left unchanged. This allows one to use different R installations
with the same code. But be careful when doing this that they are
binary compatible!
initR() also loads the RSPerl package into R. This is generally a
good thing to do. However, if there are reasons to avoid this, use
the initRSession() subroutine, which is the native routine that
starts the session. Note that this will not set $R_HOME. To do
this, you can call setRHome(). You can also use getDefaultRHome() to
find the value of $R_HOME determined at install-time.
Having started the R session, we can now make function calls.
In both Perl and R, we would call a function directly by its name
as it is a regular variable. Of course, R variables are not directly
accessible in the scope of Perl commands and that is a good thing!
But we would still like to be able to call an R function from Perl by
its name. We use Perl's AUTOLOAD feature to dynamically map names that
have no corresponding subroutine in the R module into calls to the R
function of that name. In this way, all the available R functions are
accessible using this syntax
R::functionName(arg1, arg2, ...)
The AUTOLOAD facility is a convenient wrapper to the
call() subroutine in the R module. This is what
actually invokes arbitrary R functions from within Perl. This takes
the name of the R function and a collection of arguments and invokes
the corresponding R function directly and returns its value.
call() is one of the three fundamental methods of the
interface that allows the caller in Perl to invoke an S function,
passing values from Perl to R and having them converted as they are
transferred across the interface. One has to be aware of the way that
Perl places values on its stack. If one has a Perl array and passes
this directly to the call, all the values in the array will be treated
as individual arguments to the S function. To pass the array as a
single value, pass its reference, using the \ escape mechanism.
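# The list is flattened, so sum() receives three separate arguments.
R::sum((1,2,3));
# Pass a reference so that plot() receives the array as a single value.
@x = 1..10;
R::call("plot", \@x);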
When creating arrays to pass to R, we can also use the
[] operator to create the array and thus create a scalar
reference to the underlying array.
So we could have written the second example above as
$x = [1..10];
R::plot($x);
R has a very flexible function call mechanism with partial name
matching, default values, optional arguments and lazy evaluation.
Named arguments work differently in R than in Perl, and so we need a
way to emulate them from within Perl. callWithNames() allows one to
call an S function with named arguments. The first argument is the
name of the function to be called, specified as a string just as with
call(). The second argument is a hash-table or associative array of
name-value pairs. For arguments that do not require a name in S, one
can use an empty string (''). In these cases, the arguments are
matched by position; see R's argument matching mechanism. See
tests/test3.pl.
# Call plot(x, ylab="My values")
@x = 1..10;
&R::callWithNames("plot", {'',\@x, 'ylab', 'My values'});
# Call plot(x = x, y = y, ylab="My values")
@y = &R::call("rnorm", 10);
&R::callWithNames("plot", {'x',\@x, 'y', \@y, 'ylab', 'My values'});
# Call par(mfrow=c(1,2))
@x = (1,2);
&R::callWithNames("par", {'mfrow', \@x} );
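# R command: plot(x = x, y = y, xlab = "Horizontal", ylab = "My values")
# Perl hash keys are unique, so a second '' key would overwrite the
# first; name x and y explicitly instead.
$x = [ 1..10 ];
&R::callWithNames("plot", {x => $x, y => \@y, xlab => "Horizontal", ylab => 'My values'});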
In some circumstances, it is useful to be able to evaluate S
expressions given as strings in the S language. The method eval()
allows one to do this. Please note that string-based communication
between two systems is not a good general design: it can cause many
problems in maintaining code and avoiding name conflicts, and it
greatly complicates mutable objects that span the two systems.
Additionally, it exposes the syntax of the target system (S) to the
user of the host system (Perl), which is what this inter-system
interface is trying to avoid. Nevertheless, it is occasionally useful
and convenient. The eval() function in the R module allows us to
evaluate an R command from within Perl.
&R::eval("sum(1:10)");
&R::eval("plot(1:10); T");
&R::eval("par(mfrow=c(2,1)); x <- rnorm(10); hist(x) ; plot(x) ; rm(x); TRUE");