Using COM facilities in R
Abstract
The
RDCOMClient provides relatively easy to use facilities
for creating COM objects from R and using their methods to perform different
actions and accessing their properties.
This package is similar in spirit and in
functionality to that provided in Python and Perl, but this package
of course focuses on providing the facilities in R. It
offers a dynamic mechanism for method/property access for arbitrary
COM objects that support the
IDispatch
interface.
More efficient and structured access can be automatically generated using
functions in the
SWinTypeLibs.
The RDCOMClient package provides general tools for accessing
functionality in COM servers. The first thing we do is to get a
handle to a COM server object. There are two ways to do this. One is
to create a new instance of registered COM servers. By registered, we
mean a COM server that was explicitly published and put into the
Windows registry with a name and a UUID (universal identifier). The
other technique is to get a handle to an existing COM object that was
created earlier, not necessarily by R but perhaps manually by the
user. For example, I may have opened Excel and done some work in a
spreadsheet. I know want to bring the data from that spreadsheet into
R and have R add content to that sheet. In that case, I need to
ensure that R works with that particular instance of Excel and not a
new Excel object with an entirely separate, empty worksheet.
To create a new instance of a registered COM server class, we use the
function
. We give it either the
name of the COM class, e.g. "Excel.Application", as a string or
alternatively the UUID. The latter is a very long string that was
generated to uniquely identify the COM server with an epsilon chance
of any other UUID being generated that would be the same.
To create a new instance of the Excel application,
we would use the command
ex = COMCreate("Excel.Application")
Similarly, if we wanted to create an instance of
MS Word, we would use
word = COMCreate("Word.Application")
and
Microsoft's Internet Explorer
can be brought up using
ie = COMCreate("InternetExplorer.Application")
In each of these examples, the resulting R objects (
ex,
word and
ie) are references to a generic
DCOM object. R knows nothing about the specifics of these
applications. It cannot tell the characteristics of one from the
other, but it can access each of their methods and properties assuming
that the R "programmer"/user knows their names. We'll see how we can
discover the names of these methods and properties from within R also.
Now that we have a reference/handle to a COM object in which we are
interested, we can now move on and start accessing its data
(properties) and calling its methods (functions). There underlying
function that does this is
.COM()
. This handles
getting and setting properties and invoking methods, with or without
arguments. In fact, the different between methods and properties is a
little nebulous. Properties can often be considered methods. But that
is not important. And, to a large extend, the
.COM()
function is not important either. Rather, we typically use syntactic
sugar to make accessing these methods and properties more facile.
To access a method, we use the
$
operator on the COM object.
For example, to call the CheckSpelling method
in the Excel Application object, we
can use the R command
The general form is
comObj$methodName(arg1, arg2, arg3, ...)
|
---|
We are calling the function named "methodName"
in the server object
comObj
and passing it the R values as arguments.
This is equivalent to
.COM(comObj, "methodName", arg1, arg2, arg3, ...)
|
---|
but this is slightly more typing and doesn't
indicate that
comObj may change
as a result of the computation.
To access a DCOM property, we can again use the
.COM()
function or the
[[
operator with the name of the
property. For example, when we create a new Excel.Application
instance, it will not be displayed on the screen. Instead, it exists
and we can do computations within such as loading a file or inserting
values into it before the user sees it. When we want to display it,
we set the
Visible to a non-zero value.
We can query the current value of the
Visible
property using the R expressions
and we can set it from R using
the command
This will have the side-effect of making the
application appear on the screen.
The
$
for methods and
[[
for properties are
thin convenience layers on top of the
.COM()
function.
That is the workhorse of the
RDCOMClient package. And
we can discuss how this handles its arguments and passes them to the
DCOM method, and similarly how it returns values from these method
invocations.
DCOM, like CORBA, aims to provide a programming language-independent
mechanism for remote method invocation (RMI) or procedure calls (RPC).
Almost all programming languages provide basic data structures such as
integer and real valued numbers. They also provide characters and/or
strings, a collection of characters that can be considered a unit of
text. Some have logical or boolean values. All provide some way of
having collections of these primitive elements. DCOM therefore
defines data types for many of these commonly used data types. We map
between these DCOM data types and R's own data types when converting
arguments and return values.
DCOM type |
R type |
I4, I1, I2, INT
|
integer |
BOOL |
logical |
R8 |
numeric |
BSTR |
character |
UI1, UI2, UINT, UI4 |
numeric |
CY, DATE, DECIMAL, HRESULT |
numeric |
UNKNOWN |
COMUnknownObject |
DISPATCH |
COMIDispatch |
EMPTY, NULL, VOID |
NULL |
The types
RECORD,
FILETIE,
BLOBL,
STREAM,
STORAGE,
STREAMED_OBJECT,
LPWSR,
PTR,
ERROR,
VARIANT,
CARRAY,
USERDEFINED
are not yet handled.
So we have seen how to create COM instances, connect to existing COM
instances, and call methods and get and set property values in a COM
object. The important part of making use of a COM server is not the
underlying
.COM()
mechanism, but more understanding
what methods and properties are available to you. There are four
strategies for finding this out.
- Trial and error by using educated
guesses for the names of the properties and methods!
This is clearly a frustrating and inefficient approach.
-
Find and read books or Web pages
that describe the server and
its high-level computational model and
facilities.
For large applications like Excel, Word,
Access, and so on, there are many good
resources at this level.
The Web pages at Microsoft are
a good start and there are many tutorials
about specific topics with code snippets
for different tasks.
Most of these pages and books are written
for Visual Basic programmers.
While this makes the code less relevant and
slightly harder to read for us, it is
still very useful. All we are looking for
is the methods and properties and how they can be used.
And as with most good code, it is the description
that surrounds it rather than the code itself that is
most informative to humans.
-
For Microsoft applications such as Excel and Word and Office generally,
there is an interactive visual tool for browsing classes and their
methods and properties (and supported events). Using the Visual Basic
Editor and its Object Browser tool, one can examine the contents of
what is called a type library. This is a collection of information
about data types and classes and their methods and properties that can
be programmatically accessed. The Object Browser provides a GUI
front-end to this information and is relatively convenient way to
explore the entire class hierarchy,
with easy links between the classes referenced as parameters
and return values of methods and properties.
-
The type library is not specific to the Visual Basic Editor's Object Browser.
We can access it from any other application that can read and understand
its contents. The SWinTypeLibs
package provides facilities to read and explore arbitrary
type libraries within an R session.
One can load the type library associated with a particular
COM object or alternatively specify the file name of a
type library without needing a instance of the COM object.
One can find out about all the different classes
using R commands
such as names()
and
getFuncs()
and getElements()
.
An example may help.
It is not essential that you understand all the details here.
It is good to know that they exist and are accessible from R.
We'll load the library associated with our Excel
instance and look at all the classes, etc. it contains.
lib = LoadTypeLib(ex)
names(lib)
|
---|
[1] "Adjustments" "CalloutFormat" "ColorFormat" "LineFormat"
[5] "ShapeNode" "ShapeNodes" "PictureFormat" "ShadowFormat"
[9] "TextEffectFormat" "ThreeDFormat" "FillFormat" "DiagramNodes"
[13] "DiagramNodeChildren" "DiagramNode" "IRTDUpdateEvent" "IRtdServer"
[17] "Constants" "XlCreator" "XlChartGallery" "XlColorIndex"
[21] "XlEndStyleCap" "XlRowCol" "XlScaleType" "XlDataSeriesType"
...
...
[625] "SmartTags" "SmartTagRecognizer" "SmartTagRecognizers" "SmartTagOptions"
[629] "SpellingOptions" "Speech" "Protection" "PivotItemList"
[633] "Tab" "AllowEditRanges" "AllowEditRange" "UserAccessList"
[637] "UserAccess" "RTD" "Diagram" "IDummy"
[641] "ICanvasShapes" "QueryTable" "Application" "Chart"
[645] "Worksheet" "Global" "Workbook" "OLEObject"
So there are 648 entries in the library. Many of the, such as
XlColorIndex, are what are called enumerated constants which provide
names for specific values which are used as arguments to methods.
Others describe interfaces for event handlers which we will see later.
But elements like Application (second last row) represent the
top-level classes we want to work with. Each element has a class and
we are typically interested in the ITypeInfoDispatch objects which
correspond to the IDispatch COM server objects.
We can find all 241 of these with the command
names(lib)[sapply(lib, class) == "ITypeInfoDispatch"]
|
---|
[1] "Adjustments" "CalloutFormat" "ColorFormat" "LineFormat" "ShapeNode" "ShapeNodes"
[7] "PictureFormat" "ShadowFormat" "TextEffectFormat" "ThreeDFormat" "FillFormat" "DiagramNodes"
[13] "DiagramNodeChildren" "DiagramNode" "IRTDUpdateEvent" "IRtdServer" "_Application" "_Chart"
[19] "Sheets" "_Worksheet" "_Global" "_Workbook" "Workbooks" "PublishObject"
[25] "DefaultWebOptions" "WebOptions" "TreeviewControl" "CubeField" "CubeFields" "Font"
[31] "Window" "Windows" "AppEvents" "WorksheetFunction" "Range" "ChartEvents"
[37] "VPageBreak" "HPageBreak" "HPageBreaks" "VPageBreaks" "RecentFile" "RecentFiles"
[43] "DocEvents" "Style" "Styles" "Borders" "AddIn" "AddIns"
[49] "Toolbar" "Toolbars" "ToolbarButton" "ToolbarButtons" "Areas" "WorkbookEvents"
[55] "MenuBars" "MenuBar" "Menus" "Menu" "MenuItems" "MenuItem"
[61] "Charts" "DrawingObjects" "PivotCache" "PivotCaches" "PivotFormula" "PivotFormulas"
[67] "PivotTable" "PivotTables" "PivotField" "PivotFields" "CalculatedFields" "PivotItem"
[73] "PivotItems" "CalculatedItems" "Characters" "Dialogs" "Dialog" "SoundNote"
[79] "Button" "Buttons" "CheckBox" "CheckBoxes" "OptionButton" "OptionButtons"
[85] "EditBox" "EditBoxes" "ScrollBar" "ScrollBars" "ListBox" "ListBoxes"
[91] "GroupBox" "GroupBoxes" "DropDown" "DropDowns" "Spinner" "Spinners"
[97] "DialogFrame" "Label" "Labels" "Panes" "Pane" "Scenarios"
[103] "Scenario" "GroupObject" "GroupObjects" "Line" "Lines" "Rectangle"
[109] "Rectangles" "Oval" "Ovals" "Arc" "Arcs" "OLEObjectEvents"
[115] "_OLEObject" "OLEObjects" "TextBox" "TextBoxes" "Picture" "Pictures"
[121] "Drawing" "Drawings" "RoutingSlip" "Outline" "Module" "Modules"
[127] "DialogSheet" "DialogSheets" "Worksheets" "PageSetup" "Names" "Name"
[133] "ChartObject" "ChartObjects" "Mailer" "CustomViews" "CustomView" "FormatConditions"
[139] "FormatCondition" "Comments" "Comment" "RefreshEvents" "_QueryTable" "QueryTables"
[145] "Parameter" "Parameters" "ODBCError" "ODBCErrors" "Validation" "Hyperlinks"
[151] "Hyperlink" "AutoFilter" "Filters" "Filter" "AutoCorrect" "Border"
[157] "Interior" "ChartFillFormat" "ChartColorFormat" "Axis" "ChartTitle" "AxisTitle"
[163] "ChartGroup" "ChartGroups" "Axes" "Points" "Point" "Series"
[169] "SeriesCollection" "DataLabel" "DataLabels" "LegendEntry" "LegendEntries" "LegendKey"
[175] "Trendlines" "Trendline" "Corners" "SeriesLines" "HiLoLines" "Gridlines"
[181] "DropLines" "LeaderLines" "UpBars" "DownBars" "Floor" "Walls"
[187] "TickLabels" "PlotArea" "ChartArea" "Legend" "ErrorBars" "DataTable"
[193] "Phonetic" "Shape" "Shapes" "ShapeRange" "GroupShapes" "TextFrame"
[199] "ConnectorFormat" "FreeformBuilder" "ControlFormat" "OLEFormat" "LinkFormat" "PublishObjects"
[205] "OLEDBError" "OLEDBErrors" "Phonetics" "PivotLayout" "DisplayUnitLabel" "CellFormat"
[211] "UsedObjects" "CustomProperties" "CustomProperty" "CalculatedMembers" "CalculatedMember" "Watches"
[217] "Watch" "PivotCell" "Graphic" "AutoRecover" "ErrorCheckingOptions" "Errors"
[223] "Error" "SmartTagAction" "SmartTagActions" "SmartTag" "SmartTags" "SmartTagRecognizer"
[229] "SmartTagRecognizers" "SmartTagOptions" "SpellingOptions" "Speech" "Protection" "PivotItemList"
[235] "Tab" "AllowEditRanges" "AllowEditRange" "UserAccessList" "UserAccess" "RTD"
[241] "Diagram"
"_Application" is the actual class of the COM object
returned via COMCreate("Excel.Application")
.
We can examine its methods and properties using
funcs = getFuncs(lib[["_Application"]])
|
---|
Then we can look at these as we want.
Again, each element has a name and a class.
table(sapply(funcs, class))
|
---|
FunctionInvokeDescription PropertyGetDescription PropertySetDescription PropertySetRefDescription
79 196 96 2
And
tells us all the different methods and properties that are available
to us. If we want to see the Visible property for example, we can
look at it as funcs$Visible
. This says it is a
PropertyGetDescription and it tells us that its return type is a
"VARIANT_BOOL". This is just a logical value.
We know that we can set the value of this property, yet we are looking
at the PropertyGetDescription object. To find the
PropertySetDescription, we have to find the other element named
Visible in the list of methods and properties.
which(names(funcs) == "Visible")
|
---|
That second element tells us about setting the property.
To look at methods for the Worksheet class,
we can query the corresponding
IDispatch type
funcs = getFuncs(lib[["_Worksheet"]])
|
---|
Again, we use _Worksheet rather than Worksheet since the latter is an
aggregation of the former and some event interfaces. Let's look at
the Range method which is used to describe a collection of cells on
the worksheet.
Looking at the elements of this as it is printed
An object of class "PropertyGetDescription"
Slot "returnType":
An object of class "TypeDescriptionRef"
Slot "reftype":
[1] 50374848
Slot "name":
[1] "<User Defined>"
Slot "parameters":
$Cell1
An object of class "ParameterDescription"
Slot "name":
[1] "Cell1"
Slot "type":
An object of class "TypeDescription"
Slot "name":
[1] "VARIANT"
Slot "style":
An object of class "ParameterStyle"
Slot "In":
[1] TRUE
Slot "out":
[1] FALSE
Slot "lcid":
[1] FALSE
Slot "retval":
[1] FALSE
Slot "optional":
[1] FALSE
Slot "defaultValue":
NULL
$Cell2
An object of class "ParameterDescription"
Slot "name":
[1] "Cell2"
Slot "type":
An object of class "TypeDescription"
Slot "name":
[1] "VARIANT"
Slot "style":
An object of class "ParameterStyle"
Slot "In":
[1] TRUE
Slot "out":
[1] FALSE
Slot "lcid":
[1] FALSE
Slot "retval":
[1] FALSE
Slot "optional":
[1] TRUE
Slot "defaultValue":
NULL
Slot "invokeType":
propertyget
2
Slot "kind":
dispatch
4
Slot "name":
[1] "Range"
Slot "hidden":
[1] TRUE
Slot "memid":
[1] 197
The parameters slot tells us that this expects two parameters,
Cell1 and Cell2 and it says they can be generic values
which is identified by type VARIANT.
We know they can be names such as "A1", "C10"
and we'll see that they can be other Range objects to identify
individual cells also.
The return type is defined within the library itself.
We can resolve it
getRefTypeName(lib[["_Worksheet"]], funcs$Range@returnType@reftype)
|
---|
and we get
name
"Range"
We will develop and extend an example through the paper to illustrate
different aspects of DCOM with R. We start by focusing on using R as
a client. The example is very simple. We will use Excel to display
the results of a univariate bootstrap. We write a function in R to do
the bootstrap and display the results. The function takes as input
the original data, the statistic to be calculated for each bootstrap
sample, and the number of bootstrap repetitions (999) by default.
We could use the
boot package
to do the bootstrap, but we will write it ourselves here
as the code is very simple.
uniBootstrap =
function(data, statistic = median, B = 999)
{
n = length(data)
samples = matrix(sample(data, n * B, replace = TRUE), n, B)
results = apply(samples, 2, statistic)
list(samples = samples, results = results)
}
|
---|
We can call this from within our R session, but now we want to display
the samples along with the result for each sample and then a summary
of the
results object. So we can write a function that
takes the output from
uniBootstrap()
and puts each
column of the
samples matrix into a column of a new
Excel worksheet. It will put the result for that sample in a separate
row above the samples and above that provide a simple summary of the
entire collection of
results.
showBootstrapResults =
function(bootData, book)
{
n = nrow(bootData$samples)
B = ncol(bootData$samples)
# Create a new worksheet in the book.
sheet = book$Worksheets()$Add()
# Put the summary of the results vector in the first row.
sumy = summary(bootData$results)
r = sheet$Range("A1:F1")
r[["Value"]] = names(sumy)
r = sheet$Range("A2:F2")
r[["Value"]] = as.numeric(sumy)
# Put the results vector in the 4th row.
r = sheet$Range(sheet$Cells(4, 1), sheet$Cells(4, B))
r[["Value"]] = bootData$results
#
for(i in 1:B) {
r = sheet$Range(sheet$Cells(6, i), sheet$Cells(6 + n - 1, i))
r[["Value"]] = asCOMArray(bootData$samples[, i])
}
sheet
}
|
---|
This code illustrates two ways of accessing a range of cells. In the first
case, we use the regular Excel syntax of "A1:F1" to identify the first
row between columns A and F. In the other cases, we ask for the Range
object by giving the method two cell locations which identify the
top-left and bottom-right corner of the desired range. This form
allows us to easily use numbers rather than mapping to letters which
becomes more complex when we have to deal with column names such as
CF.
We set the Value property of the range by assigning it as many values
from R as there are cells in the range. Note that this only works if
the range is one-dimensional. If it is a rectangle, the results are
not as you may expect. Additionally, when we put the sample values
for each bootstrap within the loop from 1 to B, we have to explicitly
coerce the R vector to an appropriate form. For some reason, Excel
will not arrange the values vertically, but rather reuses the first
element for each cell. So we use
asCOMArray()
to make
this work.
We can spend time formatting the output in a nicer way. We can make
the names of the summary elements in row 1 appear as bold. The Range
object has a Font property and we can set its characteristics to make
the text bold.
r = sheet$Range("A1:F1")
r[["Font"]][["Bold"]] = TRUE
Similarly, we can change the font by specifying a new Name property,
and change how large it is via the Size property. And we can change
its color by specifying a value for the (you guessed it) Color
property of the font.
We can specify the format of the value of a cell
such as using a Date or Currency or Percentage
representation.
One specifies the format string
as the value for the
NumberFormat
property.
The format of the string is something you should lookup in
the Excel documentation.
The following uses
a currency format that includes ,'s to separate
digits in the usual manner, and
marks negative quantities
in the color red and enclosed in parentheses:
range[["NumberFormat"]] = "$#,##0.00_);[Red]($#,##0.00)"
|
---|
There are many tricks one can use when dealing with
ranges
One can access the entire row from
a cell using
range[["EntireRow"]]
We can make cells inactive to avoid users modifying them,
intentionally or unintentionally.
We can change the Interior of a cell to
change its background color and shading.
We can change the Borders
of a Range or Cell.
Let's add a plot, a histogram of the results in our bootstrap object.
We can do this in either of two ways (or both). Excel can create
"charts" and so we could use that mechanism. Or alternatively, we
could create a plot via R. We have a great deal more control in R and
can produce far more interesting plots, even with high-level commands
using that, so let's pursue that. Of course, if we are running R, we
can simply create a plot in an R graphics window and we are done. This
works well for some contexts. In other cases, we will want to put the
plot onto the worksheet area. The simplest thing to do is to create a
JPEG graphics device in R and to put the R plot in that file. Then, we
will put that image on the worksheet.
jpeg("bootHist.jpg", width = 500, height = 600)
hist(bootData$results, prob = TRUE, main = "Bootstrap Distribution", xlab = "bootstrap values")
points(density(bootData$results))
dev.off()
|
---|
Now the file bootHist.jpg contains the image we want. We can put this
on the sheet as a Shape object. We use the Shapes list object to do
this by calling its AddPicture method.
shapes = sheet[["Shapes"]]
shapes$AddPicture(paste(getwd(), "bootHist.jpg", sep = "/"), LinkToFile = FALSE, SaveWithDocument = TRUE,
Top = 1, Left = 20, Width = 500, Height = 600)
|
---|
This works for Tellis/Lattice plots also.
The alternative is to use Excel's charting facilities.
When we do this,
Let's extend our example a little by allowing the user to specify the
original sample data as a column in an Excel spreadsheet. There are
several ways that they might specify where the data is located such as
just giving the sheet which contains only one column of data, or
specifying the sheet and the column name or alternatively a Range
which would allow the user to specify a subset of the entire data.
Let's assume that they give us just a sheet. We can then as the sheet
for the UsedRange which is the smallest rectangular region enclosing
all the data in the sheet.
We can get the values in the cells via the Value property, e.g.
data = unlist(sheet$UsedRange()[["Value"]])
|
---|
We use unlist since the Value property
maintains the elements in list form since they may have different
types. In our case, we expect them all to be numbers, or
at least have the same type. Empty cells can cause problems here
and should be dealt with.
If the user gives us the Range or the sheet and column, we can use the
same approach.
Our
showBootstrapResults()
function works just as well
in this context as it doesn't concern itself with where the original
data were located.
We can go further and try to put a more convenient interface on this
from the non-R user's perspective. Rather than having to give R
commands, we might envisage an application where the user brings up R
and it starts Excel and prompts the user to insert data, specify the
statistic to be bootstraped and the number of times. When the user
has given these, then R will perform the relevant function calls.
This is quite different. R is now waiting for an event and so we need
to explore how we can arrange this setup in DCOM. It is not very
difficult in practice, but we'll go through the low-level steps to
illustrate what is actually involved and then present the higher-level
tools.
Rather than starting with a sophisticated example, let's start with a
simple one. We will present the user with a worksheet and ask them to
put the data in the third column and specify the statistic of interest
and the number of repetitions in a template given in the top 2x2
rectangle. We can manually construct this template worksheet or we
could programmatically generate it in R. Use whichever is most
appropriate in the long run.
(See bootstraptemplate.xls)
The user should set the statistic of choice and modify the number of
repetitions if necessary and then insert the data. The change to any
cell will cause the R function that gets the data and runs the
bootstrap and generates the output to be run.
That R function looks something like
runBootstrapFromExcel =
function(sheet)
{
r = sheet$UsedRange()
# If there are less than three columns, then we have no data.
if(r[["Columns"]]$Count() < 3)
return(NULL)
# Get the data from the 3rd column.
r = sheet$Range(sheet$Cells(1,3), sheet$Cells(r[["Rows"]]$Count(), 3))
data = unlist(r[["Value"]])
r = sheet$Range(sheet$Cells(2, 2), sheet$Cells(3, 2))
tmp = r[["Value"]]
statistic = get(tmp[[1]], mode = "function")
boot = uniBootstrap(data, statistic, tmp[[2]])
showBootstrapResults(boot, book = sheet[["Parent"]])
TRUE
}
|
---|
We could get all the values from the UsedRange in
a single step and then process them in R to extract
the meaningful ones.
tmp = sheet$UsedRange()[["Value"]]
statistic = get(tmp[[2]][[1]], mode = "function")
B = tmp[[2]][[2]]
data = unlist(tmp[[3]])
|
---|
Now, the last remaining step is the one we have yet cover in this
introduction to the R DCOM facilities. This is how we arrange to have
the R function called whenever any of the cells in the sheet are
changed. In addition to methods and properties, a COM object can make
events available from its interface. Events are asynchronous actions
and other code can connect to these event sources to be notified when
the events occur. Generally, an event source may notify a "listener"
of several different types of events. And an object may actually be a
source of several different groups of events in which case it would
provide multiple event sources.
We can find the connection points for our worksheet:
library(RDCOMEvents)
getConnectionPoints(sheet)
|
---|
In our case, there is only a single connection point.
We find out what interface is needed for an object
connecting to that point. To do this, we
lookup the interface definition corresponding to the
UUID of the connection point.
We do this via the type library which contains
information about the different COM classes in
the application library.
uid = names(getConnectionPoints(sheet))
library(SWinTypeLibs)
lib = LoadTypeLib(sheet)
eventInterface = lib[[uid]]
|
---|
The
eventInterface object
is a reference to all the information
about the COM class that can react to the worksheet's
events. We can ask what methods this interface
must have via the call
funcs = getFuncs(eventInterface)
|
---|
This returns a named list of method descriptions. The names are those
of the methods, and each element provides details about the parameters
of the method and the expected return type. The first 7 methods are
entirely generic DCOM object methods and not part of the event
interface. Ignoring these, we see that there are event methods for
SelectionChange, BeforeDoubleClick, BeforeRightClick, Activate,
Deactivate, Calculate, Change, FollowHyperlink and PivotTableUpdate.
We should also note that these events can be obtained by listening to
the parent workbook or even the application. In these contexts, more
parameters will typically be provided to indicate in which sheet the
event occurred.
If we look at the Change method, we see it has
just one parameter and that is a Range object.
It is not necessarily clear what that parameter represents
from this basic information, but it is in fact the
Range object identifying the Cell(s) that were changed
in the sheet that gave rise to the event.
We could use the location of the cell to
determine if we need to re-fetch the data,
whether the statistic or number of repetitions
has changed, or whether it is one of the other cells
that is of no consequence and could be ignored.
In order to connect R to the worksheet to
listen for Change events, we need to
define a COM object that implements this event interface
and specifically provide a method.
The R function to handle the Change event can be written as
function(target)
{
sheet = target[["Parent"]]
runBootstrapFromExcel( sheet)
}
|
---|
We can compute the worksheet from the Range via the
Parent property, or alternatively, we already have
this when we created the Excel worksheet to display the
template and could use that version.
In that case, we would use a closure to make
sheet object local to the event handler function.
From the method description, the return type is void
which means there is no return value. So whatever is
returned from the R function will be ignored.
We are now very close to being able to put all of this together.
The RDCOMEvents package provides functions for
connecting the dots.
We first create a template event handler definition
using
createCOMEventServerInfo()
with code such as
s = createCOMEventServerInfo(eventInterface, complete = TRUE)
|
---|
We then merge our own handler with these templates and create the
actual event handler with
createCOMEventServer()
:
s@methods$Change =
function(target)
{
sheet = target[["Parent"]]
runBootstrapFromExcel( sheet)
}
eventHandler = createCOMEventServer(s@methods, s@ids, direct = TRUE)
|
---|
The
eventHandler is an actual COM object
implemented in R with methods to handle any of the events
that come from a Worksheet object in Excel.
All that remains now is to actually connect
the handler object to the event source.
We do this with the
R function
connectConnectionPoint()
or its alias
Advise()
:
Advise(getConnectionPoints(sheet)[[1]], eventHandler)
|
---|
Now, any time the user modifies a cell value, the handler function
will be called and a new bootstrap calculation will be done if the
user has provided data and the results will be shown in a new
Worksheet.
Let's see this from beginning to end.
We start R and run a script which does
the following:
- Opens Excel with the
bootstrap template worksheet displayed
-
Connects an event handler for the Change
action to that sheet.
The handler checks that the data has been
specified and, if so, performs the bootstrap
calculations and displays the results.
The R code for all of this is
library(RDCOMClient)
library(RDCOMEvents)
ex = COMCreate("Excel.Application")
book = ex$Workbooks()$Open("C:/bootstrapTemplate.xls")
sheet = book$Worksheets()$Item(1)
ex[["Visible"]] = TRUE
library(SWinTypeLibs)
lib = LoadTypeLib(sheet)
uid = names(getConnectionPoints(sheet))
eventInterface = lib[[uid]]
s = createCOMEventServerInfo(eventInterface, complete = TRUE)
s@methods$Change = function(target) {
sheet = target[["Parent"]]
runBootstrapFromExcel( sheet)
}
eventHandler = createCOMEventServer(s@methods, s@ids, direct = TRUE)
Advise(getConnectionPoints(sheet)[[1]], eventHandler)
|
---|
Note that we have to define the runBootstrapFromExcel function and the
other supporting functions that we developed earlier to do the
simulation and display the results. We can source these in from a
different file or have them in a package and load that.
The interface is a little clumsy. If the user wants to change two
data values, she first modifies one cell and when she moves to the
second cell, a new bootstrap calculation is done. We are performing
this each time a cell changes. But the user may want to make multiple
changes and then perform the computations. So she needs to be in
control of when this happens. We would more appropriately use a button
to specify when the computations should be performed rather than on
every change. There is a kludgy but simple way to do this using the
existing code above. We could put a hyperlink on the worksheet and
when the user clicks on that, our event handler is notified via the
FollowHyperlink event method. The only real change needed is setting
the event handler for the FollowHyperlink. We can ignore the actual
link being clicked as we are only interested in when it is clicked.
(If we had multiple links, we would have to differentiate between
which was clicked by looking at the text of the URI, e.g.
link[["Address"]]
or
link[["TextToDisplay"]]
.)
Buttons and other Active X controls
The hyperlink above is a convenient way to add an "action" or "submit"
button. But as you have seen with Gtk and RGtk and are familiar with
from your own experiences, there are much nicer GUI components. For
example, we can use a button. And there are spinbox objects for
specifying numbers, and sliders for changing the value of a
parameters. And, Office provides ways to add such interactive
components to a Worksheet or a Word document and to respond to the
associated events. Just as with Gtk, there are two steps to doing
this: creating and positioning the component to build the GUI, and
then connecting the callbacks to handle the events and provide the
interactive functionality.
Like RGtk and Glade, you can interactively create the GUI by placing
controls on a worksheet. Go to the View menu and the Toolbar item and
select Control Toolbox. This will bring up a little free-floating
Window containing components that you can drag onto the worksheet.
Unlike Gtk, resizing the window doesn't rearrange the GUI
components. Instead, in Excel the elements are placed at fixed
positions and have fixed dimensions. Given this, the interactive
layout can be convenient. The only issue that arises is that when we
design the GUI in Excel, we need to be able to connect R functions and
event handlers to them from within R. Fortunately, there is a way to
access these components on a worksheet (or across several worksheets
if need be) from within R and use them directly. Each Worksheet has a
list of the OLE objects it houses and these are available via the
OleObjects property.
ole = sheet[["OleObjects"]]
|
---|
This is a list and has Count and Item methods to access
the elements.
Each element is a general object and it is up to us
to make sense of it.
We can ask each object its name, e.g.
sapply(1:ole$Count(), function(i) ole$Item(i)[["Name"]])
|
---|
Since
ole is essentially a list, we can treat it like
one in R. The
RDCOMClient package provides a
COMList and
COMList()
constructor
function which creates the given DCOM object in a more S-like manner.
It provides a method for
length()
rather than using
ole$Count()
, facilities for accessing individual
elements e.g. ole[[2]], and a more convenient way of running
lapply/sapply over the elements of the list, e.g.
sapply(ole, function(obj) obj[["Name"]])
|
---|
We'll return to what we can actually do with these objects
from R and how to find out what methods the support.
You can also create the components directly within R
using the Add method for the
OLEObjects list.
To create a control object, you have
to specify the class.
The different types of controls are:
CheckBox, ListBox, ComboBox, CommandButton, Frame, Image, Label,
|
MultiPage, OptionButton, ScrollBar, SpinButton, TabStrip, TextBox,
|
ToggleButton
|
To create an object of one of these types, you use
"Forms.(type).1" as the class type in a call to
the Add method for the
Worksheets
OleObjects property.
ole = sheet[["OleObjects"]]
ctrl = ole$Add(ClassType = "Forms.ComboBox.1",
Top = , Left = , Width = , Height = )
|
---|
Note that we didn't have to give the object a name. Instead, we have
assigned it to an R variable and can refer to it directly rather than
by name on the sheet. This is good programming practice, avoiding the
possibility of conflicting names for components. The objects may have
a name by default, but that name is not important for our purposes.
Each element in the OleObjects list is in fact a wrapper to the real
ActiveX object. To get at that, we use the
Object property.
Let's return to our example. We can either interactively construct
our form, or we can programmatically generate it in R. We'll endeavor
to use both approaches.
We'll start with the programmatic version. We get a reference to the
Worksheet of interest in the R variable
sheet. Then,
we refer to the
OleObjects and start
adding to that list. We place a collection of three labels to
identify the interactive elements for the user. Then we add the
elements themselves. We use a "combo box" for selecting the statistic
of interest and a "spin box" for specifying the number of bootstrap
samples to create. We place a button below these which, when clicked,
will run the bootstrap.
ole = COMList(sheet[["OleObjects"]])
left = 260
height = 20
top = 20
sapply(c("Statistic", "Number of Repetitions"),
function(txt) {
label = ole$Add("Forms.Label.1", Top = 20, Left = left - 100, Width = 60, Height = height-1)
label[["Object"]][["Caption"]] = txt
})
statistic = ole$Add("Forms.ComboBox.1", Top = top, Left = left, Width = 60, Height = height - 1)
repetitions = ole$Add("Forms.SpinButton.1", Top = top + 20, Left = left, Width = 60, Height = height + 10)
run = ole$Add("Forms.CommandButton.1", Top = top + 40 + 10, Left = left, Width = 60, Height = height - 1)
run[["Object"]][["Caption"]] = "Run Bootstrap"
sapply(c("Mean", "Median", "Minimum", "Maximum"),
function(x) statistic[["Object"]]$AddItem(x))
# Set initial values
statistic[["Object"]][["Value"]] = "Mean"
repetitions[["Object"]][["Max"]] = 10000
repetitions[["Object"]][["Value"]] = 999
repetitions[["Object"]][["Orientation"]] = 0
# Where should the spin box update the value.
repetitions[["Object"]][["LinkedCell"]] = "E5"
|
---|
It is hard to show how to build this interactively in a static
document. One uses the "Control Toolbox" toolbar and places the
elements on the worksheet. Then, one edits the Properties table to
specify the different settings. This appears to be the easiest route
at first. However, as we modify the design of the form, it becomes
cumbersome. The level of manual involvement is quite large. And,
importantly, nobody can see what was done except by looking at the
worksheet itself. While this is fine, the concepts of the layout are
lost to the reader (not the user, hopefully). It makes sense to use
an interactive layout approach when we are doing the programming in
Excel or Visual Basic directly. However, when we are providing the
event handlers in R, we have to connect the ActiveX components to the
R function that performs the actions. It is up to the form creator to
do this based on some naming convention or knowledge of the form. The
elements corresponding to the R variables
statistic,
repetitions and
run are available int
OleObjects list. It is up to us however to determine which is which
and this can only be done based on knowledge of the form or the naming
convention.
Regardless of how we actually create the form the final thing we have
to do is establish an event handler to perform the bootstrap samples
and report the results in a separate sheet. We will of course use the
functions we developed earlier (
uniBootstrap()
and
showBootstrapResults()
) to do the actual work.
All that we need to do is detect when the user
clicks on the "Run Bootstrap" command button.
The other form elements within our GUI
will take care of themselves when the user
interacts with them to change their values.
We need only fetch their values when performing the
bootstrap, i.e. when the button has been pressed.
So our button event handler function in R would look
something like
runBootstrapHandler =
function(...)
{
B = repetitions[["Object"]][["Value"]]
stat = statistic[["Object"]][["Value"]]
contents = sheet$UsedRange()[["Value"]]
results = uniBootstrap(unlist(contents[[1]]), statistics[[stat]], B)
showBootstrapResults(results, book = sheet[["Parent"]])
}
|
---|
We get the data from the first actual column of our sheet via the
UsedRange. This can be done in different ways with more error
checking performed, but we are merely trying to get the point across
in this example.
The last step is to connect this function to the actual button event.
We use the same steps as we did to catch the sheet Change event in our
example above. Specifically, we get the connection point for the
button and create a template interface for that event handler
type. Then we add our own handler function and create the actual event
handler instance and call
Advise()
to establish the
connectivity. We use slightly different code here to illustrate
different aspects of the
RDCOMEvents functions.
We get the specific connection point for the button via
findConnectionPoint()
. Also, we specify our event
handler in the call to
createCOMEventServerInfo()
rather than adding to
s after it is created. And we
pass the resulting server template directly to
createCOMEventServer()
rather than its methods and name
identifier map. There is no conceptual difference between this
example and the one earlier. Note also that we work from the actual
button (i.e.
run[["Object"]]
) and not the OLE object.
btn = run[["Object"]]
lib = LoadTypeLib(btn)
typeInfo = lib[["CommandButtonEvents"]]
connectionPoint = findConnectionPoint(btn, typeInfo)
s = createCOMEventServerInfo(typeInfo, complete = TRUE,
methods = list(Click = runBootstrapHandler))
handler = createCOMEventServer(s)
Advise(connectionPoint, handler)
|
---|
At this point, the code should be connected
to the button and if you add data into column A,
you can click on the button and perform a bootstrap.
You can also extend this to work with only
the selected data if a selection has been made
or the entire thing if there is no current selected
region.
There is a second collection of form components in Excel which are
available via the Forms toolbar. These are ostensibly simpler to use,
but are different and less general than the Active components.