Using COM facilities in R

Duncan Temple Lang

Table of Contents

Getting Started
Example
Reading Data from Excel
Buttons and other Active X controls

Abstract

The RDCOMClient provides relatively easy to use facilities for creating COM objects from R and using their methods to perform different actions and accessing their properties. This package is similar in spirit and in functionality to that provided in Python and Perl, but this package of course focuses on providing the facilities in R. It offers a dynamic mechanism for method/property access for arbitrary COM objects that support the IDispatch interface.

More efficient and structured access can be automatically generated using functions in the SWinTypeLibs.

Getting Started

The RDCOMClient package provides general tools for accessing functionality in COM servers. The first thing we do is to get a handle to a COM server object. There are two ways to do this. One is to create a new instance of registered COM servers. By registered, we mean a COM server that was explicitly published and put into the Windows registry with a name and a UUID (universal identifier). The other technique is to get a handle to an existing COM object that was created earlier, not necessarily by R but perhaps manually by the user. For example, I may have opened Excel and done some work in a spreadsheet. I know want to bring the data from that spreadsheet into R and have R add content to that sheet. In that case, I need to ensure that R works with that particular instance of Excel and not a new Excel object with an entirely separate, empty worksheet.

To create a new instance of a registered COM server class, we use the function

COMCreate

. We give it either the name of the COM class, e.g. "Excel.Application", as a string or alternatively the UUID. The latter is a very long string that was generated to uniquely identify the COM server with an epsilon chance of any other UUID being generated that would be the same. To create a new instance of the Excel application, we would use the command

ex = COMCreate("Excel.Application")

Similarly, if we wanted to create an instance of MS Word, we would use

word = COMCreate("Word.Application")

and Microsoft's Internet Explorer can be brought up using

ie = COMCreate("InternetExplorer.Application")

In each of these examples, the resulting R objects (ex, word and ie) are references to a generic DCOM object. R knows nothing about the specifics of these applications. It cannot tell the characteristics of one from the other, but it can access each of their methods and properties assuming that the R "programmer"/user knows their names. We'll see how we can discover the names of these methods and properties from within R also.

Now that we have a reference/handle to a COM object in which we are interested, we can now move on and start accessing its data (properties) and calling its methods (functions). There underlying function that does this is .COM() . This handles getting and setting properties and invoking methods, with or without arguments. In fact, the different between methods and properties is a little nebulous. Properties can often be considered methods. But that is not important. And, to a large extend, the .COM() function is not important either. Rather, we typically use syntactic sugar to make accessing these methods and properties more facile. To access a method, we use the $ operator on the COM object. For example, to call the CheckSpelling method in the Excel Application object, we can use the R command

ex$CheckSpelling("typo")

The general form is

comObj$methodName(arg1, arg2, arg3, ...)

We are calling the function named "methodName" in the server object comObj and passing it the R values as arguments. This is equivalent to

.COM(comObj, "methodName", arg1, arg2, arg3, ...)

but this is slightly more typing and doesn't indicate that comObj may change as a result of the computation.

To access a DCOM property, we can again use the .COM() function or the [[ operator with the name of the property. For example, when we create a new Excel.Application instance, it will not be displayed on the screen. Instead, it exists and we can do computations within such as loading a file or inserting values into it before the user sees it. When we want to display it, we set the Visible to a non-zero value. We can query the current value of the Visible property using the R expressions

ex[["Visible"]]

and we can set it from R using the command

ex[["Visible"]] = TRUE

This will have the side-effect of making the application appear on the screen.

The $ for methods and [[ for properties are thin convenience layers on top of the .COM() function. That is the workhorse of the RDCOMClient package. And we can discuss how this handles its arguments and passes them to the DCOM method, and similarly how it returns values from these method invocations.

DCOM, like CORBA, aims to provide a programming language-independent mechanism for remote method invocation (RMI) or procedure calls (RPC). Almost all programming languages provide basic data structures such as integer and real valued numbers. They also provide characters and/or strings, a collection of characters that can be considered a unit of text. Some have logical or boolean values. All provide some way of having collections of these primitive elements. DCOM therefore defines data types for many of these commonly used data types. We map between these DCOM data types and R's own data types when converting arguments and return values.

DCOM type	R type
I4, I1, I2, INT	integer
BOOL	logical
R8	numeric
BSTR	character
UI1, UI2, UINT, UI4	numeric
CY, DATE, DECIMAL, HRESULT	numeric
UNKNOWN	COMUnknownObject
DISPATCH	COMIDispatch
EMPTY, NULL, VOID	NULL

The types RECORD, FILETIE, BLOBL, STREAM, STORAGE, STREAMED_OBJECT, LPWSR, PTR, ERROR, VARIANT, CARRAY, USERDEFINED are not yet handled.

So we have seen how to create COM instances, connect to existing COM instances, and call methods and get and set property values in a COM object. The important part of making use of a COM server is not the underlying .COM() mechanism, but more understanding what methods and properties are available to you. There are four strategies for finding this out.

Trial and error by using educated guesses for the names of the properties and methods! This is clearly a frustrating and inefficient approach.
Find and read books or Web pages that describe the server and its high-level computational model and facilities. For large applications like Excel, Word, Access, and so on, there are many good resources at this level. The Web pages at Microsoft are a good start and there are many tutorials about specific topics with code snippets for different tasks. Most of these pages and books are written for Visual Basic programmers. While this makes the code less relevant and slightly harder to read for us, it is still very useful. All we are looking for is the methods and properties and how they can be used. And as with most good code, it is the description that surrounds it rather than the code itself that is most informative to humans.
For Microsoft applications such as Excel and Word and Office generally, there is an interactive visual tool for browsing classes and their methods and properties (and supported events). Using the Visual Basic Editor and its Object Browser tool, one can examine the contents of what is called a type library. This is a collection of information about data types and classes and their methods and properties that can be programmatically accessed. The Object Browser provides a GUI front-end to this information and is relatively convenient way to explore the entire class hierarchy, with easy links between the classes referenced as parameters and return values of methods and properties.

The type library is not specific to the Visual Basic Editor's Object Browser. We can access it from any other application that can read and understand its contents. The SWinTypeLibs package provides facilities to read and explore arbitrary type libraries within an R session. One can load the type library associated with a particular COM object or alternatively specify the file name of a type library without needing a instance of the COM object. One can find out about all the different classes using R commands such as names() and getFuncs() and getElements() . An example may help. It is not essential that you understand all the details here. It is good to know that they exist and are accessible from R. We'll load the library associated with our Excel instance and look at all the classes, etc. it contains.

lib = LoadTypeLib(ex) names(lib)

  [1] "Adjustments"                "CalloutFormat"              "ColorFormat"                "LineFormat"                
  [5] "ShapeNode"                  "ShapeNodes"                 "PictureFormat"              "ShadowFormat"              
  [9] "TextEffectFormat"           "ThreeDFormat"               "FillFormat"                 "DiagramNodes"              
 [13] "DiagramNodeChildren"        "DiagramNode"                "IRTDUpdateEvent"            "IRtdServer"                
 [17] "Constants"                  "XlCreator"                  "XlChartGallery"             "XlColorIndex"              
 [21] "XlEndStyleCap"              "XlRowCol"                   "XlScaleType"                "XlDataSeriesType"        
    ...
    ...
[625] "SmartTags"                  "SmartTagRecognizer"         "SmartTagRecognizers"        "SmartTagOptions"           
[629] "SpellingOptions"            "Speech"                     "Protection"                 "PivotItemList"             
[633] "Tab"                        "AllowEditRanges"            "AllowEditRange"             "UserAccessList"            
[637] "UserAccess"                 "RTD"                        "Diagram"                    "IDummy"                    
[641] "ICanvasShapes"              "QueryTable"                 "Application"                "Chart"                     
[645] "Worksheet"                  "Global"                     "Workbook"                   "OLEObject"

So there are 648 entries in the library. Many of the, such as XlColorIndex, are what are called enumerated constants which provide names for specific values which are used as arguments to methods. Others describe interfaces for event handlers which we will see later. But elements like Application (second last row) represent the top-level classes we want to work with. Each element has a class and we are typically interested in the ITypeInfoDispatch objects which correspond to the IDispatch COM server objects. We can find all 241 of these with the command

names(lib)[sapply(lib, class) == "ITypeInfoDispatch"]

  [1] "Adjustments"          "CalloutFormat"        "ColorFormat"          "LineFormat"           "ShapeNode"            "ShapeNodes"          
  [7] "PictureFormat"        "ShadowFormat"         "TextEffectFormat"     "ThreeDFormat"         "FillFormat"           "DiagramNodes"        
 [13] "DiagramNodeChildren"  "DiagramNode"          "IRTDUpdateEvent"      "IRtdServer"           "_Application"         "_Chart"              
 [19] "Sheets"               "_Worksheet"           "_Global"              "_Workbook"            "Workbooks"            "PublishObject"       
 [25] "DefaultWebOptions"    "WebOptions"           "TreeviewControl"      "CubeField"            "CubeFields"           "Font"                
 [31] "Window"               "Windows"              "AppEvents"            "WorksheetFunction"    "Range"                "ChartEvents"         
 [37] "VPageBreak"           "HPageBreak"           "HPageBreaks"          "VPageBreaks"          "RecentFile"           "RecentFiles"         
 [43] "DocEvents"            "Style"                "Styles"               "Borders"              "AddIn"                "AddIns"              
 [49] "Toolbar"              "Toolbars"             "ToolbarButton"        "ToolbarButtons"       "Areas"                "WorkbookEvents"      
 [55] "MenuBars"             "MenuBar"              "Menus"                "Menu"                 "MenuItems"            "MenuItem"            
 [61] "Charts"               "DrawingObjects"       "PivotCache"           "PivotCaches"          "PivotFormula"         "PivotFormulas"       
 [67] "PivotTable"           "PivotTables"          "PivotField"           "PivotFields"          "CalculatedFields"     "PivotItem"           
 [73] "PivotItems"           "CalculatedItems"      "Characters"           "Dialogs"              "Dialog"               "SoundNote"           
 [79] "Button"               "Buttons"              "CheckBox"             "CheckBoxes"           "OptionButton"         "OptionButtons"       
 [85] "EditBox"              "EditBoxes"            "ScrollBar"            "ScrollBars"           "ListBox"              "ListBoxes"           
 [91] "GroupBox"             "GroupBoxes"           "DropDown"             "DropDowns"            "Spinner"              "Spinners"            
 [97] "DialogFrame"          "Label"                "Labels"               "Panes"                "Pane"                 "Scenarios"           
[103] "Scenario"             "GroupObject"          "GroupObjects"         "Line"                 "Lines"                "Rectangle"           
[109] "Rectangles"           "Oval"                 "Ovals"                "Arc"                  "Arcs"                 "OLEObjectEvents"     
[115] "_OLEObject"           "OLEObjects"           "TextBox"              "TextBoxes"            "Picture"              "Pictures"            
[121] "Drawing"              "Drawings"             "RoutingSlip"          "Outline"              "Module"               "Modules"             
[127] "DialogSheet"          "DialogSheets"         "Worksheets"           "PageSetup"            "Names"                "Name"                
[133] "ChartObject"          "ChartObjects"         "Mailer"               "CustomViews"          "CustomView"           "FormatConditions"    
[139] "FormatCondition"      "Comments"             "Comment"              "RefreshEvents"        "_QueryTable"          "QueryTables"         
[145] "Parameter"            "Parameters"           "ODBCError"            "ODBCErrors"           "Validation"           "Hyperlinks"          
[151] "Hyperlink"            "AutoFilter"           "Filters"              "Filter"               "AutoCorrect"          "Border"              
[157] "Interior"             "ChartFillFormat"      "ChartColorFormat"     "Axis"                 "ChartTitle"           "AxisTitle"           
[163] "ChartGroup"           "ChartGroups"          "Axes"                 "Points"               "Point"                "Series"              
[169] "SeriesCollection"     "DataLabel"            "DataLabels"           "LegendEntry"          "LegendEntries"        "LegendKey"           
[175] "Trendlines"           "Trendline"            "Corners"              "SeriesLines"          "HiLoLines"            "Gridlines"           
[181] "DropLines"            "LeaderLines"          "UpBars"               "DownBars"             "Floor"                "Walls"               
[187] "TickLabels"           "PlotArea"             "ChartArea"            "Legend"               "ErrorBars"            "DataTable"           
[193] "Phonetic"             "Shape"                "Shapes"               "ShapeRange"           "GroupShapes"          "TextFrame"           
[199] "ConnectorFormat"      "FreeformBuilder"      "ControlFormat"        "OLEFormat"            "LinkFormat"           "PublishObjects"      
[205] "OLEDBError"           "OLEDBErrors"          "Phonetics"            "PivotLayout"          "DisplayUnitLabel"     "CellFormat"          
[211] "UsedObjects"          "CustomProperties"     "CustomProperty"       "CalculatedMembers"    "CalculatedMember"     "Watches"             
[217] "Watch"                "PivotCell"            "Graphic"              "AutoRecover"          "ErrorCheckingOptions" "Errors"              
[223] "Error"                "SmartTagAction"       "SmartTagActions"      "SmartTag"             "SmartTags"            "SmartTagRecognizer"  
[229] "SmartTagRecognizers"  "SmartTagOptions"      "SpellingOptions"      "Speech"               "Protection"           "PivotItemList"       
[235] "Tab"                  "AllowEditRanges"      "AllowEditRange"       "UserAccessList"       "UserAccess"           "RTD"                 
[241] "Diagram"

"_Application" is the actual class of the COM object returned via COMCreate("Excel.Application"). We can examine its methods and properties using

funcs = getFuncs(lib[["_Application"]])

Then we can look at these as we want. Again, each element has a name and a class.

table(sapply(funcs, class))

FunctionInvokeDescription    PropertyGetDescription    PropertySetDescription PropertySetRefDescription 
                       79                       196                        96                         2

And

names(funcs)

tells us all the different methods and properties that are available to us. If we want to see the Visible property for example, we can look at it as funcs$Visible. This says it is a PropertyGetDescription and it tells us that its return type is a "VARIANT_BOOL". This is just a logical value.

We know that we can set the value of this property, yet we are looking at the PropertyGetDescription object. To find the PropertySetDescription, we have to find the other element named Visible in the list of methods and properties.

which(names(funcs) == "Visible")

That second element tells us about setting the property.

To look at methods for the Worksheet class, we can query the corresponding IDispatch type

funcs = getFuncs(lib[["_Worksheet"]])

Again, we use _Worksheet rather than Worksheet since the latter is an aggregation of the former and some event interfaces. Let's look at the Range method which is used to describe a collection of cells on the worksheet.

funcs$Range

Looking at the elements of this as it is printed


An object of class "PropertyGetDescription"
Slot "returnType":
An object of class "TypeDescriptionRef"
Slot "reftype":
[1] 50374848

Slot "name":
[1] "<User Defined>"


Slot "parameters":
$Cell1
An object of class "ParameterDescription"
Slot "name":
[1] "Cell1"

Slot "type":
An object of class "TypeDescription"
Slot "name":
[1] "VARIANT"


Slot "style":
An object of class "ParameterStyle"
Slot "In":
[1] TRUE

Slot "out":
[1] FALSE

Slot "lcid":
[1] FALSE

Slot "retval":
[1] FALSE

Slot "optional":
[1] FALSE


Slot "defaultValue":
NULL


$Cell2
An object of class "ParameterDescription"
Slot "name":
[1] "Cell2"

Slot "type":
An object of class "TypeDescription"
Slot "name":
[1] "VARIANT"


Slot "style":
An object of class "ParameterStyle"
Slot "In":
[1] TRUE

Slot "out":
[1] FALSE

Slot "lcid":
[1] FALSE

Slot "retval":
[1] FALSE

Slot "optional":
[1] TRUE


Slot "defaultValue":
NULL



Slot "invokeType":
propertyget 
          2 

Slot "kind":
dispatch 
       4 

Slot "name":
[1] "Range"

Slot "hidden":
[1] TRUE

Slot "memid":
[1] 197

The parameters slot tells us that this expects two parameters, Cell1 and Cell2 and it says they can be generic values which is identified by type VARIANT. We know they can be names such as "A1", "C10" and we'll see that they can be other Range objects to identify individual cells also.

The return type is defined within the library itself. We can resolve it

getRefTypeName(lib[["_Worksheet"]], funcs$Range@returnType@reftype)

and we get

    name
 "Range"

Example

We will develop and extend an example through the paper to illustrate different aspects of DCOM with R. We start by focusing on using R as a client. The example is very simple. We will use Excel to display the results of a univariate bootstrap. We write a function in R to do the bootstrap and display the results. The function takes as input the original data, the statistic to be calculated for each bootstrap sample, and the number of bootstrap repetitions (999) by default. We could use the boot package to do the bootstrap, but we will write it ourselves here as the code is very simple.

uniBootstrap =
function(data, statistic = median, B = 999)
{
  n = length(data)
  samples = matrix(sample(data, n * B, replace = TRUE), n, B)
  results = apply(samples, 2, statistic)

  list(samples = samples, results = results)
}

We can call this from within our R session, but now we want to display the samples along with the result for each sample and then a summary of the results object. So we can write a function that takes the output from uniBootstrap() and puts each column of the samples matrix into a column of a new Excel worksheet. It will put the result for that sample in a separate row above the samples and above that provide a simple summary of the entire collection of results.

showBootstrapResults =
function(bootData, book)
{
  n = nrow(bootData$samples)
  B = ncol(bootData$samples)

    # Create a new worksheet in the book.
  sheet = book$Worksheets()$Add()

    # Put the summary of the results vector in the first row.
  sumy = summary(bootData$results)
  r = sheet$Range("A1:F1")
  r[["Value"]] = names(sumy) 
  r = sheet$Range("A2:F2")
  r[["Value"]] = as.numeric(sumy)

    # Put the results vector in the 4th row.
  r = sheet$Range(sheet$Cells(4, 1), sheet$Cells(4, B))
  r[["Value"]] = bootData$results

    #
  for(i in 1:B)  {
    r = sheet$Range(sheet$Cells(6, i), sheet$Cells(6 + n - 1, i))
    r[["Value"]] = asCOMArray(bootData$samples[, i])
  }

  sheet
}

This code illustrates two ways of accessing a range of cells. In the first case, we use the regular Excel syntax of "A1:F1" to identify the first row between columns A and F. In the other cases, we ask for the Range object by giving the method two cell locations which identify the top-left and bottom-right corner of the desired range. This form allows us to easily use numbers rather than mapping to letters which becomes more complex when we have to deal with column names such as CF.

We set the Value property of the range by assigning it as many values from R as there are cells in the range. Note that this only works if the range is one-dimensional. If it is a rectangle, the results are not as you may expect. Additionally, when we put the sample values for each bootstrap within the loop from 1 to B, we have to explicitly coerce the R vector to an appropriate form. For some reason, Excel will not arrange the values vertically, but rather reuses the first element for each cell. So we use asCOMArray() to make this work.

We can spend time formatting the output in a nicer way. We can make the names of the summary elements in row 1 appear as bold. The Range object has a Font property and we can set its characteristics to make the text bold.


r = sheet$Range("A1:F1")
r[["Font"]][["Bold"]] = TRUE

Similarly, we can change the font by specifying a new Name property, and change how large it is via the Size property. And we can change its color by specifying a value for the (you guessed it) Color property of the font.

We can specify the format of the value of a cell such as using a Date or Currency or Percentage representation. One specifies the format string as the value for the NumberFormat property. The format of the string is something you should lookup in the Excel documentation. The following uses a currency format that includes ,'s to separate digits in the usual manner, and marks negative quantities in the color red and enclosed in parentheses:

range[["NumberFormat"]] = "$#,##0.00_);[Red]($#,##0.00)"

There are many tricks one can use when dealing with ranges One can access the entire row from a cell using


range[["EntireRow"]]

We can make cells inactive to avoid users modifying them, intentionally or unintentionally. We can change the Interior of a cell to change its background color and shading. We can change the Borders of a Range or Cell.

Let's add a plot, a histogram of the results in our bootstrap object. We can do this in either of two ways (or both). Excel can create "charts" and so we could use that mechanism. Or alternatively, we could create a plot via R. We have a great deal more control in R and can produce far more interesting plots, even with high-level commands using that, so let's pursue that. Of course, if we are running R, we can simply create a plot in an R graphics window and we are done. This works well for some contexts. In other cases, we will want to put the plot onto the worksheet area. The simplest thing to do is to create a JPEG graphics device in R and to put the R plot in that file. Then, we will put that image on the worksheet.

jpeg("bootHist.jpg", width = 500, height = 600) hist(bootData$results, prob = TRUE, main = "Bootstrap Distribution", xlab = "bootstrap values") points(density(bootData$results)) dev.off()

Now the file bootHist.jpg contains the image we want. We can put this on the sheet as a Shape object. We use the Shapes list object to do this by calling its AddPicture method.

shapes = sheet[["Shapes"]] shapes$AddPicture(paste(getwd(), "bootHist.jpg", sep = "/"), LinkToFile = FALSE, SaveWithDocument = TRUE, Top = 1, Left = 20, Width = 500, Height = 600)

This works for Tellis/Lattice plots also.

The alternative is to use Excel's charting facilities. When we do this,

Reading Data from Excel

Let's extend our example a little by allowing the user to specify the original sample data as a column in an Excel spreadsheet. There are several ways that they might specify where the data is located such as just giving the sheet which contains only one column of data, or specifying the sheet and the column name or alternatively a Range which would allow the user to specify a subset of the entire data. Let's assume that they give us just a sheet. We can then as the sheet for the UsedRange which is the smallest rectangular region enclosing all the data in the sheet. We can get the values in the cells via the Value property, e.g.

data = unlist(sheet$UsedRange()[["Value"]])

We use unlist since the Value property maintains the elements in list form since they may have different types. In our case, we expect them all to be numbers, or at least have the same type. Empty cells can cause problems here and should be dealt with.

If the user gives us the Range or the sheet and column, we can use the same approach.

Our showBootstrapResults() function works just as well in this context as it doesn't concern itself with where the original data were located.

We can go further and try to put a more convenient interface on this from the non-R user's perspective. Rather than having to give R commands, we might envisage an application where the user brings up R and it starts Excel and prompts the user to insert data, specify the statistic to be bootstraped and the number of times. When the user has given these, then R will perform the relevant function calls. This is quite different. R is now waiting for an event and so we need to explore how we can arrange this setup in DCOM. It is not very difficult in practice, but we'll go through the low-level steps to illustrate what is actually involved and then present the higher-level tools.

Rather than starting with a sophisticated example, let's start with a simple one. We will present the user with a worksheet and ask them to put the data in the third column and specify the statistic of interest and the number of repetitions in a template given in the top 2x2 rectangle. We can manually construct this template worksheet or we could programmatically generate it in R. Use whichever is most appropriate in the long run. (See bootstraptemplate.xls)

The user should set the statistic of choice and modify the number of repetitions if necessary and then insert the data. The change to any cell will cause the R function that gets the data and runs the bootstrap and generates the output to be run. That R function looks something like


runBootstrapFromExcel =
function(sheet)
{
  r = sheet$UsedRange()

    # If there are less than three columns, then we have no data.
  if(r[["Columns"]]$Count() < 3)
     return(NULL)

   # Get the data from the 3rd column.
  r = sheet$Range(sheet$Cells(1,3), sheet$Cells(r[["Rows"]]$Count(), 3))
  data = unlist(r[["Value"]])

  r = sheet$Range(sheet$Cells(2, 2), sheet$Cells(3, 2))  
  tmp = r[["Value"]]

  statistic = get(tmp[[1]], mode = "function")

  boot = uniBootstrap(data, statistic, tmp[[2]])
  showBootstrapResults(boot, book = sheet[["Parent"]])
  
  TRUE
}

We could get all the values from the UsedRange in a single step and then process them in R to extract the meaningful ones.

tmp = sheet$UsedRange()[["Value"]] statistic = get(tmp[[2]][[1]], mode = "function") B = tmp[[2]][[2]] data = unlist(tmp[[3]])

Now, the last remaining step is the one we have yet cover in this introduction to the R DCOM facilities. This is how we arrange to have the R function called whenever any of the cells in the sheet are changed. In addition to methods and properties, a COM object can make events available from its interface. Events are asynchronous actions and other code can connect to these event sources to be notified when the events occur. Generally, an event source may notify a "listener" of several different types of events. And an object may actually be a source of several different groups of events in which case it would provide multiple event sources. We can find the connection points for our worksheet:

library(RDCOMEvents) getConnectionPoints(sheet)

In our case, there is only a single connection point. We find out what interface is needed for an object connecting to that point. To do this, we lookup the interface definition corresponding to the UUID of the connection point. We do this via the type library which contains information about the different COM classes in the application library.

uid = names(getConnectionPoints(sheet)) library(SWinTypeLibs) lib = LoadTypeLib(sheet) eventInterface = lib[[uid]]

The eventInterface object is a reference to all the information about the COM class that can react to the worksheet's events. We can ask what methods this interface must have via the call

funcs = getFuncs(eventInterface)

This returns a named list of method descriptions. The names are those of the methods, and each element provides details about the parameters of the method and the expected return type. The first 7 methods are entirely generic DCOM object methods and not part of the event interface. Ignoring these, we see that there are event methods for SelectionChange, BeforeDoubleClick, BeforeRightClick, Activate, Deactivate, Calculate, Change, FollowHyperlink and PivotTableUpdate. We should also note that these events can be obtained by listening to the parent workbook or even the application. In these contexts, more parameters will typically be provided to indicate in which sheet the event occurred.

If we look at the Change method, we see it has just one parameter and that is a Range object. It is not necessarily clear what that parameter represents from this basic information, but it is in fact the Range object identifying the Cell(s) that were changed in the sheet that gave rise to the event. We could use the location of the cell to determine if we need to re-fetch the data, whether the statistic or number of repetitions has changed, or whether it is one of the other cells that is of no consequence and could be ignored.

In order to connect R to the worksheet to listen for Change events, we need to define a COM object that implements this event interface and specifically provide a method. The R function to handle the Change event can be written as

function(target) { sheet = target[["Parent"]] runBootstrapFromExcel( sheet) }

We can compute the worksheet from the Range via the Parent property, or alternatively, we already have this when we created the Excel worksheet to display the template and could use that version. In that case, we would use a closure to make sheet object local to the event handler function.

From the method description, the return type is void which means there is no return value. So whatever is returned from the R function will be ignored.

We are now very close to being able to put all of this together. The RDCOMEvents package provides functions for connecting the dots. We first create a template event handler definition using createCOMEventServerInfo() with code such as

s = createCOMEventServerInfo(eventInterface, complete = TRUE)

We then merge our own handler with these templates and create the actual event handler with createCOMEventServer() :

s@methods$Change = function(target) { sheet = target[["Parent"]] runBootstrapFromExcel( sheet) } eventHandler = createCOMEventServer(s@methods, s@ids, direct = TRUE)

The eventHandler is an actual COM object implemented in R with methods to handle any of the events that come from a Worksheet object in Excel. All that remains now is to actually connect the handler object to the event source. We do this with the R function connectConnectionPoint() or its alias Advise() :

Advise(getConnectionPoints(sheet)[[1]], eventHandler)

Now, any time the user modifies a cell value, the handler function will be called and a new bootstrap calculation will be done if the user has provided data and the results will be shown in a new Worksheet.

Let's see this from beginning to end. We start R and run a script which does the following:

Opens Excel with the bootstrap template worksheet displayed
Connects an event handler for the Change action to that sheet. The handler checks that the data has been specified and, if so, performs the bootstrap calculations and displays the results.

The R code for all of this is

library(RDCOMClient)
library(RDCOMEvents)

ex = COMCreate("Excel.Application")
book = ex$Workbooks()$Open("C:/bootstrapTemplate.xls")

sheet = book$Worksheets()$Item(1) 
ex[["Visible"]] = TRUE

library(SWinTypeLibs)
lib = LoadTypeLib(sheet)
uid = names(getConnectionPoints(sheet))
eventInterface = lib[[uid]]

s = createCOMEventServerInfo(eventInterface, complete = TRUE)
s@methods$Change = function(target)   {
                      sheet = target[["Parent"]]
                      runBootstrapFromExcel( sheet)
                   }
eventHandler = createCOMEventServer(s@methods, s@ids, direct = TRUE)
Advise(getConnectionPoints(sheet)[[1]], eventHandler)

Note that we have to define the runBootstrapFromExcel function and the other supporting functions that we developed earlier to do the simulation and display the results. We can source these in from a different file or have them in a package and load that.

The interface is a little clumsy. If the user wants to change two data values, she first modifies one cell and when she moves to the second cell, a new bootstrap calculation is done. We are performing this each time a cell changes. But the user may want to make multiple changes and then perform the computations. So she needs to be in control of when this happens. We would more appropriately use a button to specify when the computations should be performed rather than on every change. There is a kludgy but simple way to do this using the existing code above. We could put a hyperlink on the worksheet and when the user clicks on that, our event handler is notified via the FollowHyperlink event method. The only real change needed is setting the event handler for the FollowHyperlink. We can ignore the actual link being clicked as we are only interested in when it is clicked. (If we had multiple links, we would have to differentiate between which was clicked by looking at the text of the URI, e.g. link[["Address"]] or link[["TextToDisplay"]].)

Buttons and other Active X controls

The hyperlink above is a convenient way to add an "action" or "submit" button. But as you have seen with Gtk and RGtk and are familiar with from your own experiences, there are much nicer GUI components. For example, we can use a button. And there are spinbox objects for specifying numbers, and sliders for changing the value of a parameters. And, Office provides ways to add such interactive components to a Worksheet or a Word document and to respond to the associated events. Just as with Gtk, there are two steps to doing this: creating and positioning the component to build the GUI, and then connecting the callbacks to handle the events and provide the interactive functionality.

Like RGtk and Glade, you can interactively create the GUI by placing controls on a worksheet. Go to the View menu and the Toolbar item and select Control Toolbox. This will bring up a little free-floating Window containing components that you can drag onto the worksheet. Unlike Gtk, resizing the window doesn't rearrange the GUI components. Instead, in Excel the elements are placed at fixed positions and have fixed dimensions. Given this, the interactive layout can be convenient. The only issue that arises is that when we design the GUI in Excel, we need to be able to connect R functions and event handlers to them from within R. Fortunately, there is a way to access these components on a worksheet (or across several worksheets if need be) from within R and use them directly. Each Worksheet has a list of the OLE objects it houses and these are available via the OleObjects property.

ole = sheet[["OleObjects"]]

This is a list and has Count and Item methods to access the elements. Each element is a general object and it is up to us to make sense of it. We can ask each object its name, e.g.

sapply(1:ole$Count(), function(i) ole$Item(i)[["Name"]])

Since ole is essentially a list, we can treat it like one in R. The RDCOMClient package provides a COMList and COMList() constructor function which creates the given DCOM object in a more S-like manner. It provides a method for length() rather than using ole$Count(), facilities for accessing individual elements e.g. ole[[2]], and a more convenient way of running lapply/sapply over the elements of the list, e.g.

sapply(ole, function(obj) obj[["Name"]])

We'll return to what we can actually do with these objects from R and how to find out what methods the support.

You can also create the components directly within R using the Add method for the OLEObjects list. To create a control object, you have to specify the class. The different types of controls are:

CheckBox, ListBox, ComboBox, CommandButton, Frame, Image, Label,

MultiPage, OptionButton, ScrollBar, SpinButton, TabStrip, TextBox,

ToggleButton

To create an object of one of these types, you use "Forms.(type).1" as the class type in a call to the Add method for the Worksheets OleObjects property.

ole = sheet[["OleObjects"]] ctrl = ole$Add(ClassType = "Forms.ComboBox.1", Top = , Left = , Width = , Height = )

Note that we didn't have to give the object a name. Instead, we have assigned it to an R variable and can refer to it directly rather than by name on the sheet. This is good programming practice, avoiding the possibility of conflicting names for components. The objects may have a name by default, but that name is not important for our purposes.

Each element in the OleObjects list is in fact a wrapper to the real ActiveX object. To get at that, we use the Object property.

Let's return to our example. We can either interactively construct our form, or we can programmatically generate it in R. We'll endeavor to use both approaches. We'll start with the programmatic version. We get a reference to the Worksheet of interest in the R variable sheet. Then, we refer to the OleObjects and start adding to that list. We place a collection of three labels to identify the interactive elements for the user. Then we add the elements themselves. We use a "combo box" for selecting the statistic of interest and a "spin box" for specifying the number of bootstrap samples to create. We place a button below these which, when clicked, will run the bootstrap.

ole = COMList(sheet[["OleObjects"]])

left = 260
height = 20
top = 20

sapply(c("Statistic", "Number of Repetitions"),
       function(txt) {
            label = ole$Add("Forms.Label.1", Top = 20, Left = left - 100, Width = 60, Height = height-1)
            label[["Object"]][["Caption"]] = txt
       })

statistic = ole$Add("Forms.ComboBox.1", Top = top, Left = left, Width = 60, Height = height - 1)
repetitions = ole$Add("Forms.SpinButton.1", Top = top + 20, Left = left, Width = 60, Height = height + 10)
run = ole$Add("Forms.CommandButton.1", Top = top + 40 + 10, Left = left, Width = 60, Height = height - 1)
run[["Object"]][["Caption"]] = "Run Bootstrap"

sapply(c("Mean", "Median", "Minimum", "Maximum"), 
            function(x) statistic[["Object"]]$AddItem(x))


  # Set initial values
statistic[["Object"]][["Value"]] = "Mean"

repetitions[["Object"]][["Max"]] = 10000
repetitions[["Object"]][["Value"]] = 999

repetitions[["Object"]][["Orientation"]] = 0

  # Where should the spin box update the value.
repetitions[["Object"]][["LinkedCell"]] = "E5"

It is hard to show how to build this interactively in a static document. One uses the "Control Toolbox" toolbar and places the elements on the worksheet. Then, one edits the Properties table to specify the different settings. This appears to be the easiest route at first. However, as we modify the design of the form, it becomes cumbersome. The level of manual involvement is quite large. And, importantly, nobody can see what was done except by looking at the worksheet itself. While this is fine, the concepts of the layout are lost to the reader (not the user, hopefully). It makes sense to use an interactive layout approach when we are doing the programming in Excel or Visual Basic directly. However, when we are providing the event handlers in R, we have to connect the ActiveX components to the R function that performs the actions. It is up to the form creator to do this based on some naming convention or knowledge of the form. The elements corresponding to the R variables statistic, repetitions and run are available int OleObjects list. It is up to us however to determine which is which and this can only be done based on knowledge of the form or the naming convention.

Regardless of how we actually create the form the final thing we have to do is establish an event handler to perform the bootstrap samples and report the results in a separate sheet. We will of course use the functions we developed earlier (uniBootstrap() and showBootstrapResults() ) to do the actual work. All that we need to do is detect when the user clicks on the "Run Bootstrap" command button. The other form elements within our GUI will take care of themselves when the user interacts with them to change their values. We need only fetch their values when performing the bootstrap, i.e. when the button has been pressed. So our button event handler function in R would look something like

runBootstrapHandler = 
function(...)
{
  B = repetitions[["Object"]][["Value"]]
  stat = statistic[["Object"]][["Value"]]
  contents = sheet$UsedRange()[["Value"]]

   
  results = uniBootstrap(unlist(contents[[1]]), statistics[[stat]], B)
  showBootstrapResults(results, book = sheet[["Parent"]])
}

We get the data from the first actual column of our sheet via the UsedRange. This can be done in different ways with more error checking performed, but we are merely trying to get the point across in this example.

The last step is to connect this function to the actual button event. We use the same steps as we did to catch the sheet Change event in our example above. Specifically, we get the connection point for the button and create a template interface for that event handler type. Then we add our own handler function and create the actual event handler instance and call Advise() to establish the connectivity. We use slightly different code here to illustrate different aspects of the RDCOMEvents functions. We get the specific connection point for the button via findConnectionPoint() . Also, we specify our event handler in the call to createCOMEventServerInfo() rather than adding to s after it is created. And we pass the resulting server template directly to createCOMEventServer() rather than its methods and name identifier map. There is no conceptual difference between this example and the one earlier. Note also that we work from the actual button (i.e. run[["Object"]]) and not the OLE object.

btn = run[["Object"]]
lib = LoadTypeLib(btn)
typeInfo = lib[["CommandButtonEvents"]]

connectionPoint = findConnectionPoint(btn, typeInfo)

s = createCOMEventServerInfo(typeInfo, complete = TRUE,
                             methods = list(Click = runBootstrapHandler))

handler = createCOMEventServer(s)
Advise(connectionPoint, handler)

At this point, the code should be connected to the button and if you add data into column A, you can click on the button and perform a bootstrap.

You can also extend this to work with only the selected data if a selection has been made or the entire thing if there is no current selected region.

There is a second collection of form components in Excel which are available via the Forms toolbar. These are ostensibly simpler to use, but are different and less general than the Active components.