CodeDepends

CodeDepends_0.2-0.tar.gz (06 June 2009)

The CodeDepends package provides tools for processing R code (functions and scripts). It supports:
- calculating dependencies between the different expressions, to facilitate
  - caching results and avoiding recomputation
  - running code up to a particular expression or variable
- providing a general overview of code
- providing a brief vocabulary for high-level annotation of code
- identifying and displaying high-level tasks
- creating call graphs between sets of functions
- thinking about scripts as higher-level objects, facilitating thinking about aspects such as alternative approaches or branches, and generally capturing the thought process of an analysis/computation along with its code.
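To make the idea of running code up to a particular variable concrete, here is a minimal, language-neutral sketch. It is written in Python purely for illustration and is not the CodeDepends API. It assumes each top-level expression of a script has already been summarized by the variables it reads (inputs) and the variables it assigns (outputs). The class `Expr` and the function `exprs_needed_for` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Hypothetical summary of one top-level expression in a script."""
    code: str                                  # source text, for display only
    inputs: set = field(default_factory=set)   # variables the expression reads
    outputs: set = field(default_factory=set)  # variables the expression assigns


def exprs_needed_for(script, target):
    """Return the indices (in script order) of the expressions that must be
    evaluated to compute the final value of `target`."""
    needed = set()
    wanted = {target}
    # Walk backwards so that the most recent assignment of a wanted variable
    # is the one selected; its inputs then become wanted in turn.
    for i in range(len(script) - 1, -1, -1):
        hit = script[i].outputs & wanted
        if hit:
            needed.add(i)
            wanted -= hit
            wanted |= script[i].inputs
    return sorted(needed)


# A small R script, summarized by hand (hypothetical example).
script = [
    Expr('d <- read.csv("data.csv")', inputs=set(), outputs={"d"}),
    Expr('d <- na.omit(d)', inputs={"d"}, outputs={"d"}),
    Expr('plot(d$x, d$y)', inputs={"d"}, outputs=set()),
    Expr('fit <- lm(y ~ x, d)', inputs={"d"}, outputs={"fit"}),
    Expr('s <- summary(fit)', inputs={"fit"}, outputs={"s"}),
]

print(exprs_needed_for(script, "fit"))  # [0, 1, 3]
```

The `plot()` call and the `summary()` step are skipped because nothing that `fit` depends on comes from them. The same input/output summaries could also indicate which cached results become stale when an earlier expression is edited.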

The primary motivation for this package is to provide a central place for potentially sophisticated dependency analysis between expressions, which can then be used to cache intermediate results. See the cacher and weaver packages for use with Sweave. We are using this in XDynDocs, an XML-based dynamic document system that works with DocBook and Word.

We also use this to provide a higher-level view of code. The idea is that somebody reading an R script could look at a figure showing the flow of variables, or at a graph of the relationships between the high-level tasks and what they do (e.g., data input, data cleaning, exploratory data analysis, modeling, and so on). These tools aim to present code in more intuitive, high-level ways than the detail-oriented statements written for an interpreter.

We also expect to use this package to identify potential

- refactoring
- redundancy
- parallelism
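One way dependency information could expose potential parallelism is sketched below, again in Python for illustration only and not as the CodeDepends API. It assumes the dependencies have already been computed as a mapping from each expression's index to the set of indices it depends on, and that these sets capture every ordering constraint (including a later expression overwriting a variable that an earlier one reads). The function name `parallel_levels` is hypothetical.

```python
def parallel_levels(deps):
    """Group expressions into levels. Each expression depends only on
    expressions in earlier levels, so expressions within one level are
    independent of each other and could, in principle, run in parallel.

    deps maps an expression index to the set of indices it depends on.
    Dependencies within a script point to earlier expressions, so the
    graph is assumed to be acyclic (no cycle check is done here).
    """
    level = {}

    def level_of(i):
        # An expression with no dependencies sits at level 0; otherwise it
        # sits one level above its deepest dependency.
        if i not in level:
            level[i] = 1 + max((level_of(j) for j in deps[i]), default=-1)
        return level[i]

    for i in deps:
        level_of(i)

    groups = {}
    for i, lv in level.items():
        groups.setdefault(lv, []).append(i)
    return [sorted(groups[lv]) for lv in sorted(groups)]


# Hypothetical script: read two data sets, fit a model to each, compare them.
deps = {
    0: set(),   # a <- read.csv("a.csv")
    1: set(),   # b <- read.csv("b.csv")
    2: {0},     # fitA <- lm(y ~ x, a)
    3: {1},     # fitB <- lm(y ~ x, b)
    4: {2, 3},  # compare(fitA, fitB)
}

print(parallel_levels(deps))  # [[0, 1], [2, 3], [4]]
```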

We also want to use this to create much richer documents that capture the entire thought process and activities during an analysis or simulation. We want authors to be able to reproduce not only the final results they present to the reader, but also the additional activities behind them:

- work that confirmed their approaches
- alternative avenues that they tried
- dead ends that did not come to fruition
- ideas for other things to pursue

This is the sense of reproducibility that we want to reach: not just being able to repeat the computations, but being able to repeat the analysis process. For this, we need a richer document and richer relationships between code blocks representing higher-level tasks. We want to be able to say, for example, that these three code blocks relate to fitting a classifier. The inputs are the data and the outputs are, say, a classifier function and residuals. If one wanted to try a different statistical method, one would add a parallel task that has the same inputs (or a superset) and also produces a classifier function.

Documentation

- preliminary overview
- R function documentation

Duncan Temple Lang <duncan@wald.ucdavis.edu>

Last modified: Mon Mar 30 11:56:49 PDT 2009