Thursday, January 21, 2016

Unit testing in R

Well, it didn't take too much research to come up with some unit test tools for R. They're clunky compared to Visual Studio, but not terribly so. The one that seems to hold the most promise is a package called testthat. Here's a quick summary of how it works:

Create the test file(s). Here's a test file for a function Im that creates an identity matrix of dimension d:

expect_null(Im(0), info = 'Dimension 0')
expect_equivalent(Im(1), matrix(1, 1), info = 'Dimension 1')
expect_equivalent(Im(2), matrix(c(1, 0, 0, 1), 2), info = 'Dimension 2')
expect_equivalent(Im(3), matrix(c(1, 0, 0, 0, 1, 0, 0, 0, 1), 3), info = 'Dimension 3')
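For what it's worth, testthat's usual convention is to group expectations inside test_that() blocks, so each failure is reported under a descriptive name. Here's a minimal sketch of the same checks in that shape. The Im defined at the top is only a stand-in (built on base R's diag()) so the snippet runs on its own; it is an assumption of this sketch, not the Im written later in this post.

```r
library(testthat)

# Stand-in so the snippet is self-contained. It leans on base R's diag(),
# which is an assumption of this sketch, not the Im developed in the post.
Im <- function(d) if (d <= 0) NULL else diag(d)

test_that("Im returns NULL for dimension 0", {
  expect_null(Im(0))
})

test_that("Im builds small identity matrices", {
  expect_equal(Im(1), matrix(1, 1))
  expect_equal(Im(2), matrix(c(1, 0, 0, 1), 2))
  expect_equal(Im(3), matrix(c(1, 0, 0, 0, 1, 0, 0, 0, 1), 3))
})
```

Grouping like this is optional for a toy example, but it keeps the report readable once there are more than a handful of checks.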

The test files get put in a test directory and can be run individually or as a group using a test script:

library('testthat')
source('Im.R')
test_dir('tests', reporter = 'summary')
Now we can get to writing code (in file Im.R):

First try:
Im = function(d) NULL
First passes, others fail. Add trivial case:
Im = function(d)
{
  if (d <= 0)
    return(NULL)

  result = matrix(1, d, d)
  return(result)
}
First two pass. Now handle non-trivial case:
Im = function(d)
{
  if (d <= 0)
    return(NULL)

  result = matrix(0, d, d)
  for (i in 1:d)
    result[d,d] = 1
  return(result)
}
Oops, the dimension 2 and 3 tests still fail because I goofed the index:
Im = function(d)
{
  if (d <= 0)
    return(NULL)

  result = matrix(0, d, d)
  for (i in 1:d)
    result[i,i] = 1
  return(result)
}
and we're in business; all pass. Now we can confidently refactor. Wouldn't it be simpler to just pass the ones and zeros to the matrix data source rather than making it all zero and then overwriting? How about this:
Im = function(d)
{
  if (d <= 0)
    return(NULL)

  result = matrix(c(1, rep(0,d)), d, d)
  return(result)
}
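This version leans on the way matrix() fills column by column and recycles a data vector that is too short. A small sketch, assuming d = 3, makes the fill visible and also records whether R complains while recycling. The variable names here (pattern, warned, m) are just for illustration.

```r
d <- 3
pattern <- c(1, rep(0, d))   # 1 0 0 0: one element longer than a column

# Build the matrix while noting whether matrix() raises a warning.
warned <- FALSE
m <- withCallingHandlers(
  matrix(pattern, d, d),
  warning = function(w) {
    warned <<- TRUE                 # remember that a warning occurred
    invokeRestart("muffleWarning")  # keep going and return the matrix
  }
)

print(m)        # the recycled pattern lands the ones on the diagonal
print(warned)   # whether matrix() warned about the data length
```

Because the pattern is d+1 long and each column holds d values, every repeat of the pattern starts one row lower than the last, which is what walks the ones down the diagonal.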
Tests pass, but we get a warning message because the length of the data vector (d+1) isn't a multiple or sub-multiple of the number of rows. We could patch that:
Im = function(d)
{
  if (d <= 0)
    return(NULL)

  result = suppressWarnings(matrix(c(1, rep(0,d)), d, d))
  return(result)
}
That passes. Or, we could just generate the entire data source with the right length. There are d-1 sequences of a one followed by d zeros, and then a trailing one. That's (d-1)(d+1) + 1 = d^2 values, exactly what the matrix needs.
Im = function(d)
{
  if (d <= 0)
    return(NULL)

  result = matrix(
    c(rep(
      c(1, rep(0,d)),d-1),
      1),
    d, d)
  return(result)
}
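The counting argument (d-1 repeats of a one followed by d zeros, plus a trailing one) is easy to check directly. Here's a quick sketch; build_data is a hypothetical helper name used only for this check, not part of Im.R.

```r
# Hypothetical helper: the data vector described above, with d - 1 repeats
# of (1 followed by d zeros) and then a final 1.
build_data <- function(d) c(rep(c(1, rep(0, d)), d - 1), 1)

# For each dimension, the vector should hold exactly d^2 values.
lengths_ok <- vapply(1:6, function(d) length(build_data(d)) == d^2, logical(1))
print(lengths_ok)
```

If the repeat count is off by one, the length check fails for every d and matrix() goes back to warning about the data length.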
And, of course, there's no need to store the intermediate result at this point:
Im = function(d)
{
  if (d <= 0)
    return(NULL)

  return(
    matrix(
      c(rep(
        c(1, rep(0,d)),d-1),
        1),
      d, d))
}
All pass, good enough (though throwing a couple of comments in there would probably be in order).
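As an extra sanity check beyond the three hand-written cases, the final version can be compared against base R's diag(), which builds an identity matrix when given a single positive number. This is only a sketch of a cross-check, assuming the final Im from above; the range 1:10 is arbitrary.

```r
# Final version of Im from the post, repeated so the snippet runs on its own.
Im = function(d)
{
  if (d <= 0)
    return(NULL)

  # d - 1 repeats of (1 followed by d zeros), then a trailing 1: d^2 values
  return(
    matrix(
      c(rep(
        c(1, rep(0,d)),d-1),
        1),
      d, d))
}

# Compare against diag() for a range of dimensions.
matches <- vapply(1:10, function(d) isTRUE(all.equal(Im(d), diag(d))), logical(1))
print(matches)
```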
