Thursday, July 23, 2015

Autocorrupt in R

You know that "autocomplete" feature on your smart phone or tablet that occasionally (or, in my case, frequently) turns into an "autocorrupt" feature? I just ran into it in an R script.

I wrote a web-based application for a colleague that lets students upload data, run a regression, ponder various outputs and, if they wish, export (download) selected results. In the server script, I created an empty list named "export". As users generated various outputs, they would be added to the list for possible download (to avoid having to regenerate them at download time). For instance, if the user generated a histogram of the residuals, then the plot would be stored in export$hist. Similarly, if the user looked at the adjusted R-squared, it would be parked in export$adjr2.

All was well until, in beta testing, I bumped into a bug involving the p-value for the F test of overall fit (you know, the test where failure to reject the null hypothesis would signal that your model contended for the worst regression model in the history of statistics). Rather than getting a single number between 0 and 1, in one test it printed out as a vector of numbers well outside that range. Huh???

I beat my head against an assortment of flat surfaces before I found the bug. The following chunk of demonstration code sums it up.
export <- list()               # create an empty export list
print(export$f)                # result: NULL
export$fitted <- c(2, 3, 1, 7) # (simulated) fitted values
print(export$f)                # result: [1] 2 3 1 7
Created by Pretty R at inside-R.org

The intent was to store the p-value of the test of overall fit in export$f, and the fitted values in export$fitted. If the user never checked the F test, I wanted export$f to be null, which would signal the export subroutine to skip it. Instead, the export subroutine autocompleted export$f (which did not exist) to export$fitted (which did exist) and spat out the mystery vector. There are multiple ways to avoid the bug, the simplest being to rename export$f to something like export$fprob, where "fprob" is not a substring of the name of any other entry of export.

I do my R coding inside RStudio, which provides autocompletion suggestions. Somewhere along the line, I think I came across the fact that the R interpreter autocompletes some things. It never occurred to me that this would happen when a script ran. When running commands interactively, I suppose the autocomplete feature saves some keystrokes. That's not generally an issue when running scripts, so I don't know why autocomplete is not turned off when "sourcing" a script.

At any rate, letting the betting commence on how long it will take me to forget this (and trip over it again).

No comments:

Post a Comment

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.