*Note: I feel like I’ve read a similar article recently, but couldn’t track it down, so I figured I’d dig around and see what I could work out.*

Some time last year I wanted to generate a bunch of different linear models against a dataset. No problem – I created a list of the formulae, and used `lapply`

to create the models. Something like this:

1 2 3 4 5 6 7 8 9 |
model_formulae <- list( 'Y ~ X1', 'Y ~ I(X2*X2)', 'Y ~ X1 + I(X2*X2)' ) trained_models <- lapply(model_formulae, function(formula, data) { lm(formula, data) }, data) |

So far, so good. But when I looked at the results:

1 |
trained_models[[1]] |

1 2 3 4 5 6 7 |
## ## Call: ## lm(formula = formula, data = data) ## ## Coefficients: ## (Intercept) X1 ## -108.48 89.19 |

The `call`

property didn’t give me the details of which formula was used for that model. Sure, I could work it out from the indices of `model_formulae`

and `trained_models`

. I could even name the elements of `model_formulae`

to make it easier. But naming things is hard, and frankly I feel like I shouldn’t have to.

I tried a few things, such as the answers to this Stack Overflow question. While the question wasn’t about my specific need, it seemed close enough that one of the answers might help. Sadly, none of them did.

I was starting to worry that I’d have to fall back on `eval(parse(...))`

, a construct that’s a pet hate of mine (and I’m not alone): it’s harder to debug than normal code, and an IDE can’t do much to help find errors. Every time I’ve seen it (thankfully not often), there’s been a better way to code the same logic without it. So I wanted to avoid it if possible.

A bit more digging led to this discussion, which held the solution – using `do.call()`

:

1 2 3 4 5 |
trained_models <- lapply(model_formulae, function(model_formula, data) { do.call("lm", list(formula = as.formula(model_formula), data = quote(data))) }, data) |

It’s a bit more verbose*, and I concede less readable, than the original version. But the results are what I’m after:

1 |
trained_models[[1]] |

1 2 3 4 5 6 7 |
## ## Call: ## lm(formula = Y ~ X1, data = data) ## ## Coefficients: ## (Intercept) X1 ## -108.48 89.19 |

so I’ll use it next time I hit this problem.

* The `as.formula()`

call isn’t strictly necessary, but I prefer the output to show the formula as a formula, not a string.