July 17, 2012

Trends in run scoring, NL edition (more R)

Last time around I used R to plot the average runs per game for the American League, starting in 1901. Now I’ll do the same for the National League.  I'll save a comparison of the two leagues for my next post.

A fundamental principle of programming is that code can be repurposed for different sets of data. So much of what I’m going to describe recycles the R code I used for the AL exercise.

Starting with the preliminary step, I went back to Baseball Reference for the data, then did the same sort of finessing described for the AL. Once the data was read into the R workspace, I simply copied the AL code and changed the variable names to create new objects and variables.  (I could have simply rerun the same code, but I wanted to have both the AL and NL data and trend lines available for comparison.)  This included creating new LOESS trend lines.
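An alternative to copying and renaming is to wrap the model-fitting step in a small function, so the same code serves either league. What follows is only a sketch of that idea, not the approach I actually took: the function name loess_trend and the demo_season data frame are made up for illustration, and the numbers are random stand-ins rather than real league figures.

```r
# Hypothetical helper (not part of the original AL code): fit a LOESS model
# of runs per game on year and return the fitted values.
# `season` is assumed to be a data frame with columns Year and R, like NLseason.
loess_trend <- function(season, span = 0.75) {
  fit <- loess(R ~ Year, data = season, span = span)
  predict(fit)
}

# Made-up stand-in data, for illustration only
set.seed(1)
demo_season <- data.frame(Year = 1901:1950)
demo_season$R <- 4.5 + 0.5 * sin(demo_season$Year / 8) + rnorm(50, sd = 0.2)

# One fitted value per season, ready to pass to lines()
demo_trend <- loess_trend(demo_season)
```

With a helper like this, the AL and NL trend lines could each be produced with one call, while still keeping separate objects for the comparison.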




The first thing to do is create a new object with the LOESS model, and a secondary one with the predicted values calculated by the model function.  Here’s the code for the default LOESS line, and the plot.

# create new object NLRunScore.LO for loess model
NLRunScore.LO <- loess(NLseason$R ~ NLseason$Year)
NLRunScore.LO.predict <- predict(NLRunScore.LO)
#
# plot the data, add loess curve
ylim <- c(3,6)
plot(NLseason$R ~ NLseason$Year,
  pch=2, col="blue",
  ylim = ylim,
  main = "National League: runs per team per game, 1901-2012",
  xlab = "year", ylab = "runs per game")
# loess predicted value line
  lines(NLseason$Year, NLRunScore.LO.predict, lty="solid", col="blue", lwd=2)   
# chart tidying
  grid()
#

When it came to plotting the data points, I started to make some changes.  I will want to differentiate the NL from the AL when I get around to comparing the two, so the trend line is blue and the points are plotted as triangles rather than circles.
- The pch=2 will plot a triangle, rather than the default circle. The complete range of types can be found on this page.
- The colour of the points and the line are defined using the col parameter. It’s possible to use RGB schemes, etc. There’s more information on that in the “Color specification” section of this page.
- The LOESS line is added through the lines function, using the (x, y) format described earlier.
- The lty, col, and lwd parameters define the line type, colour, and weight.
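To see those parameters side by side, here is a small self-contained sketch. The x and y values are invented purely to have something to draw, and nl_blue is just a name I picked for the example; the point is only to show pch, col (both by name and as an RGB value), lty, and lwd in action.

```r
# Made-up points, purely to show the symbol and line options
x <- 1:10
y <- c(4.1, 4.3, 4.0, 4.6, 4.8, 4.5, 4.4, 4.9, 5.0, 4.7)

# The same colour as col = "blue", written as an RGB value
nl_blue <- rgb(0, 0, 1)

# pch = 2 draws open triangles
plot(x, y, pch = 2, col = nl_blue, ylim = c(3, 6),
     xlab = "x", ylab = "y")
# pch = 1 is the default open circle, shown here for contrast
points(x, y - 0.5, pch = 1, col = "red")

# lty sets the line type, lwd the line weight
lines(x, y, lty = "dashed", col = nl_blue, lwd = 2)
lines(x, y - 0.5, lty = "dotdash", col = "red", lwd = 1)
```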



I also created multiple LOESS trend lines, adjusting the span control as I did with the AL data.  Here’s the equivalent code and output:

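# loess models with narrower spans (span values match the legend labels below)
NLRunScore.LO.25 <- loess(NLseason$R ~ NLseason$Year, span=0.25)
NLRunScore.LO.25.predict <- predict(NLRunScore.LO.25)
NLRunScore.LO.5 <- loess(NLseason$R ~ NLseason$Year, span=0.50)
NLRunScore.LO.5.predict <- predict(NLRunScore.LO.5)
#
# plot the data, add loess curves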
ylim <- c(3,6)
plot(NLseason$R ~ NLseason$Year,
  pch=2, col="black",
  ylim = ylim,
  main = "National League: runs per team per game, 1901-2012",
  xlab = "year", ylab = "runs per game")
# loess predicted value line
  lines(NLseason$Year, NLRunScore.LO.predict, lty="solid", col="blue", lwd=2)
  lines(NLseason$Year, NLRunScore.LO.25.predict, lty="dashed", col="red", lwd=2)
  lines(NLseason$Year, NLRunScore.LO.5.predict, lty="dotdash", col="black", lwd=2)
# chart tidying
  legend(1980, 3.5,
    c("default", "span=0.25", "span=0.50"),
    lty=c("solid", "dashed", "dotdash"),
    col=c("blue", "red", "black"),
    lwd=c(2, 2, 2))
  grid()
#
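For a feel of what the span setting does, here is a rough sketch on invented data (the demo data frame is random noise around a sine wave, not real run-scoring figures). It fits the same three spans used in the plot and computes the average year-to-year change in each fitted curve as a crude measure of how wiggly the line is; that measure is my own shorthand for this example, not anything built into loess.

```r
# Made-up data with the same shape as NLseason (columns Year and R)
set.seed(42)
demo <- data.frame(Year = 1901:2012)
demo$R <- 4.5 + 0.4 * sin(demo$Year / 6) + rnorm(nrow(demo), sd = 0.25)

# Fit one loess model per span and collect the fitted values column by column
spans <- c(0.25, 0.50, 0.75)
fits <- sapply(spans, function(s) {
  predict(loess(R ~ Year, data = demo, span = s))
})
colnames(fits) <- paste0("span=", spans)

# Mean absolute year-to-year change in each fitted curve
roughness <- apply(fits, 2, function(f) mean(abs(diff(f))))
print(round(roughness, 3))
```

In general a smaller span means each local fit uses fewer neighbouring seasons, so the curve follows short-term swings more closely, while a larger span smooths them out.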



Next time: comparing the two leagues.


-30-
