The beauty of the R ecosystem is that we do not have to build everything from scratch ourselves. We can leverage the work of others by using their datasets and functions, which are distributed in packages: collections of related functions and/or datasets. Just like R and RStudio, packages are made freely available.
If you are interested in how and why software is free, check out the Wikipedia page on free and open-source software (FOSS): https://en.wikipedia.org/wiki/Free_and_open-source_software.
For example, let’s say we want R to make a sound. With some googling, you might discover that the beep() function exists in the beepr package. Note that if you type beep() into the console and press <ENTER> without installing the beepr package, you will get the following error message:
Error in beep() : could not find function "beep"
Go ahead and try it; please do not fear error messages.
Figure 5.1: Installing packages is analogous to buying a toolbox of power tools. You only have to buy the toolbox once, then you can use any of its tools by taking the toolbox out. Likewise, you only have to install a package once on your computer; after that, you will use the library() function to take it out whenever you want it.
Okay, so we need to get the package. Here is a code alternative to using RStudio’s user interface to get a package:
install.packages("beepr")
At this point, if you type beep() into the console and press <ENTER>, you will still get the same error message. This might seem strange, but let me introduce an analogy that might help.
As depicted in Figure 5.1, installing packages is analogous to buying a toolbox filled with tools. Note that buying a toolbox is not the same as using the tools in it. Once you’ve bought the toolbox, to use a tool inside of it, you first retrieve the toolbox from your basement/shed/garage/etc. Similarly, in R, install.packages() gets you a toolbox of functions that you now own; you only need to do this once per computer. To use a function, first retrieve your toolbox using library(packageName); this makes the package’s functions available during your current session. The library() command must be rerun every time you restart R.
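To see the difference between owning a toolbox and taking it out, here is a minimal sketch using the tools package, which ships with every R installation but is not attached when a fresh session starts. The helpers is_installed() and is_attached() are hypothetical, written only for this illustration; they rely on base R’s system.file() and search().

```r
# Hypothetical helpers written for this illustration (base R only).
# A package is "installed" if R can find its folder on disk...
is_installed <- function(pkg) nzchar(system.file(package = pkg))
# ...and "attached" if it appears on the search path.
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

is_installed("tools")  # TRUE: tools ships with R, no install.packages() needed
is_attached("tools")   # FALSE in a fresh session: the toolbox is still put away

library(tools)         # take the toolbox out for this session

is_attached("tools")   # TRUE: its functions can now be called directly
toTitleCase("the toolbox is out")
```

If you restart R and run is_attached("tools") again, it will be FALSE once more: the package is still installed, but library() has to be rerun.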
Here is an example of this workflow, which uses two commands:
- library(beepr): take out the beepr toolbox (which we acquired earlier)
- beep(): make a sound (assuming your computer’s volume is audible)
The code below executes these two commands:
# run the library() command once in every R session
# where the beep() function will be used
library(beepr) # take out the toolbox
beep() # use the tool you want
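If you move scripts between computers, a common pattern is to install a package only when it is missing and then load it. The sketch below wraps that in a hypothetical helper, use_package(), built on base R’s requireNamespace(), install.packages(), and library(character.only = TRUE). It is demonstrated with splines, a package bundled with R, so nothing is downloaded.

```r
# Hypothetical helper: buy the toolbox only if you don't own it yet,
# then take it out for the current session.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # once per computer
  }
  library(pkg, character.only = TRUE)   # once per session
}

use_package("splines")  # already installed with R, so it is only attached
```

On a computer without beepr, use_package("beepr") would install it the first time and merely attach it on later runs.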
Now, if you are not at work, in a physical library, or in another quiet place, just have some fun trying these (note that you only need to run the library() function for a specific package once per session):
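As a sketch of sounds to try, assuming beepr is installed: beep()’s sound argument selects one of the package’s built-in sounds, either by number or by name. The requireNamespace() guard only keeps the code from erroring on a computer without beepr.

```r
# Assumes beepr is installed; the guard skips the demo otherwise.
if (requireNamespace("beepr", quietly = TRUE)) {
  library(beepr)            # once per session
  beep(sound = 2)           # choose a sound by number (2 is "coin")
  Sys.sleep(2)              # pause briefly between sounds
  beep(sound = "fanfare")   # or choose a sound by name
  Sys.sleep(2)
  beep(sound = "mario")
}
```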
You might think beepr is a strange, useless package. However, sometimes you will run code that takes a few seconds, minutes, or even days. It helps to play a noise to alert yourself when the script has finished.
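Here is a minimal sketch of that idea. The run_slow_task() function is a hypothetical stand-in for your real long-running work; the alert uses beepr when it is installed and falls back to a console message otherwise.

```r
# Hypothetical stand-in for a long-running task: wait, then compute.
run_slow_task <- function(seconds = 3) {
  Sys.sleep(seconds)          # pretend the work takes a while
  sum(sqrt(seq_len(1e6)))     # some numeric result
}

result <- run_slow_task()

# Alert yourself that the task finished: a sound if beepr is
# installed, otherwise a message in the console.
if (requireNamespace("beepr", quietly = TRUE)) {
  beepr::beep(sound = "complete")
} else {
  message("Task finished.")
}
```

Using beepr::beep() with the :: operator calls the function without attaching the whole package, which is handy for a one-off call at the end of a script.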
Two of the packages we will rely on throughout this book are the tidyverse package and the causact package. The tidyverse package is actually a collection of packages, including dplyr for data manipulation and ggplot2 for data visualization, both of which we will use. The causact package will provide access to datasets that are used in this text and, more importantly, enable us to investigate our models of business processes, issues, and decisions.
The packages installed as part of the tidyverse include ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, readxl, and lubridate. See here for more info: https://www.tidyverse.org/packages/.
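Once the tidyverse is installed, a single library() call attaches its core packages, such as ggplot2 and dplyr. Not every package in the list is attached this way; readxl, for example, is installed with the tidyverse but needs its own library() call. A sketch, guarded so it only runs when the tidyverse is present; tidyverse_packages() is a helper exported by the tidyverse package.

```r
# Guarded so the sketch only runs if the tidyverse is installed.
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)           # attaches the core packages (ggplot2, dplyr, ...)
  print(tidyverse_packages())  # lists the packages that come with the tidyverse
  library(readxl)              # installed with the tidyverse, but not attached above
}
```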
To get these packages, execute the following lines from within RStudio (put your cursor in the console to answer any prompts during installation):
install.packages("tidyverse")
install.packages("causact")
The first line installs the tidyverse collection of packages. The second line installs the causact package. When moving to a new project or task within R, make sure you restart your R session (Session -> Restart R) to unload packages and erase the objects in your environment. Otherwise, you might be getting results using functions or data from a previous project that you have since forgotten about.
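Restarting R is the reliable way to start clean. Before doing so, you can see what a session is still carrying with two base R functions: search() lists the attached packages and ls() lists the objects in your global environment. This minimal sketch only inspects the session; it does not clear anything.

```r
# Attached packages: anything beyond R's defaults (stats, utils, ...)
# was attached by an earlier library() call.
search()

# Objects in the global environment: leftovers from a previous
# project show up here.
ls()
```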
Occasionally, you will want to install the development version of causact because it has a bug fix or a new feature that you need. To do so, first run install.packages("remotes") and library(remotes). Then run install_github("flyaflya/causact") to download a more up-to-date version of the package that is not available via CRAN, the standard R package repository. For most use cases, use the CRAN version of a package first, as it has been more thoroughly tested.
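The development-version steps can be bundled into a small script. This sketch is a convenience under stated assumptions, not part of causact: install_causact_dev() and installed_version() are hypothetical helpers. The first calls remotes::install_github() (which needs an internet connection); the second uses base R’s packageVersion() so you can compare the installed version before and after.

```r
# Hypothetical helper: install the development version of causact.
# Needs an internet connection; installs remotes first if it is missing.
install_causact_dev <- function() {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("flyaflya/causact")
}

# Hypothetical helper: the installed version of a package, or NA if absent.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(packageVersion(pkg))
  } else {
    NA_character_
  }
}

installed_version("causact")  # check this before and after install_causact_dev()
```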
More information on R packages can be found at DataCamp’s “R Packages: A Beginner’s Guide”: https://www.datacamp.com/community/tutorials/r-packages-guide.