R and Data Analysis: Open-Access Articles

by Janet Salmons, Ph.D., Research Community Manager for Sage Methodspace


Want to use R for statistical computing? These open-access resources might help!

From the R Foundation

The nonprofit R Foundation supports the free R software environment for statistical computing. It offers a blog and an open-access, peer-reviewed journal, The R Journal.

From Sage Journals

Bradley A, James RJE. Web Scraping Using R. Advances in Methods and Practices in Psychological Science. 2019;2(3):264-270. doi:10.1177/2515245919859535
Abstract. The ubiquitous use of the Internet in daily life means that there are now large reservoirs of data that can provide fresh insights into human behavior. One of the key barriers preventing more researchers from utilizing online data is that they do not have the skills to access the data. This Tutorial addresses this gap by providing a practical guide to scraping online data using the popular statistical language R. Web scraping is the process of automatically collecting information from websites. Such information can take the form of numbers, text, images, or videos. This Tutorial shows readers how to download web pages, extract information from those pages, store the extracted information, and do so across multiple pages of a website. A website has been created to assist readers in learning how to web-scrape. This website contains a series of examples that illustrate how to scrape a single web page and how to scrape multiple web pages. The examples are accompanied by videos describing the processes involved and by exercises to help readers increase their knowledge and practice their skills. Example R scripts have been made available at the Open Science Framework.
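The article's own scripts are on the Open Science Framework; as a flavor of the approach, here is a minimal web-scraping sketch using the rvest package. The URL and CSS selector are placeholders, not examples from the article.

```r
library(rvest)

# Download a page, select elements by a (hypothetical) CSS selector,
# and extract their text content
page <- read_html("https://example.com/articles")
titles <- page |>
  html_elements("h2.article-title") |>
  html_text2()

head(titles)
```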

Brown VA. An Introduction to Linear Mixed-Effects Modeling in R. Advances in Methods and Practices in Psychological Science. 2021;4(1). doi:10.1177/2515245920960351
Abstract. This Tutorial serves as both an approachable theoretical introduction to mixed-effects modeling and a practical introduction to how to implement mixed-effects models in R. The intended audience is researchers who have some basic statistical knowledge, but little or no experience implementing mixed-effects models in R using their own data. In an attempt to increase the accessibility of this Tutorial, I deliberately avoid using mathematical terminology beyond what a student would learn in a standard graduate-level statistics course, but I reference articles and textbooks that provide more detail for interested readers. This Tutorial includes snippets of R code throughout; the data and R script used to build the models described in the text are available via OSF at https://osf.io/v6qag/, so readers can follow along if they wish. The goal of this practical introduction is to provide researchers with the tools they need to begin implementing mixed-effects models in their own research.
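The tutorial's data and script are on OSF; as a minimal illustration of the kind of model it covers, here is a mixed-effects sketch using the lme4 package. The variable names (rt, condition, subject, item) are illustrative only.

```r
library(lme4)

# Fixed effect of condition; random intercepts for subjects and items
m <- lmer(rt ~ condition + (1 | subject) + (1 | item), data = my_data)
summary(m)
```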

Corlatti L. Regression Models, Fantastic Beasts, and Where to Find Them: A Simple Tutorial for Ecologists Using R. Bioinformatics and Biology Insights. 2021;15. doi:10.1177/11779322211051522
Abstract. Regression modeling is a workhorse of statistical ecology that allows researchers to find relationships between a response variable and a set of explanatory variables. Despite being one of the fundamental statistical ideas in ecological curricula, regression modeling can be complex and subtle. This paper is intended as an applied protocol to help students understand the data, select the most appropriate models, verify assumptions, and interpret the output. Basic ecological questions are tackled using data from a fictional series, “Fantastic beasts and where to find them,” with the aim to show how statistical thinking can foster curiosity, creativity and imagination in ecology, from the formulation of hypotheses to the interpretation of results.
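As a minimal sketch of the workflow the tutorial walks through, here is a base-R regression example; the variables and data frame are hypothetical, not the article's "fantastic beasts" data.

```r
# Fit a linear model, inspect it, and check assumptions
fit <- lm(body_mass ~ altitude + habitat, data = beasts)
summary(fit)   # coefficients, R-squared
plot(fit)      # diagnostic plots (residuals, leverage, etc.)

# Count data would instead call for a GLM, e.g. Poisson regression
fit_pois <- glm(count ~ altitude, family = poisson, data = beasts)
```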

Cousineau D, Goulet M-A, Harding B. Summary Plots With Adjusted Error Bars: The superb Framework With an Implementation in R. Advances in Methods and Practices in Psychological Science. 2021;4(3). doi:10.1177/25152459211035109
Abstract. Plotting the data of an experiment allows researchers to illustrate the main results of a study, show effect sizes, compare conditions, and guide interpretations. To achieve all this, it is necessary to show point estimates of the results and their precision using error bars. Often, and potentially unbeknownst to them, researchers use a type of error bars—the confidence intervals—that convey limited information. For instance, confidence intervals do not allow comparing results (a) between groups, (b) between repeated measures, (c) when participants are sampled in clusters, and (d) when the population size is finite. The use of such stand-alone error bars can lead to discrepancies between the plot’s display and the conclusions derived from statistical tests. To overcome this problem, we propose to generalize the precision of the results (the confidence intervals) by adjusting them so that they take into account the experimental design and the sampling methodology. Unfortunately, most software dedicated to statistical analyses do not offer options to adjust error bars. As a solution, we developed an open-access, open-source library for R—superb—that allows users to create summary plots with easily adjusted error bars.
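A hedged sketch of the package's main plotting call, based on the superb documentation; the factor and variable names here are illustrative, and the exact arguments should be checked against the package reference.

```r
library(superb)

# Summary plot with error bars adjusted for the design:
# one between-subject factor and a 3-level within-subject factor
superbPlot(my_data,
  BSFactors   = "Group",
  WSFactors   = "Time(3)",
  variables   = c("t1", "t2", "t3"),
  adjustments = list(purpose = "difference",   # CIs valid for comparisons
                     decorrelation = "CM"),    # repeated-measures adjustment
  plotStyle   = "line")
```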

Nordmann E, McAleer P, Toivo W, Paterson H, DeBruine LM. Data Visualization Using R for Researchers Who Do Not Use R. Advances in Methods and Practices in Psychological Science. 2022;5(2). doi:10.1177/25152459221074654
Abstract. In addition to benefiting reproducibility and transparency, one of the advantages of using R is that researchers have a much larger range of fully customizable data visualization options than are typically available in point-and-click software because of the open-source nature of R. These visualization options not only look attractive but also can increase transparency about the distribution of the underlying data rather than relying on commonly used visualizations of aggregations, such as bar charts of means. In this tutorial, we provide a practical introduction to data visualization using R specifically aimed at researchers who have little to no prior experience of using R. First, we detail the rationale for using R for data visualization and introduce the “grammar of graphics” that underlies data visualization using the ggplot package. The tutorial then walks the reader through how to replicate plots that are commonly available in point-and-click software, such as histograms and box plots, and shows how the code for these “basic” plots can be easily extended to less commonly available options, such as violin box plots. The data set and code used in this tutorial and an interactive version with activity solutions, additional resources, and advanced plotting options are available at https://osf.io/bj83f/.
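The layered "grammar of graphics" the tutorial teaches can be sketched in a few lines of ggplot2; this example uses R's built-in mtcars data rather than the tutorial's own data set.

```r
library(ggplot2)

# Violin plot showing the full distribution, with a narrow box plot
# layered on top for the median and quartiles
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.2) +
  labs(x = "Cylinders", y = "Miles per gallon")
```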

Rousselet GA, Pernet CR, Wilcox RR. The Percentile Bootstrap: A Primer With Step-by-Step Instructions in R. Advances in Methods and Practices in Psychological Science. 2021;4(1). doi:10.1177/2515245920911881
Abstract. The percentile bootstrap is the Swiss Army knife of statistics: It is a nonparametric method based on data-driven simulations. It can be applied to many statistical problems, as a substitute to standard parametric approaches, or in situations for which parametric methods do not exist. In this Tutorial, we cover R code to implement the percentile bootstrap to make inferences about central tendency (e.g., means and trimmed means) and spread in a one-sample example and in an example comparing two independent groups. For each example, we explain how to derive a bootstrap distribution and how to get a confidence interval and a p value from that distribution. We also demonstrate how to run a simulation to assess the behavior of the bootstrap. For some purposes, such as making inferences about the mean, the bootstrap performs poorly. But for other purposes, it is the only known method that works well over a broad range of situations. More broadly, combining the percentile bootstrap with robust estimators (i.e., estimators that are not overly sensitive to outliers) can help users gain a deeper understanding of their data than they would using conventional methods.
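The core idea is compact enough to sketch in base R: resample with replacement, compute a robust estimate on each resample, and take percentiles of the bootstrap distribution. The toy data and bootstrap settings below are illustrative, not the article's examples.

```r
# 95% percentile-bootstrap CI for a 20% trimmed mean
set.seed(1)
x <- rnorm(30)        # toy sample
nboot <- 2000

boot_est <- replicate(nboot,
  mean(sample(x, replace = TRUE), trim = 0.2))

quantile(boot_est, c(0.025, 0.975))   # lower and upper CI bounds
```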

Song QC, Tang C, Wee S. Making Sense of Model Generalizability: A Tutorial on Cross-Validation in R and Shiny. Advances in Methods and Practices in Psychological Science. 2021;4(1). doi:10.1177/2515245920947067
Abstract. Model generalizability describes how well the findings from a sample are applicable to other samples in the population. In this Tutorial, we explain model generalizability through the statistical concept of model overfitting and its outcome (i.e., validity shrinkage in new samples), and we use a Shiny app to simulate and visualize how model generalizability is influenced by three factors: model complexity, sample size, and effect size. We then discuss cross-validation as an approach for evaluating model generalizability and provide guidelines for implementing this approach. To help researchers understand how to apply cross-validation to their own research, we walk through an example, accompanied by step-by-step illustrations in R. This Tutorial is expected to help readers develop the basic knowledge and skills to use cross-validation to evaluate model generalizability in their research and practice.
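As a minimal sketch of the cross-validation idea (not the authors' Shiny app or their exact procedure), here is a k-fold loop in base R that estimates out-of-sample R-squared for a linear model on the built-in mtcars data.

```r
set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

# Sum of squared prediction errors on each held-out fold
sse <- numeric(k)
for (i in 1:k) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  sse[i] <- sum((test$mpg - predict(fit, test))^2)
}

# Cross-validated R-squared: 1 minus out-of-sample error over total variance
1 - sum(sse) / sum((mtcars$mpg - mean(mtcars$mpg))^2)
```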

Learn more in this post, and find Sage books on the topic.


More Methodspace Posts about Data Science and Data Analysis
