Speakers

2022 Speakers

C. Nathalie Yuen

Come TogetheR, Right Now, OveR R

Inspired by the musical contributions of the Pacific Northwest, this 5-minute lightning talk focuses on the Top 100 Billboard charts. In addition to using the Billboard charts to learn about R/RStudio, the talk will discuss using Tidy Tuesday as a resource, developing interdisciplinary skills, and forging relationships within collaborative groups. The “Top 100 Billboard” is a Tidy Tuesday (Mock, 2022) activity that includes song, artist, and chart information from the Billboard Chart, as well as song audio information from Spotify. Although this activity could be used in a variety of situations, from an introduction to R/RStudio to settling arguments in social settings, the talk will focus on its use in an undergraduate classroom. It will include a description of an in-class activity and general reflections on the use of R/RStudio in the classroom. Music journalist and author Rob Sheffield (2010) wrote, “Bringing people together is what music has always done best” (p. 12), but this talk will suggest that “Bringing people togetheR is what R has always done best.”



Nathalie Yuen headshot
Pronouns: she/her
Olympia, WA, USA
Dr. C. Nathalie Yuen is a member of the faculty at The Evergreen State College in Olympia, WA. She earned her Ph.D. in Psychology at the University of Nebraska at Omaha. Dr. Yuen primarily uses R for data visualization and in-class activities.

Cari Gostic

RShiny, Big Data and AWS: A tidy solution using Arrow

The Arrow package facilitates a low-effort, inexpensive transition from a local to cloud-based RShiny infrastructure. It is a relatively new and underutilized tool that requires no additional software licensing, integrates seamlessly with the Tidyverse, and leverages the analytic- and memory-efficient data formats offered by Apache (e.g. Parquet and Feather). In collaboration with the U.S. Environmental Protection Agency, my team built a dashboard to visualize nationwide hourly air quality data from 2010 through the present. Currently exceeding 34 million rows, this dataset expands further each week as recent data is uploaded. The initialization time for this app using a standard RShiny setup where all data is uploaded in an .RData file exceeds two minutes with additional loading for data-intensive visualizations within the app. In this talk, I will demonstrate how we improved dashboard loading times to seconds using an AWS S3 bucket, three functions from the Arrow package, and fewer than 20 new lines of code throughout our entire workflow.
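A minimal sketch of the pattern (bucket, path, and column names here are hypothetical, not the project's actual setup):

```r
library(arrow)
library(dplyr)

# Point the app at Parquet files in S3 instead of loading an .RData file at startup
bucket <- s3_bucket("example-air-quality-bucket")  # hypothetical bucket name
aq <- open_dataset(bucket$path("hourly"))          # lazily scans the Parquet files

# Nothing is read into memory until collect(); filters are pushed down to Arrow
recent <- aq |>
  filter(state == "WA", year == 2023) |>
  collect()
```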



Cari Gostic headshot
Pronouns: she/her
Seattle, WA, USA
Cari joined Sonoma Technology’s Data Science Department in 2020. In addition to her analytical experience in catastrophe modeling for the insurance industry, she has extensive experience in data processing and analysis, model development, and effective data visualization. She is currently involved in a variety of projects, including dashboard development, exceptional event analyses, and refinery monitoring. Cari earned her BS in Atmospheric Science from Cornell University and her MS in Data Science from the University of British Columbia.

Colleen O'Briant

Teaching Programming with Tidyverse Koans: A Journey of Successes and Failures

This talk is about my successes and failures using koans as a pedagogical tool for teaching programming using the tidyverse.

My Economics PhD began in a more or less standard way: we spent a harrowing first year learning about things like Lagrangian multipliers, hyperplane separation, and Bellman equations. Then, in the spring quarter, we were asked to teach ourselves to program in R and Julia (at the same time, of course). I developed a severe and debilitating but thankfully transitory mental block around writing for loops, yet somehow I was selected to teach R programming labs for the next PhD cohort. Perhaps as a way to process my feelings about that first year, I dove into trying to make teaching materials that didn't feel so scary, isolating, and frustrating.

That project developed into a sort of raison d'être for me through the PhD program. I collected advice from people who know much more about teaching programming than me, and I kept iterating on the materials. Now, as my PhD comes to a close, I've taught R seven different times in seven different econometrics courses, and I think my methods are finally worth sharing. (As an aside: I still don't know the first thing about Julia).

To summarize my vision statement: If we want to teach programming in a more inclusive way, what I've discovered is that the tidyverse is a great place to start, but using tidyverse koans can be even better.

What are koans?
Koans are short programming exercises that show students fundamentals, expose them to what's possible, and challenge them to apply what they've learned and form new connections. Ruby koans popularized the concept, and now there are Lisp koans, Python koans, Clojure koans, and many more, including my tidyverse koans. Something unique about koans is the built-in tests, which students can run at any point to verify they're on the right track. Koans also introduce students to “test-driven development” as a fundamental building block.
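As an illustration, a hypothetical tidyverse koan might look like this: the student replaces the blank, then runs the built-in test to check the answer (a sketch of the form, not an exercise from the actual collection):

```r
library(dplyr)

# Fill in the blank (__) so that the test below passes
answer <- mtcars |>
  filter(cyl == __) |>
  nrow()

# Built-in test: mtcars contains 7 six-cylinder cars
stopifnot(answer == 7)
```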

How are koans similar to and different from LearnR?
Both koans and LearnR are simple to build yourself, but koans are meant to be used in something like RStudio, not in the browser. With koans, there are no training wheels.

What are koan shortcomings?
There's an impulse when writing koans to keep ramping up the difficulty, but it's important to fight that impulse or else students will lose confidence. A koan can also only teach so much at a time, which is why just this month I’ve started to test koan read-alongs in both video and zine formats.



Colleen O'Briant headshot
Pronouns: she/her
Eugene, OR, USA
Colleen O'Briant is an Economics PhD student at the University of Oregon and anticipates graduating in June 2024. Her research focuses on the econometrics of AI/ML tools, with the goal of enhancing trust and transparency in this rapidly evolving field. She will be on the job market for the 2023/2024 academic year.

David Keyes

How to Convince Your Teammates to Learn R

If you're attending an R conference on a Saturday in the middle of summer, I probably don't need to convince you that R is great. If you, like me, love R, it can be tempting to try to get everyone you know to use it. It's painful to watch people struggle to do basic things in other tools that you know can be done easily in R. It's especially painful if you work in an organization where you're the only R user. If you could just get others to learn R, you think, imagine all the things you could accomplish.

How do you convince people to learn R? In running R for the Rest of Us for the last three and a half years, I've thought a lot about this question. In this talk, I'll share some of the lessons I've learned for convincing others to learn R. Things like:

  1. Strategies for making R feel less intimidating for newcomers.
  2. Starting with the end products that people can produce with R rather than the technical steps required to get there.
  3. Teaching people what they need to know (and no more) so they can more easily get started with R.

Despite our best intentions, it can be easy for more advanced R users to overwhelm newcomers with the myriad things R can do. If you want others to take up R, it's important to put yourself in their mindset. This talk will show how to do that and, hopefully, help you convince others to join you in using R.



David Keyes headshot
Pronouns: he/him
Portland, OR, USA
David Keyes is the CEO and founder of R for the Rest of Us. Through online courses and trainings for organizations, R for the Rest of Us helps people learn to use R. In addition to its education work, R for the Rest of Us does consulting, developing reports, websites, and more to help organizations use R to improve their workflows.

Deepsha Menghani

Learning to create Shiny modules by turning an existing app modular

Shiny is an extremely powerful tool for creating interactive web applications. However, the code for a Shiny application can become long and complex very quickly. Modules are a great way to organize an application for better readability and code reusability. This talk will delve into how you can learn the concept of modules by breaking an existing app structure down into components and turning them into modules one step at a time. Attendees will learn the fundamentals of module creation, implementation, and communication between modules.
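As a minimal sketch of the end state (names and data are invented for the example), one module reused twice might look like:

```r
library(shiny)

# UI half of the module: every input/output id is namespaced with ns()
histUI <- function(id) {
  ns <- NS(id)
  tagList(
    sliderInput(ns("bins"), "Bins", min = 5, max = 50, value = 20),
    plotOutput(ns("plot"))
  )
}

# Server half: moduleServer() scopes the ids to match the UI
histServer <- function(id, data) {
  moduleServer(id, function(input, output, session) {
    output$plot <- renderPlot(hist(data, breaks = input$bins))
  })
}

# The app reuses the same module twice under different ids
ui <- fluidPage(histUI("left"), histUI("right"))
server <- function(input, output, session) {
  histServer("left", faithful$eruptions)
  histServer("right", faithful$waiting)
}
# shinyApp(ui, server)
```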



Deepsha Menghani headshot
Pronouns: she/her
Seattle, WA, USA
Deepsha Menghani is a data science manager at Microsoft. Her work focuses on investment impact analysis and propensity modeling. When she is not geeking out over data, she is knitting or collecting yarn.

Deepsha Menghani

Harnessing the power of Gen AI with RAGs

In the rapidly evolving field of AI, it’s pivotal to stay at the forefront of the latest advancements. Let’s uncover how Gen AI and Retrieval-Augmented Generation techniques enhance the quality and relevance of AI-generated content, making it more accurate and contextually aware of the business needs.

Deepsha Menghani headshot
Pronouns: she/her
Seattle, WA, USA
Deepsha Menghani is a Data Science and AI Manager at Microsoft, where she harnesses the transformative power of Data Science in partnership with marketing and customer support. She applies her deep expertise to shape campaign strategies and enhance customer engagement. Beyond her technical acumen, Deepsha champions a culture of diversity, equity, and inclusion, mentoring a team of talented data scientists to achieve strategic objectives and foster innovation.

Dror Berel

Tidy everything… How I finally got to dive in Time series, Tree and Graph/Network data structures and analysis, thanks to their tidy packages

For years I tried to learn and use R data structures such as xts for time series, dendrograms for trees, and graphs from the igraph package. Perhaps what made them difficult and unintuitive was that some piece of the data structure was always hidden in the class, or not printed in the default abstraction of the object and its projections. This finally became clearly visible with the tidy approach, which defines tidy tabular structures for the different components and enforces a cohesive system around them to ensure the more complex machinery is properly handled behind the scenes. In this talk I will review some examples: the tsibble object from the tidyverts ecosystem, the treedata object from the tidytree ecosystem, and the tbl_graph object from the tidygraph package. I will also demonstrate how I leveraged tibble’s nested structure to embed S4 objects into columns and systematically operate on them in a purrr (row-wise) manner.
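A brief sketch of the objects mentioned, using toy data (assumes the tsibble, tidygraph, dplyr, and purrr packages are installed):

```r
library(tsibble)
library(tidygraph)
library(dplyr)
library(purrr)

# A tsibble is an ordinary tibble plus a declared time index
ts <- tsibble(month = yearmonth("2024 Jan") + 0:5,
              value = rnorm(6),
              index = month)

# A tbl_graph keeps nodes and edges as two linked tibbles
g <- tbl_graph(nodes = tibble(name = c("a", "b", "c")),
               edges = tibble(from = c(1, 1), to = c(2, 3)))

# Arbitrary objects (even S4) can live in a list-column and be
# operated on row-wise with purrr
nested <- tibble(id = 1:2, obj = list(ts, g))
nested |> mutate(cls = map_chr(obj, ~ class(.x)[1]))
```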



Dror Berel headshot
Pronouns: he/him
Seattle, WA, USA
Dror Berel is a statistician with over 20 years of work experience in both academia and industry. He loves using R for (almost) everything. One time he even drew a heart with his spouse's name for Valentine's day, using R of course. He works as a consultant, solving business problems and scale analytical tools for diverse data domains, leveraging both traditional Machine learning and Causal Inference along with modern approaches.

Ed Borasky

Eikosany: Microtonal Algorithmic Composition with R

Eikosany is an R package for composing microtonal electronic music based on the theories of Erv Wilson. It gives a composer the ability to

  • create compositions using a wide variety of microtonal scales,
  • manipulate the scores as R data.table objects,
  • synthesize the compositions as audio files, and
  • export the compositions as MIDI files to digital audio workstations.

In this talk I'll briefly describe the music theory behind Eikosany and walk through a typical composition scenario. At the end, I'll play the resulting composition.



Ed Borasky headshot
Pronouns: he/him
Beaverton, OR, USA
M. Edward (Ed) Borasky is a retired scientific applications and operating systems programmer who has been using R since version 0.90.1 on Red Hat Linux 6.2. Before R there was Fortran and assembler - lots of different assemblers. (Floating Point Systems AP-120B microcode, even.)

Besides his main professional use for R, Linux performance analysis and capacity planning, Ed has used R for computational finance, fantasy basketball analytics, and now, algorithmic composition. His music is best defined as experimental, combining algorithmic composition, microtonal scales, and spectral sound design.

Intermediate Quarto: Parameterized Reports Workshop

Friday June 21, 2024

1:30 - 4:30 PM

Room C123A

The Intro Quarto workshop takes you through the basics of authoring a reproducible report using Quarto. This workshop builds on those concepts and teaches you how to level up your reproducible reports by using parameters, conditional content, conditional code execution, and custom styling sheets for HTML and Microsoft Word formats. Additionally, you will learn how to render all variations of a parameterized report at once using quarto and purrr.
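Rendering every variation of a parameterized report with quarto and purrr can be sketched like this (the report file and its `county` parameter are hypothetical):

```r
library(quarto)
library(purrr)

# Assumes report.qmd declares a `county` parameter in its YAML header
counties <- c("King", "Pierce", "Thurston")

walk(counties, function(cty) {
  quarto_render(
    input = "report.qmd",
    execute_params = list(county = cty),
    output_file = paste0("report-", tolower(cty), ".html")
  )
})
```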

Knowledge Prerequisites: The workshop is designed for those with some experience in R and R Markdown or Quarto. It will be assumed that participants can perform basic data manipulation and visualization. Experience with the tidyverse, especially purrr and the pipe operator, is a major plus, but is not required.

Pre-Installations: Recent version of R, RStudio, and Quarto CLI. Packages used in exercises include dplyr, fs, ggplot2, here, janitor, knitr, lubridate, plotly, purrr, quarto, readr, rmarkdown, stringr, and tidyr.

install.packages(c("dplyr", "fs", "ggplot2", "here", "janitor", "knitr", 
                   "lubridate", "plotly", "purrr", "quarto", "readr", 
                   "rmarkdown", "stringr", "tidyr"))

Instructor

Jadey Ryan headshot

Jadey Ryan

Pronouns: She/her/hers

Location: Tacoma, Washington

Jadey Ryan is a self-taught R enthusiast working in environmental data science in the Natural Resources and Agricultural Sciences section of the Washington State Department of Agriculture. She is obsessed with cats, nature, R, and Quarto.

Learn more at jadeyryan.com.

Intermediate Shiny: How to Draw the Owl Workshop

Friday June 21, 2024

9:00 AM - 12:00 PM

Room C123B

Build on your beginning shiny skills and learn more about the confusing parts of shiny, and the surrounding shiny ecosystem. By the end of this workshop, you will be able to:

  • Dynamically update controls based on other inputs

  • Explain when to use eventReactive versus observeEvent in your code

  • Use Quarto Dashboards with Shiny

  • Integrate ObservableJS visualizations into your Shiny Applications

  • Explain the deployment process to Shinyapps.io and Posit Connect
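For a taste of the eventReactive versus observeEvent distinction listed above, here is a server-only sketch (the input ids are invented for the example):

```r
library(shiny)

server <- function(input, output, session) {
  # eventReactive() returns a value, recomputed only when the button is clicked
  filtered <- eventReactive(input$go, {
    subset(mtcars, cyl == input$cyl)
  })
  output$tbl <- renderTable(filtered())

  # observeEvent() runs purely for its side effect and returns nothing
  observeEvent(input$save, {
    write.csv(filtered(), "filtered.csv")
  })
}
```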

Knowledge Prerequisites: Basic knowledge of Shiny apps. If you know how to build a basic single-file Shiny app, you should be good to go.

Pre-Installations: We will use Posit Cloud for this workshop, so no installations are needed.

Instructor

Ted Laderas headshot

Ted Laderas

Pronouns: He/him/his

Location: Portland, Oregon

Ted Laderas is a trainer, instructor, and community builder. He currently works at the Fred Hutch Cancer Center managing the data science community. He loves Shiny, but acknowledges there are some confusing parts.

Introduction to GIS and mapping in R Workshop

Friday June 21, 2024

1:30 - 4:30 PM

Room C123B

The usage of R in GIS is growing because of its enhanced capabilities for statistics, data visualization, and spatial analytics. In this workshop, you will learn some basics of working with geospatial data and producing maps in R. Topics will include using sf and terra to work with vector and raster data, respectively. You will practice visualizing geospatial data using base plotting functions, ggplot2, and leaflet.
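A small taste of that workflow (the point data below is made up for illustration):

```r
library(sf)
library(leaflet)

# st_as_sf() turns a plain data frame into a vector layer
pts <- data.frame(name = c("Portland", "Seattle"),
                  lon = c(-122.68, -122.33),
                  lat = c(45.52, 47.61)) |>
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

# Base plotting works on sf objects directly
plot(st_geometry(pts))

# The same layer rendered on an interactive leaflet map
leaflet(pts) |> addTiles() |> addCircleMarkers(label = ~name)
```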

Knowledge Prerequisites: Though not required, it would be beneficial to know some basics of using dplyr and ggplot2.

Pre-Installations: dplyr, ggplot2, patchwork, viridis, knitr, terra, sf, leaflet, usaboundaries, and httr

install.packages(c("dplyr","ggplot2","patchwork","viridis","knitr",
                   "terra","sf","leaflet","httr"),
                   Ncpus = 3)

install.packages("remotes")
remotes::install_github("ropensci/USAboundaries")
remotes::install_github("ropensci/USAboundariesData")

Instructors

Brittany Barker headshot

Brittany Barker

Pronouns: She/her/hers

Location: Portland, Oregon

Brittany Barker is an Assistant Professor (Senior Research) at the Oregon IPM Center at Oregon State University. She uses R to develop ecological models that can provide decision-support for managing and monitoring pests, their crop hosts, and their natural enemies. Over the past five years, she has transitioned from ArcGIS to R for nearly all GIS and mapping operations. She loves nature, running, native plants, wildlife, and sci-fi and horror books.


Roger Andre headshot

Roger Andre

Pronouns: He/him/his

Location: Seattle, Washington

Roger is Sr. Business Analysis Manager at T-Mobile. He has used R for location based analyses of retail store locations and for reporting and dashboard generation (and a whole lot of data wrangling). His background is in code-first spatial data analysis and engineering. When not on a computer, he enjoys fly-fishing and reading.

Introduction to Quarto Workshop

Friday June 21, 2024

9:00 AM - 12:00 PM

Room C123A

Quarto is a publishing system for weaving together code and narrative to create fully reproducible documents, presentations, websites, and more. In this workshop, you’ll learn what you need to start authoring Quarto documents in RStudio. You do not need any prior experience with R Markdown, but if you have some, you’ll also get a few tips for transitioning to Quarto.

Knowledge Prerequisites: You should be comfortable opening, editing and navigating files in RStudio. You should have some experience with the R language, but no specific experience in any packages is required.

Pre-Installations: Recent version of R, RStudio, and Quarto CLI. R packages: tidyverse, gt, palmerpenguins, quarto. Detailed instructions provided prior to the workshop.

install.packages(c("tidyverse", "gt", "palmerpenguins", "quarto"))

Instructor

Charlotte Wickham headshot

Charlotte Wickham

Pronouns: She/her/hers

Location: Corvallis, Oregon

Charlotte Wickham is a Developer Educator at Posit with a focus on Quarto. Before Posit, she taught Statistics and Data Science at Oregon State University.

Isabella Velásquez

The medium is the message: R programmers as content creators

Isabella Velásquez
Pronouns: she/her
Seattle, WA, USA
Isabella is an R enthusiast, first learning the programming language during her MSc in Analytics. Previously, Isabella conducted data analysis and research, developed infrastructure to support use of data, and created resources and trainings. Her work on the Posit (formerly RStudio) Marketing team draws on these experiences to create content that supports and strengthens data science teams. In her spare time, Isabella enjoys playing with her tortoiseshell cat, watching film analysis videos, and hiking in the mountains around Seattle. Find her on Twitter and Mastodon: @ivelasq3

Jadey Ryan

Using Shiny to optimize the climate benefits of a statewide agricultural grant program

Washington’s Sustainable Farms and Fields program provides grants to growers to increase soil carbon or reduce greenhouse gas (GHG) emissions on their farms. To optimize the climate benefits of the program, we developed the Washington Climate Smart Estimator {WaCSE} using R and Shiny.

Integrating national climate models and datasets, this intuitive, regionally specific user interface allows farmers and policymakers to compare the climate benefits of different agricultural practices across Washington’s diverse counties and farm sizes. Users can explore GHG estimates in interactive tables and plots, download results in spreadsheets and figures, and generate PDF reports. In this talk, we present the development process of {WaCSE} and discuss the lessons we learned from creating our first ever Shiny app.



Jadey Ryan headshot
Pronouns: she/her
Seattle, WA, USA
Jadey Ryan works for the Washington State Department of Agriculture in the Natural Resources Assessment Section. She supports the Washington Soil Health Initiative and Sustainable Farms and Fields programs by collecting and processing soil and climate data, managing the soil health database, and developing tools to visualize and analyze the data. These data products contribute sound science to inform decision-making that balances healthy land and sustained ecosystem functions with a thriving agricultural economy. Jadey primarily uses R in her day-to-day work and considers herself a self-taught intermediate user.

Justin Sherrill

Transit Access Analysis in R

Transit agencies across the country are facing a fiscal cliff that threatens their ability to provide vital services to cities and communities. Understanding the crucial role of these networks in creating livable cities is now more important than ever. This presentation offers an intermediate-level overview of R packages and workflows for analyzing public transit networks and assessing their connectivity to amenities such as jobs, schools, parks, and stores. It showcases how to report results and outlines the necessary data inputs for this analysis. Packages like {tidytransit} enable users to access transit schedule data in the General Transit Feed Specification (GTFS) format, allowing them to map stops, routes, and calculate service frequency. Going deeper, packages like {r5r} combine GTFS files with OpenStreetMap street network data to model origin-destination trips based on factors like time of day, walking speed, and transfer preferences. This presentation demonstrates that these packages, alongside other essential {tidyverse} tools, empower R users with powerful resources to delve into the realm of transit planning and modern urban analytics.
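As a rough sketch of the {tidytransit} side of that workflow (the feed file name is hypothetical):

```r
library(tidytransit)
library(dplyr)

# Read a GTFS feed; tidytransit returns it as a list of tibbles
gtfs <- read_gtfs("agency-gtfs.zip")

# Convert stops to an sf layer for mapping
stops_sf <- stops_as_sf(gtfs$stops)

# A crude proxy for service intensity: trips per route
gtfs$trips |>
  count(route_id, sort = TRUE)
```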



Justin Sherrill headshot
Pronouns: he/him
Portland, OR, USA
Justin Sherrill is a Technical Manager with regional planning & economics consulting firm ECONorthwest. His work focuses primarily on demographics, transport systems analysis, the socioeconomics of land use policies, and effective data visualization.

Prior to joining ECONorthwest, Justin worked at the Population Research Center at Portland State University, helping vet early results from the 2020 Census, and at King County Metro, where he supported the agency's Strategy & Performance team in tracking operational efficiency, prioritizing transit-related capital projects, and building interactive dashboards. Outside of his work at ECONorthwest, you can find published examples of Justin's maps and data visualizations in Proceedings of the National Academy of Sciences, and in “Upper Left Cities: A Cultural Atlas of San Francisco, Portland, and Seattle”.

Kangjie Zhang

Beyond the Comfort Zone: Traditional Statistical Programmers Embrace R to Expand their Toolkits

In the pharmaceutical industry, traditional statistical programmers have long relied on proprietary software to perform data analysis tasks. However, in recent years, there has been a growing interest in open-source tools like R, which offer a range of benefits including flexibility, reproducibility, and cost-effectiveness.

In this presentation, we will explore the ways in which statistical programmers in the pharmaceutical industry are embracing R to expand their toolkits and improve their workflows, including data visualization and the generation of Analysis Data Model (ADaM) datasets.

One key challenge in using R to generate ADaM datasets is bridging the gap between open-source R packages (e.g., admiral, metacore, metatools, and xportr from Pharmaverse) and a company's internal resources. We will discuss strategies for overcoming this challenge and for integrating R into a company's existing infrastructure, including developing in-house R packages and providing internal template scripts and use cases.

Overall, this presentation will provide examples of how R can be used as a powerful complement to traditional statistical programming languages, such as SAS. By embracing R, statistical programmers can expand their toolkits, collaborate across the industry to tackle common issues, and most importantly, provide value to their organizations/industry.



Kangjie Zhang headshot
Pronouns: she/her
Vancouver, BC, Canada
Kangjie Zhang is a Lead Statistical Analyst at Bayer within the Oncology Data Analytics team. She uses SAS and R for statistical analysis and reporting, supporting clinical trial studies and facilitating the transition from SAS to R for clinical submissions. With a passion for open-source projects, she has contributed to multiple R packages. Before joining the pharma industry, she worked as a Data Analyst at the Canadian Hub for Applied and Social Research (CHASR, https://chasr.usask.ca/index.php) and the Saskatoon Police Station, using R for data collection, manipulation, and predictive modeling.

Lovedeep Gondara

Using R Shiny for cancer surveillance, lessons from the trenches

At the British Columbia Cancer Agency, we have embarked on moving all of our cancer surveillance reports to R Shiny dashboards (example: https://bccandataanalytics.shinyapps.io/IncidenceCounts/). This talk will cover the roadmap, why we decided to move to R Shiny, the challenges we faced implementing it within a public healthcare system, and the outcome. The talk will touch on the pros and cons of various approaches, such as package-based development (golem), data privacy, and the other add-ons needed for the apps to function as surveillance dashboards. We will end the talk by outlining further adoption of R Shiny in the form of interactive nomograms for research studies.



Lovedeep Gondara headshot
Pronouns: he/him
Vancouver, BC, Canada
Lovedeep Gondara is a Research Scientist at the Provincial Health Services Authority (PHSA) in British Columbia and has a PhD in computer science. His current role involves research and applications of deep learning in the healthcare domain. In his past role as a statistician/data scientist at the British Columbia Cancer Agency, PHSA, he was involved in conceptualizing, designing, and developing R Shiny apps for cancer surveillance.

Mark Niemann-Ross

Use R to control a Raspberry Pi

The Raspberry Pi is a credit-card-sized single-board computer that costs less than $30. Most people think of it as an educational toy, but in reality it is a full-fledged Linux computer with a full bank of data acquisition pins. The Raspberry Pi can read data from a multitude of sensors and control motors, cameras, and lights.

Most commonly, the Raspberry Pi is programmed in Python – but with a small amount of work, R can also be installed. Better yet, R can be used to read sensors and control output devices just like Python.

In this fifteen minute talk, Mark Niemann-Ross will demonstrate the installation of R and show how to use it to blink lights, read sensors, and react to buttons. Participants will leave this talk with a clear path for use of the Raspberry Pi as a computing platform capable of data acquisition and processing with the R language.



Mark Niemann-Ross headshot
Pronouns: he/him
Portland, OR, USA
I write science fiction. Sometimes it’s about spaceships, sometimes it’s about products. The goal is the same: explain where we want to be, point out hazards, celebrate arrival. I live in Portland, Oregon, and teach R and Raspberry Pi for LinkedIn Learning.

Melissa Bather

Using R to Estimate Animal Population Density

Spatially explicit capture-recapture (SECR) models are used to estimate animal population densities within specified areas by detecting and then re-detecting animals at different points in time within the region of interest. They are important tools for conserving, monitoring, and managing animal species. A number of different detection methods are used for these models, including trapping, tagging, and releasing animals, hair snares, and even microphones that record animal vocalizations. This allows researchers to study animals across a broad range of sizes – from tiny mice and frogs all the way to grizzly bears and even whales – and in a range of different habitats. There are a few R packages that allow us to build SECR models quite simply from animal capture histories gathered with numerous detection methods, including SECR, ASCR, and a new package, ACRE, which is particularly good for acoustic SECR models. This talk will cover the different methods used to detect animals, how detections are recorded, and the implementation and high-level interpretation of SECR models in R, along with visualizations of the core concepts of SECR models using R.



Melissa Bather headshot
Pronouns: she/her
Vancouver, BC, Canada
I recently moved to British Columbia from New Zealand, where I used to build R Shiny apps for the health sector. I’m currently studying part-time for an MSc in Statistics through the University of Auckland (the birthplace of R!) and am due to finish in November 2023. My research project is to assist in the development and validation of an R package for estimating animal population densities from various capture methods, particularly acoustic methods. I have been using R for seven years and currently co-organise the R Ladies Vancouver meetup group. I work as a Data Engineer in Vancouver.

Mohsen Soltanifar

SimSST: An R Statistical Software Package to Simulate Stop Signal Task Data

The stop signal task (SST) paradigm, with its original roots in 1948, has been proposed to study humans’ response inhibition. Researchers have written various statistical software code to simulate SST data in order to study theories of modeling response inhibition and their assumptions. Yet there has been no standalone statistical software package enabling researchers to simulate SST data under generalized scenarios. This paper presents the R package “SimSST”, available on the Comprehensive R Archive Network (CRAN), to simulate SST data. The package is based on the general non-independent horse race model, copulas from probability theory, and an underlying ExGaussian (ExG) or Shifted Wald (SW) distributional assumption for the go and stop processes involved, enabling researchers to simulate sixteen scenarios of SST data. A working example for one of the scenarios is presented to evaluate the simulations’ precision of parameter estimation. Package limitations and directions for future extensions are discussed.
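For intuition about the ExGaussian assumption mentioned above, here is a generic sketch (mine, not SimSST's API): an ExG variate is the sum of a normal and an exponential component, a common model for go-process finishing times.

```r
set.seed(42)

# Generic ExGaussian sampler: normal component plus exponential tail
rexg <- function(n, mu, sigma, tau) rnorm(n, mu, sigma) + rexp(n, rate = 1 / tau)

go_rt <- rexg(1000, mu = 440, sigma = 50, tau = 80)
mean(go_rt)  # close to mu + tau
```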



Mohsen Soltanifar headshot
Pronouns: he/him
Vancouver, BC, Canada
Mohsen Soltanifar is currently a Senior Biostatistician at ClinChoice and an adjunct lecturer at Northeastern University in Vancouver, BC, Canada. He has 2+ years of experience in CRO/pharma and 8+ years of experience in healthcare. His main area of interest in statistics is clinical trials, with a focus on R applications in their design, analysis, and presentation of results. He received his PhD in Biostatistics from the University of Toronto in 2020 and has since served as a registered reviewer for 15+ journals, including "Current Oncology" and "Clinical and Translational Neuroscience (CTN)".

Nathan TeBlunthuis

Misclassification Causes Bias in Regression Models: How to Fix It Using the MisclassificationModels Package

Automated classifiers (ACs), often built via supervised machine learning, can categorize large and statistically powerful samples of data ranging from text to images and video, and have become widely popular measurement devices in many scientific and industrial fields. Despite this popularity, even highly accurate classifiers make errors that cause misclassification bias and misleading results in downstream analyses—unless such analyses account for these errors.
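The attenuation described here is easy to see in a small simulation (this sketch is mine, not the package's API):

```r
set.seed(1)
n <- 10000
x <- rbinom(n, 1, 0.5)                          # true label
y <- 1 + 2 * x + rnorm(n)                       # outcome depends on the truth
acc <- 0.85                                     # assumed classifier accuracy
w <- ifelse(rbinom(n, 1, acc) == 1, x, 1 - x)   # observed, misclassified label

coef(lm(y ~ x))["x"]  # near the true value, 2
coef(lm(y ~ w))["w"]  # noticeably attenuated toward 0
```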

In principle, existing statistical methods can use “gold standard” validation data, such as that created by human annotators and often used to validate predictiveness, to correct misclassification bias and produce consistent estimates. I will present an evaluation of such methods, including a new method implemented in the experimental R package misclassificationmodels, via Monte-Carlo simulations designed to reveal each method’s limitations. The results show the new method is both versatile and efficient.

In sum, automated classifiers, even those below common accuracy standards or making systematic misclassifications, can be useful for measurement with careful study design and appropriate error correction methods.
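The bias the talk addresses can be demonstrated in a few lines of base R (this is an illustrative simulation, not the package's API): regressing an outcome on labels from a 90%-accurate classifier, rather than on the true labels, visibly attenuates the estimated slope.

```r
# Illustrative simulation of misclassification bias (not the package API):
# classification errors in a binary predictor attenuate a regression slope.
set.seed(1)
n    <- 10000
beta <- 2

x_true <- rbinom(n, 1, 0.5)          # "gold standard" labels
y      <- beta * x_true + rnorm(n)   # outcome depends on the truth

# An "automated classifier" that is 90% accurate: flip 10% of labels.
flip   <- rbinom(n, 1, 0.1)
x_pred <- ifelse(flip == 1, 1 - x_true, x_true)

b_true <- coef(lm(y ~ x_true))["x_true"]  # close to 2
b_pred <- coef(lm(y ~ x_pred))["x_pred"]  # attenuated toward 0
c(b_true, b_pred)
```

Correction methods of the kind evaluated in the talk use a validation subsample of gold-standard labels to undo this attenuation.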



Nathan TeBlunthuis headshot
Pronouns: He/Him or They/Them
Seattle, WA, USA
Nathan TeBlunthuis is a computational social scientist and postdoctoral researcher at the University of Michigan School of Information and affiliate of the Community Data Science Collective at the University of Washington. Much of Nathan's research uses R to study Wikipedia and other online communities using innovative methods. He earned his Ph.D. from the Department of Communication at the University of Washington in 2021 and has also worked for the Wikimedia Foundation and Microsoft.

OG CascadiaR committee

Retrospective of Cascadia R



Jessica Minnier headshot
Jessica Minnier
Pronouns: she, her
Portland, OR, USA
Jessica is a biostatistician and faculty at the OHSU-PSU School of Public Health and Knight Cancer Institute in Portland. She helped organize the first and second Cascadia R Conf starting in 2017 and is grateful the Pacific Northwest R community is still thriving. She has been teaching R for quite some time, both in her day job and at other R and biostatistics conferences, and is passionate about helping people new to coding feel empowered to work with data using R.
Ted Laderas headshot
Ted Laderas
Pronouns: he, him
Portland, OR, USA
Ted is a founding member of the Cascadia-R conference. He is a bioinformatics trainer and data science mentor. He trains and empowers learners to utilize cloud platforms effectively and to execute and communicate effective data science. He is also a co-organizer of the PDX-R user group and enjoys visualizing Tidy Tuesday datasets in his free time.

Sean Kross

Visualize Data Analysis Pipelines with Tidy Data Tutor

The data frame is one of the most important and fundamental data structures in R. It is no coincidence that one of the leading domain-specific languages in R, the Tidyverse, is designed around the transformation and manipulation of data frames. A key abstraction of the Tidyverse is the use of individual functions that each make one change to a data frame, coupled with a pipe operator, which lets people write sophisticated yet modular data processing pipelines. However, within these pipelines it is not always intuitively clear how each operation changes the underlying data frame, especially as pipelines become long and complex. To explain each step in a pipeline, data science instructors resort to hand-drawing diagrams or making presentation slides to illustrate the semantics of operations such as filtering, sorting, reshaping, pivoting, grouping, and joining. These diagrams are time-consuming to create and do not stay synchronized with the real code or data that students are learning about. In this talk I will introduce Tidy Data Tutor, a step-by-step visualization engine for data frame transformations that can help instructors explain these operations. Tidy Data Tutor illustrates the row-, column-, and cell-wise relationships between an operation's input and output data frames. We hope the Tidy Data Tutor project can augment data science education by providing an interactive, dynamic visualization tool that streamlines the explanation of data frame operations and fosters a deeper understanding of Tidyverse concepts for students.
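A pipeline of the kind Tidy Data Tutor visualizes step by step might look like the following, using the built-in mtcars data frame (the particular verbs and column choices here are just an example):

```r
# A small dplyr pipeline of the kind Tidy Data Tutor visualizes step by
# step, using the built-in mtcars data frame.
library(dplyr)

result <- mtcars |>
  filter(mpg > 20) |>             # drop rows: keep fuel-efficient cars
  group_by(cyl) |>                # group rows by cylinder count
  summarise(mean_hp = mean(hp),   # collapse each group to one row
            n = n()) |>
  arrange(desc(mean_hp))          # sort groups by mean horsepower

result
```

Each `|>` step produces an intermediate data frame, and it is exactly these intermediates, and the row/column correspondences between them, that the tool draws.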



Sean Kross headshot
Pronouns: he/him
Seattle, WA, USA
Sean Kross, PhD is a Staff Scientist at the Fred Hutch Data Science Lab. His work is focused on understanding data science as a practice, building a better developer experience for data scientists, and creating better outcomes in digital education. He approaches these challenges with computational, statistical, ethnographic, and design-driven methods.


Ted Laderas

A gRadual introduction to web APIs and JSON

Do the words “Web API” sound intimidating to you? This talk is a gentle introduction to what web APIs are and how to get data out of them using the {httr2}, {jsonlite}, and {tidyjson} packages. You'll learn how to request data from an endpoint and extract what you need from the response. We'll do this using an API that gives us facts about cats. By the end of this talk, web APIs will seem much less intimidating, and you will be empowered to access data from them.
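As a taste of the workflow, a request with {httr2} typically chains a few verbs: build the request, perform it, then parse the JSON body. The base URL and the `fact` field below are hypothetical placeholders, not the specific API used in the talk.

```r
# A minimal {httr2} sketch; the URL and response field are hypothetical
# placeholders standing in for a real cat-facts API.
library(httr2)

resp <- request("https://example.com/api") |>  # hypothetical base URL
  req_url_path_append("fact") |>               # e.g. GET /fact
  req_perform()

fact <- resp_body_json(resp)  # parse the JSON body into an R list
fact
```

From here, {tidyjson} helps when the returned JSON is deeply nested and you want it rectangled into a data frame.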



Ted Laderas headshot
Pronouns: he/him
Portland, OR, USA
Ted is a founding member of the Cascadia-R conference. He is a bioinformatics trainer and data science mentor. He trains and empowers learners to utilize cloud platforms effectively and to execute and communicate effective data science. He is also a co-organizer of the PDX-R user group and enjoys visualizing Tidy Tuesday datasets in his free time.

Valeria Duran

Maximizing Performance: Strategies for Code Optimization

Code optimization improves the performance and efficiency of a program and is essential in software development. Optimizing code involves modifying the parts of a program that slow a process down, and identifying these bottlenecks is crucial to reducing the time required to process large datasets and perform computations. Deciding when optimization is needed, if at all, is a question every programmer must eventually confront. Optimization also carries tradeoffs, such as reduced code readability and the added time needed for modification and debugging, and weighing its benefits against these costs is essential in deciding whether it is worth pursuing. This talk will review what to consider when optimizing code and highlight valuable tools.
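A classic R bottleneck of the sort such a review covers is growing a vector inside a loop; the base-R sketch below (my example, not one from the talk) times it against the vectorized equivalent to show how measurement guides the decision to optimize.

```r
# A base-R illustration of finding a bottleneck: growing a vector in a
# loop re-allocates on every iteration; the vectorized form does not.
x <- runif(1e4)

slow_sq <- function(x) {
  out <- c()
  for (v in x) out <- c(out, v^2)  # re-allocates out each iteration
  out
}

fast_sq <- function(x) x^2         # vectorized: a single allocation

t_slow <- system.time(r1 <- slow_sq(x))["elapsed"]
t_fast <- system.time(r2 <- fast_sq(x))["elapsed"]

identical(r1, r2)  # same result, very different cost
```

Tools such as `Rprof()` in base R, or profiling in RStudio, locate hot spots like this in larger programs before any rewriting begins.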



Valeria Duran headshot
Pronouns: she/her
Seattle, WA, USA
Valeria Duran has a B.S. in Mathematical Biology and M.S. in Statistics and Data Science from the University of Houston with four years of R programming experience. She is a Statistical Programmer at the Statistical Center for HIV/AIDS Research & Prevention (SCHARP) at Fred Hutchinson Cancer Center.

Zachary Ruff

Shiny_PNW-Cnet: AI-powered desktop audio processing for biodiversity research and monitoring

Passive acoustic monitoring is an increasingly popular approach in wildlife research and conservation, made possible by the availability of small, rugged, programmable audio recorders (autonomous recording units, or ARUs). Researchers can deploy ARUs across large areas and over long periods to capture sounds produced by rare and cryptic species such as the northern spotted owl and marbled murrelet, making it possible to study these species non-invasively at landscape scales. However, a major challenge with this approach is the need to efficiently detect target sounds within the resulting large audio datasets, which can easily comprise thousands of hours of recordings. Deep learning models are an increasingly popular solution but often require advanced programming skills, which hinders their adoption by wildlife researchers. The US government has monitored northern spotted owl populations since the mid-1990s as mandated by the Northwest Forest Plan. While this monitoring effort originally relied on callback surveys and mark-resight analyses, it began a transition to passive acoustic monitoring starting in 2018. As of 2023, the spotted owl monitoring program relies entirely on ARUs and may well be the world's largest acoustic data collection effort, bringing in roughly 2 million hours of audio per year from thousands of monitoring sites in Washington, Oregon, and California. To detect calls from the northern spotted owl and other species in this massive dataset, we developed PNW-Cnet, a TensorFlow-based deep neural net which detects audio signatures of target species in spectrograms. Originally trained to detect six species of owls, PNW-Cnet has grown iteratively over the years and now detects 37 species of birds and mammals found in the Northwest, expanding the scope of the program toward broad-scale biodiversity monitoring.

We recently developed a graphical desktop application to increase the accessibility of PNW-Cnet and to share the benefits of passive acoustic monitoring with wildlife biologists and the general public. The result is Shiny_PNW-Cnet, a Shiny app intended to be run locally through RStudio. The app uses PNW-Cnet to process audio data and detect target sounds in audio recordings, allows users to visualize apparent detections and extract them for manual review, and includes various utilities for organizing and renaming audio data and other miscellaneous tasks. This app is publicly available and is currently in use by biologists doing bioacoustics work for local, state, federal, and tribal governments, as well as private companies. We will discuss the context of the northern spotted owl monitoring program, the development and evolution of Shiny_PNW-Cnet over the past several years, successes, failures, lessons learned, planned features, and more. This talk is intended for R users of all levels and anyone else interested in how R is empowering the conservation of the Pacific Northwest's most iconic wildlife.



Zachary Ruff headshot
Pronouns: he/him
Corvallis, OR, USA
Zack Ruff is a research assistant in the Department of Fisheries, Wildlife, and Conservation Sciences at Oregon State University and works closely with the U.S. Forest Service through the Pacific Northwest Research Station. He is a wildlife ecologist by training and has previously worked with macaws, plovers, blackbirds, and grouse, but in recent years he has gravitated to projects where he gets to write more code and doesn't have to wear bug spray. Originally from Minnesota, he relocated to Oregon in 2017 and has been working on spotted owl monitoring ever since. His day-to-day work combines bioacoustics, machine learning, and population ecology, and in his spare time he enjoys birding, tinkering, trying new beers, and riding bikes.