38 Workflow
38.1 Agile Data Science with R
- Edwin Thoen
I joined a Scrum team (frontend and backend developers, a UX designer, a product owner, and a second data scientist) to build a machine learning model that we brought to production following Agile principles. It was an inspiring experience from which I learned a great deal. My colleagues patiently explained the principles of Agile software development, and together we applied them to the data science context. All these experiences culminated in the workflow we now follow at work, and I think it is worth sharing. It is heavily based on the principles of Agile software development, hence the title. We explored which Agile concepts did and did not work for data science, and we gained hands-on experience applying them in an R project that actually reached production.
38.2 Data Management in Large-Scale Education Research
- Crystal Lewis
This book begins, like many other books in this subject area, by describing the research life cycle and how data management fits within the larger picture. The remaining chapters are organized by phase of the life cycle, with examples of best practices for each phase. The book also discusses whether you should implement those practices and how to integrate them into your workflow.
38.3 GitHub Actions with R
- Chris Brown
- Murray Cadzow
- Paula A Martinez
- Rhydwyn McGuire
- David Neuzerling
- David Wilkinson
- Saras Windecker
GitHub Actions let us trigger automated steps in response to GitHub events, such as when we push commits, open a pull request, or open an issue.
38.4 R for the Rest of Us
R for the Rest of Us will show ways that R can be used beyond complex statistical analysis. Readers will learn about a range of uses for R, many of which they have likely never even considered.
Link: https://book.rfortherestofus.com/
Physical copy available: https://amzn.to/3RBuKbO
38.5 R in Production
- Hadley Wickham
A collection of notes about running R in production.
38.6 R Workflow for Reproducible Data Analysis and Reporting
- Frank E Harrell Jr
This work is intended to foster best practices in reproducible data documentation and manipulation, statistical analysis, graphics, and reporting. It will enable the reader to efficiently produce attractive, readable, and reproducible research reports while keeping code concise and clear. Readers are also guided in choosing statistically efficient descriptive analyses that are consonant with the type of data being analyzed.
38.7 Reproducible Analytical Pipelines - Master’s of Data Science
- Bruno Rodrigues
This course is my take on setting up code that produces a data product. This code has to be reproducible, documented, and production ready. The idea is not originally mine; it was introduced by the UK’s Analysis Function.
The basic idea of a reproducible analytical pipeline (RAP) is to have code that always produces the same result when run, whatever that result might be. This is obviously crucial in research and science, but it matters just as much in businesses that rely on data science and data-driven decision making.
A well-documented RAP avoids a lot of headaches and is usually reusable in other projects as well.
Link: https://rap4mads.eu/
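A minimal base-R sketch of the "same result every time it runs" property, assuming the only source of run-to-run variation is random number generation. `run_analysis()` is a hypothetical function, not code from the course; fixing the seed inside it pins the random stream so repeated calls match exactly.

```r
# Hypothetical analysis step: simulate data, fit a model, return coefficients.
# Setting the seed inside the function pins the random number stream,
# so every call with the same seed returns exactly the same result.
run_analysis <- function(seed = 42) {
  set.seed(seed)
  x <- rnorm(100)
  y <- 2 * x + rnorm(100, sd = 0.5)
  coef(lm(y ~ x))
}

first  <- run_analysis()
second <- run_analysis()
identical(first, second)  # TRUE: same seed, same code, same result
```

A seed only covers randomness; a full pipeline also has to pin the R version and package versions, which this sketch does not address.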
38.8 Reproducible Analytical Pipelines (RAP) Companion
Implementing Reproducible Analytical Pipelines requires a range of tools and techniques that can be challenging to learn. This book addresses some of the common knowledge gaps and hard-to-Google problems that aspiring RAP practitioners face.
38.9 Research Software Engineering
Gives an overview of open source software and provides R examples of automation and reproducibility.
38.10 The Data Validation Cookbook
The purposes of this book include demonstrating the main tools and workflows of the validate package, giving examples of common data validation tasks, and showing how to analyze data validation results.
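The validate package expresses rules with `validator()` and applies them to data with `confront()`. The sketch below avoids that dependency and only imitates the idea in base R: each rule is a hypothetical named function that returns one logical per record, and a small summary counts passes, failures, and missing results per rule. It uses the built-in `cars` data, and the rules themselves are illustrative assumptions.

```r
# Each rule takes a data frame and returns one TRUE/FALSE per row.
# These rules are illustrative, not taken from the book.
rules <- list(
  speed_nonneg = function(d) d$speed >= 0,
  dist_nonneg  = function(d) d$dist >= 0,
  ratio_ok     = function(d) d$speed / d$dist <= 1.5
)

# Apply every rule to the data and tabulate the outcome per rule.
check_rules <- function(data, rules) {
  results <- lapply(rules, function(rule) rule(data))
  data.frame(
    rule   = names(rules),
    items  = vapply(results, length, integer(1)),
    passes = vapply(results, function(r) sum(r, na.rm = TRUE), integer(1)),
    fails  = vapply(results, function(r) sum(!r, na.rm = TRUE), integer(1)),
    nNA    = vapply(results, function(r) sum(is.na(r)), integer(1)),
    row.names = NULL
  )
}

summary_table <- check_rules(cars, rules)
print(summary_table)
```

Keeping rules as data (a named list) rather than scattered `if` statements is what makes the results easy to summarize and report, which is the same motivation behind validate's rule objects.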
38.11 The targets R Package Design Specification
targets has an elaborate structure to support its advanced features while ensuring decent performance. This bookdown site is a design specification that explains the major aspects of the internal architecture, including the data storage model, the object-oriented design, and the orchestration and branching model.
38.12 The targets R Package User Manual
- Will Landau
The targets package is a Make-like pipeline toolkit for statistics and data science in R. With targets, you can maintain a reproducible workflow without repeating yourself. targets learns how your pipeline fits together, skips costly runtime for tasks that are already up to date, runs only the necessary computation, supports implicit parallel computing, abstracts files as R objects, and shows tangible evidence that the results match the underlying code and data.
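In targets itself, a pipeline is declared with `tar_target()` in a `_targets.R` file and run with `tar_make()`. The base-R sketch below only illustrates the "skip tasks that are already up to date" idea: the hypothetical helper `make_cached_step()` reruns a step only when its input differs from the previous run. Unlike targets, it keeps its cache in memory and ignores changes to the step's own code.

```r
# Hypothetical helper (not part of targets): wrap a function so it reruns
# only when its input differs from the input of the previous run.
make_cached_step <- function(f) {
  has_run <- FALSE
  last_input <- NULL
  last_output <- NULL
  runs <- 0L
  step <- function(input) {
    if (!has_run || !identical(input, last_input)) {
      last_output <<- f(input)   # first call or changed input: recompute
      last_input <<- input
      has_run <<- TRUE
      runs <<- runs + 1L
    }
    last_output                  # unchanged input: reuse the stored result
  }
  list(step = step, runs = function() runs)
}

fit_model <- make_cached_step(function(d) coef(lm(dist ~ speed, data = d)))

fit_model$step(cars)            # first call: computes
fit_model$step(cars)            # same input: skipped
fit_model$step(head(cars, 20))  # changed input: recomputes
fit_model$runs()                # 2
```

targets goes further than this sketch: it also detects changes in the code a target depends on and stores results on disk, so up-to-date targets are skipped across R sessions.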