Chapter 1 Introduction

This document sets out the recommendations for best practice when using R. The aim is to provide a consistent and comprehensive approach to using R for analysis and applications. It forms part of the wider guidance on end-user computing. The document is split into the following areas:

Recommended Software: Details the recommended software to install, both the core R applications and supporting software.
Writing R Code: Gives guidance on on how to write R scripts and applications.
Recommended Packages: Lists the recommended packages to use for achieving common tasks.
Development Practices: Describes the recommended approaches to developing in R.

This document is not supposed to be a comprehensive guide to using R. It gives guidance and sign-posts users to resources with more details.

1.1 End-User Computing

The Bank of England’s 2016 report Solvency II: internal model approval process data review findings identifies end-user computing as a Solvency II risk. In particular, it states:

Spreadsheets and other user-developed applications are a form of information technology, and all information technology needs to be appropriately controlled.

End-user computing requires a user to think about the following areas when developing in R:

Documentation
Testing
Reproducibility
Change control
Access control

The degree to which the above points need to be addressed depend on the result of a simple risk analysis. For example, if a piece of work is a one-off analysis and the results are not distributed further, then some simple documentation is all that is required. If the work is a complex set of functions, which important systems depend on, then each function must be clearly documented and user guides specified.

More detail is given on each of the areas below.

1.2 Risk Analysis

The level of documentation, testing and change control depends on two factors:

Complexity
Criticality

Complexity: If a piece of work is very simple it may only require a short description, a manual test and saving to a secure location. However, if it is highly complex it is important for others, and your future self, to document it thoroughly, provide repeatable tests and save the scripts in version control.

Criticality: If a piece of work is a simple analysis for curiosity it may not need any controls applied to it. However, if it provides functionality which other systems rely on or informs major business decisions then it is important to ensure it is thoroughly tested and reviewed and changes are not made without careful consideration of how it affects dependents.

In order to assess the level of controls our end-user computing standards require we perform a simple risk analysis by estimating the level of complexity and criticality of the work, using a scale of high, medium or low. If both are low then no further consideration is required. However, if any of the above factors are medium or high then the work is considered material and the following recommendations should be considered.

1.3 Documentation

Documentation ensures everyone using the code understands what it does and how to use it effectively. For simple scripts, a commented header in the file is sufficient. For multi-file applications it is recommended that a separate README text file is added.

1.3.0.1 Low Complexity and Criticality

As a minimum, it is recommended that the following are documented with the code:

Record the name of the project, analysis or application.
Write a description of of what the code does (not how).
Record the purpose of the code.
Write instructions for how to use the code.

1.3.0.2 Medium Complexity

As the complexity increases it becomes important to ensure users know how to use the code correctly. In addition to the list above, the following is also recommended:

Provide examples of usage.

1.3.0.3 High Complexity

For high complexity work it is important that the code is broken up into sections or functions. Therefore the following are recommended:

Write separate descriptions for each section or function.
Provide separate examples for each section or function.
Write a user guide

1.3.0.4 Medium Criticality

As criticality increases it becomes important to ensure reproducibility is possible. Therefore, in addition to the minimum requirements above, following is recommended:

Clearly define dependencies (packages used, input data used,…etc).

1.3.0.5 High Criticality

For high criticality work consider creating an R package then:

Document the package by creating a DESCRIPTION.
Document each function using Roxygen2.
Write a user guide as a vignette explaining its use.

1.4 Testing

Testing is required to ensure the code actually does what is intended. It is also necesary to record the tests, so they can be repeated, and the results, as evidence of correctness.

1.4.0.1 Low Complexity and Criticality

As a minimum, manual tests should be performed to ensure the user is comfortable the code is working as intended. In order to reproduce the tests it is recommended the following are recorded:

Describe the test process.
Record the test results.

1.4.0.2 Medium Complexity

As the complexity increases it becomes more important to ensure the code is tested, consider:

Comparisons with known results.
Regression tests to compare new results to previous ones.

1.4.0.3 High Complexity

For high complexity work it is important that the code is broken up into sections or functions. Therefore, following are recommended:

Separate tests for each section or function.
Unit tests to compare with expected results when given simple inputs.

1.4.0.4 Medium Criticality

As the criticality increases checks should be added within the code. This ensures the code and the results are tested each time it is run. The following are also recommended:

Assertions on the inputs to ensure they are valid for the code (e.g. a particular value is positive).
Check appropriate errors or warnings are returned when inputs are invalid.

1.4.0.5 High Criticality

For high criticality work consider creating an R package then:

Test the package using testthat to write executable tests.
Execute regression tests whenever changes are made.
Perform integration tests with dependent systems.

1.5 Reproducibility

Important work should reproducible by others, and your future self. This means keeping track of inputs, parameters and code used to produce results.

1.5.0.1 Low Complexity and Criticality

As a minimum, it should be clear where the inputs came from. The following are recommended:

Clearly define how to import input data so it can be re-executed.
Consider keeping a copy of input data and parameters alongside the code.

1.5.0.2 Medium Complexity

As the complexity increases consider using a structured project, for example.

Use RStudio projects, which allows you to reference files using relative paths and save open files and workspaces.
Consider using R markdown which captures description, code and results in a single document.

1.5.0.3 High Complexity

For high complexity work it the following are recommended:

It is peer reviewed. The best way to ensure others can reproduce your work is to get a peer review.
Avoid referencing external files, unless you can be positive they will always be available in the specified location. Even then, it is worth caching external files in a subfolder, if they are not too large.

1.5.0.4 Medium Criticality

As the criticality increases following is recommended:

Output an audit log with a time stamp, username of the person executing the code and a summary of computing environment (e.g. package versions used).

1.5.0.5 High Criticality

For high criticality work also consider:

Holding inputs in version control or other system that allows you to retrieve old versions.
Using packrat to manage the versions of packages used.

1.6 Change control

Change control is about managing updates and bug fixes to the code in a way that is tracked, reviewed, and approved. It ensures change, and its implications, is understood and can also be reversed, if necessary.

1.6.0.1 Low Complexity and Criticality

The simplest approach is to:

Produce new files for each version.
Archive older versions so you can rollback is necessary.
Document changes, concentrating on why rather than what.

1.6.0.2 Medium Complexity

As complexity increases it becomes more important to track changes in a more robust manner. Instead of the approach above:

Use version control software, such as Git, to track changes.

1.6.0.3 High Complexity

Break changes down into tasks and implement each one separately.
Create a branch for each task and merge back into the main branch once complete.

1.6.0.4 Medium Criticality

As criticality increases it becomes more important to confirm changes are correct and appropriate.

Use version control software, such as Git, to track changes.
Ensure changes are peer reviewed and documented.

1.6.0.5 High Criticality

Use version control hosting application, such as GitHub or Microsoft’s Azure DevOps, to share changes.
Formalise reviews using pull requests.
Require approval from an appropriate person or group before deploying to production.

1.7 Access control

There should be appropriate controls on who has access to the source code and/or results. The developers should be aware of who has access the work they are producing.

1.7.0.1 Low Complexity and Criticality

The simplest approach is:

Save files to a network drive, with access controlled by the IT dept.

1.7.0.2 Medium Complexity

As complexity increases it is important to ensure that no unintentional changes are propagated into production.

Separate source code from results.
Maintain a protected version, which can only be updated by appropriate people (Note: GitHub provides this functionality out-of-the-box).

1.7.0.3 High Complexity

For high complexity work it is important only those with the appropriate level of knowledge can change the code.

Use version control hosting application, such as GitHub or Microsoft’s Azure DevOps, to share changes.
Deploy to a production location for users.

1.7.0.4 Medium Criticality

As criticality increases is important to ensure work is protected and backed up.

Ensure an appropriate back-up and recovery strategy is in place.
Ensure that the code has gone through an appropriate review before deploying to production.

1.7.0.5 High Criticality

For high criticality work production code should not be changable directly. A release process should be set up so appropriate approval must be given before code is promoted.

Maintain and regularly review the list of developers and users.
Ensure that the code has gone through an appropriate release process before deploying to production.

1.8 Appendix: Solvency II: internal model approval process data review findings

1.8.1 Sub-risk 5: IT environment, technology and tools

Spreadsheets and other user-developed applications are a form of information technology, and all information technology needs to be appropriately controlled.

Finding 9: end-user computing (EUC)

4.36 Spreadsheets and other end-user applications (2012.4.39) remained common in capital and balance sheet modelling. The PRA does not have a view on whether end-user computing (EUC) is appropriate, as it is a form of IT, and all IT needs to be appropriately controlled. Where EUC is material to the internal model data flow, the PRA will be looking for appropriate controls for data quality such as reasonableness checks, input validations, peer reviews, systems environment configuration, logical access management, ongoing change controls (development, build , systems and user acceptance testing) and release management (including implementation and operational testing), disaster recovery, and documentation.

4.37 Automation of spreadsheets reduces the risk of manual error (2012.4.42), but can introduce different problems such as reduced oversight, inadequate transparency about the extent of linking and the proliferation of nested spreadsheets and the attendant issue of ‘broken links’.

4.38 The 2012 report did not engage comprehensively with cyber risk. This is likely to be an area of increasing focus, following alerts and increasing concerns about security as firms move away from localised application and onto networked platforms. As noted in the Bank of England Financial Stability Report,1 cyber attacks can threaten financial stability by disrupting the provision of critical functions from the financial system to the real economy. The Financial Policy Committee has recommended that resilience testing be a regular part of core firms’ cyber resilience assessment. Insurers providing cover for cyber or business interruption are also indirectly exposed to cyber risk.

Finding 10: IT infrastructure

4.39 Complex IT implementations (2012.4.44) can be challenging to manage without a clear definition of user requirements, design, testing and appropriate controls for effective operation in business as usual. This continued to be an area of risk. One firm took seven years to implement a tactical system, and still has no strategic system for its upstream administration processes.

R Best Practice

R Best Practice

Chapter 1 Introduction

1.1 End-User Computing

1.2 Risk Analysis

1.3 Documentation

1.3.0.1 Low Complexity and Criticality

1.3.0.2 Medium Complexity

1.3.0.3 High Complexity

1.3.0.4 Medium Criticality

1.3.0.5 High Criticality

1.4 Testing

1.4.0.1 Low Complexity and Criticality

1.4.0.2 Medium Complexity

1.4.0.3 High Complexity

1.4.0.4 Medium Criticality

1.4.0.5 High Criticality

1.5 Reproducibility

1.5.0.1 Low Complexity and Criticality

1.5.0.2 Medium Complexity

1.5.0.3 High Complexity

1.5.0.4 Medium Criticality

1.5.0.5 High Criticality

1.6 Change control

1.6.0.1 Low Complexity and Criticality

1.6.0.2 Medium Complexity

1.6.0.3 High Complexity

1.6.0.4 Medium Criticality

1.6.0.5 High Criticality

1.7 Access control

1.7.0.1 Low Complexity and Criticality

1.7.0.2 Medium Complexity

1.7.0.3 High Complexity

1.7.0.4 Medium Criticality

1.7.0.5 High Criticality

1.8 Appendix: Solvency II: internal model approval process data review findings

1.8.1 Sub-risk 5: IT environment, technology and tools