% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/extract_SIC91_data.R
\name{extract_SIC91_data}
\alias{extract_SIC91_data}
\title{Extract SIC 91 sales data from the ONS working file spreadsheet}
\usage{
extract_SIC91_data(x, sheet_name = "SIC 91 Sales Data", col_names = c("SIC",
"description", "year", "ABS", "blank", "code"), ...)
}
\arguments{
\item{x}{Path to the input spreadsheet file, typically named something like
\code{working_file_dcms_VXX.xlsm}.}
\item{sheet_name}{The name of the worksheet in which the data are stored.
Defaults to \code{'SIC 91 Sales Data'}.}
\item{col_names}{Character vector used to rename the columns of the
imported spreadsheet. Defaults to
\code{c('SIC', 'description', 'year', 'ABS', 'blank', 'code')}.}
\item{...}{additional arguments to be passed to \code{readxl::read_excel}.}
}
\value{
The function returns nothing, but saves the extracted dataset to
\code{file.path(output_path, 'OFFICIAL_SIC91.Rds')}. This is an R data
object, which retains the column types that would be lost if the data were
converted to a flat format like CSV.
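
As a small illustration of the point about column types, the following
sketch compares an .Rds round trip with a CSV round trip. The toy data frame
and temporary file names are made up for the example, and base R
\code{saveRDS} and \code{write.csv} are used here as assumptions; they are
not taken from the function's source.

\preformatted{
# Toy data: SIC codes held as character so that leading zeros are kept.
toy <- data.frame(SIC = c('01.10', '91.02'), ABS = c(1.5, 2.5),
                  stringsAsFactors = FALSE)

# Hypothetical temporary files, used only for this illustration.
rds_file <- tempfile(fileext = '.Rds')
csv_file <- tempfile(fileext = '.csv')

saveRDS(toy, rds_file)
write.csv(toy, csv_file, row.names = FALSE)

class(readRDS(rds_file)$SIC)   # character: the type is retained
class(read.csv(csv_file)$SIC)  # numeric: the type is re-guessed on import
}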
}
\description{
The data which underlie the Economic Estimates for DCMS Sectors
are typically provided to DCMS as a spreadsheet by the Office for
National Statistics. This function extracts the SIC 91 sales data from that
spreadsheet, and saves them to .Rds format. These data are used in place of
the usual GVA values. An explanation of why can be found in the methodology
note that accompanies the statistical first release
(\url{https://www.gov.uk/government/publications/dcms-sectors-economic-estimates-methodology}).
IT IS HIGHLY ADVISABLE TO ENSURE THAT THE DATA CREATED BY THIS FUNCTION
ARE NOT STORED IN A FOLDER WHICH IS A GITHUB REPOSITORY, TO GUARD AGAINST
ACCIDENTALLY COMMITTING OFFICIAL DATA TO GITHUB. TOOLS TO FURTHER HELP
MITIGATE THIS RISK ARE AVAILABLE AT
\url{https://github.com/ukgovdatascience/dotfiles}.
}
\details{
The best way to understand what happens when you run this function
is to look at the source code, which is available at
\url{https://github.com/ukgovdatascience/eesectors/blob/master/R/}. The
code is relatively transparent and well documented. In brief, the function
does the following:
\enumerate{
\item The function calls \code{readxl::read_excel} to load the appropriate
sheet from the underlying spreadsheet.
\item Columns of interest are subset using \code{x[, c('SIC', 'year', 'ABS')]}.
\item Empty rows (containing all \code{NA}s) are removed.
\item The data are saved as an R serialisation object,
\code{OFFICIAL_SIC91.Rds}, in the specified folder.
}
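
The steps above can be sketched in base R as follows. This is a minimal
illustration under stated assumptions, not the package source: the helper
name \code{sketch_extract_SIC91} and its \code{output_path} argument are
hypothetical, and a small made-up data frame stands in for the result of
\code{readxl::read_excel} (step 1) so that the sketch runs without the
spreadsheet.

\preformatted{
# Hypothetical helper mirroring steps 2 to 4; not the package implementation.
sketch_extract_SIC91 <- function(raw, output_path = tempdir()) {
  # Step 2: keep only the columns of interest.
  x <- raw[, c('SIC', 'year', 'ABS')]
  # Step 3: drop rows in which every value is NA.
  x <- x[rowSums(!is.na(x)) > 0, ]
  # Step 4: save as an R serialisation object; return the path invisibly.
  out_file <- file.path(output_path, 'OFFICIAL_SIC91.Rds')
  saveRDS(x, out_file)
  invisible(out_file)
}

# Toy stand-in for the imported sheet; all values are made up.
raw <- data.frame(
  SIC = c('91.01', NA, '91.02'),
  description = c('example a', NA, 'example b'),
  year = c(2015, NA, 2015),
  ABS = c(1.5, NA, 2.5),
  blank = NA,
  code = c('a', NA, 'b'),
  stringsAsFactors = FALSE
)

out_file <- sketch_extract_SIC91(raw)
str(readRDS(out_file))  # two rows remain; the all-NA row has been dropped
}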
}
\examples{
\dontrun{
library(eesectors)
extract_SIC91_data(
x = 'OFFICIAL_working_file_dcms_V13.xlsm',
sheet_name = 'SIC 91 Sales Data'
)
}
}