% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/extract_tourism_data.R
\name{extract_tourism_data}
\alias{extract_tourism_data}
\title{Extract tourism data from the ONS working file spreadsheet}
\usage{
extract_tourism_data(x, sheet_name = "Tourism", col_names = c("year", "GVA",
"total", "perc", "overlap"), ...)
}
\arguments{
\item{x}{Path to the input spreadsheet file, typically named something like
\code{working_file_dcms_VXX.xlsm}.}
\item{sheet_name}{The name of the worksheet in which the data are stored.
Defaults to \code{"Tourism"}.}
\item{col_names}{Character vector used to rename the columns of the
imported spreadsheet. Defaults to
\code{c("year", "GVA", "total", "perc", "overlap")}.}
\item{...}{additional arguments to be passed to \code{readxl::read_excel}.}
}
\value{
The function returns nothing, but saves the extracted dataset as
\code{OFFICIAL_tourism.Rds}. This is an R data object, which retains the
column types that would be lost if the data were converted to a flat format
like CSV.
}
\description{
The data which underlie the Economic Sectors for DCMS sectors
data are typically provided to DCMS as a spreadsheet from the Office for
National Statistics. This function extracts the tourism data from that
spreadsheet, and saves them to .Rds format. These data are provided
separately because the usual tourism values in the GVA dataset cannot be
used.

IT IS HIGHLY ADVISABLE TO ENSURE THAT THE DATA WHICH ARE CREATED BY THIS
FUNCTION ARE NOT STORED IN A FOLDER WHICH IS A GITHUB REPOSITORY, TO
MITIGATE AGAINST ACCIDENTAL COMMITTING OF OFFICIAL DATA TO GITHUB. TOOLS TO
FURTHER HELP MITIGATE THIS RISK ARE AVAILABLE AT
\url{https://github.com/ukgovdatascience/dotfiles}.
}
\details{
The best way to understand what happens when you run this function
is to look at the source code, which is available at
\url{https://github.com/ukgovdatascience/eesectors/blob/master/R/}. The
code is relatively transparent and well documented. A brief explanation of
what the function does follows:
1. Call \code{readxl::read_excel} to load the appropriate
sheet from the underlying spreadsheet.
2. Replace the column names with the user-supplied vector in
\code{col_names}. If the layout of the spreadsheet has not changed since
2016, the default vector should work in future years. If it has changed,
this is a likely cause of errors.
3. Remove empty rows (those containing only \code{NA}s).
4. Save the data to an R serialisation object
\code{OFFICIAL_tourism.Rds} in the specified folder.
}
\examples{
\dontrun{
library(eesectors)
extract_tourism_data(
x = 'OFFICIAL_working_file_dcms_V13.xlsm',
sheet_name = 'Tourism'
)
}
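# A minimal sketch of steps 2 to 4 from the Details section, using base R
# only. This is an illustration under stated assumptions, not the package's
# actual implementation: `raw` is a made-up stand-in for the sheet returned
# by readxl::read_excel(), and all of its values are invented.
raw <- data.frame(
  a = c(2015, NA, 2016),
  b = c(10, NA, 12),
  c = c(100, NA, 110),
  d = c(0.10, NA, 0.11),
  e = c(1, NA, 2)
)

# Step 2: rename the columns with the default col_names vector.
colnames(raw) <- c("year", "GVA", "total", "perc", "overlap")

# Step 3: drop rows in which every value is NA.
tourism <- raw[rowSums(!is.na(raw)) > 0, ]

# Step 4: save as .Rds (to a temporary folder here, well away from any git
# repository) and read it back to confirm the column types are retained.
out_file <- file.path(tempdir(), "OFFICIAL_tourism.Rds")
saveRDS(tourism, out_file)
str(readRDS(out_file))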
}