Accessing and loading station data
The loadeR package contains the function dataInventory, which gives a quick overview of the data contained in a dataset. For station data, the main argument is the path to the directory where the dataset (stations.txt, variables.txt and the associated data files) is stored (see section 3.1. Standard (ASCII) format for station data of the Wiki for details on the station data format).
For instance, the following gives a quick overview of the VALUE ECA&D dataset using dataInventory. This dataset contains weather data from 86 stations spread over Europe, and is available for download:
value <- tempfile(fileext = ".zip")
download.file("http://www.value-cost.eu/sites/default/files/VALUE_ECA_86_v2.zip",
destfile = value)
di <- dataInventory(value)
## [2016-02-18 11:05:49] Doing inventory ...
## [2016-02-18 11:05:50] Done.
The returned object contains all the information needed to call the loading function loadStationData, including station codes, geolocation and details on the variables (names, units, ...):
str(di)
## List of 3
## $ Stations :List of 4
## ..$ station_id : chr [1:86] "000012" "000013" "000014" "000015" ...
## ..$ LonLatCoords : num [1:86, 1:2] 15.4 11.4 13 12.9 16.4 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:86] "000012" "000013" "000014" "000015" ...
## .. .. ..$ : chr [1:2] "lon" "lat"
## ..$ times :List of 3
## .. ..$ startDate: POSIXlt[1:1], format: "1961-01-01"
## .. ..$ endDate : POSIXlt[1:1], format: "2010-12-31"
## .. ..$ timeStep :Class 'difftime' atomic [1:1] 24
## .. .. .. ..- attr(*, "units")= chr "hours"
## ..$ other.metadata:List of 3
## .. ..$ name : chr [1:86] "GRAZ" "INNSBRUCK" "SALZBURG" "SONNBLICK" ...
## .. ..$ altitude: int [1:86] 366 577 437 3106 198 100 156 4 139 179 ...
## .. ..$ source : chr [1:86] "ECA&D" "ECA&D" "ECA&D" "ECA&D" ...
## $ Variables :'data.frame': 4 obs. of 3 variables:
## ..$ variable : Factor w/ 4 levels "precip","tmax",..: 1 3 4 2
## ..$ unit : Factor w/ 2 levels "degC","mm": 2 1 1 1
## ..$ missing.code: Factor w/ 1 level "NaN": 1 1 1 1
## $ Summary.stats: NULL
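Since the inventory is a plain nested list, its pieces can be pulled out with standard list indexing. The sketch below builds a small mock object with the same shape as the str(di) output above (two stations only, values copied from the listings on this page). The name di_mock is hypothetical, and the code only illustrates the access pattern; it does not replace a real dataInventory call:

```r
# Mock inventory mimicking the structure printed by str(di).
# Only two stations and two variables are included; this object is hypothetical.
di_mock <- list(
  Stations = list(
    station_id = c("000012", "000013"),
    LonLatCoords = matrix(c(15.45, 11.40, 47.0831, 47.2667), ncol = 2,
                          dimnames = list(c("000012", "000013"),
                                          c("lon", "lat"))),
    other.metadata = list(name = c("GRAZ", "INNSBRUCK"),
                          altitude = c(366L, 577L))
  ),
  Variables = data.frame(variable = c("precip", "tmax"),
                         unit = c("mm", "degC"),
                         stringsAsFactors = FALSE),
  Summary.stats = NULL
)

# Station codes that could be passed on to a loading call
ids <- di_mock$Stations$station_id

# Coordinates of one station, looked up by its code
graz_coords <- di_mock$Stations$LonLatCoords["000012", ]

# Unit of a given variable
tmax_unit <- with(di_mock$Variables, unit[variable == "tmax"])
```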
Note that the last element of the inventory, named Summary.stats, is NULL. By default, the inventory returns only the basic information, but setting the optional argument return.stats to TRUE also returns a table summarizing the characteristics of the data (percentage of missing data, mean, min and max values):
di2 <- dataInventory(value, return.stats = TRUE)
## [2016-02-18 11:05:50] Doing inventory ...
## [2016-02-18 11:05:56] Done.
di2$Summary.stats
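The kind of table described above (percentage of missing data, mean, min and max per station) can be approximated in base R. The following is only a sketch on made-up data, not the code loadeR runs internally; the matrix obs, the helper summary_stats and its column names are hypothetical:

```r
# Hypothetical daily values for two stations (one column each); NA = missing
obs <- cbind("000012" = c(10.5, NA, 12.0, 14.5),
             "000013" = c(8.0, 9.0, NA, NA))

# Per-station percentage of missing values, mean, min and max
summary_stats <- function(x) {
  data.frame(missing.percent = 100 * colMeans(is.na(x)),
             mean = colMeans(x, na.rm = TRUE),
             min = apply(x, 2, min, na.rm = TRUE),
             max = apply(x, 2, max, na.rm = TRUE))
}

stats <- summary_stats(obs)
print(stats)
```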
A more concise summary of the available stations can be obtained using the stationInfo function. By default, it also draws a map with the locations of the available stations, labelled by their identification codes.
stationInfo(value)
## [2016-02-18 11:05:56] Doing inventory ...
## [2016-02-18 11:05:56] Done.
## stationID longitude latitude name altitude
## 1 000012 15.45000 47.0831 GRAZ 366
## 2 000013 11.40000 47.2667 INNSBRUCK 577
## 3 000014 13.00000 47.8000 SALZBURG 437
## 4 000015 12.95000 47.0500 SONNBLICK 3106
## 5 000016 16.35000 48.2331 WIEN 198
## 6 000017 4.36640 50.8000 UCCLE 100
## ......................................................................
## 85 005585 13.26000 61.1700 SALEN 360
## 86 007682 25.09250 64.6833 SIIKAJOKI-REVONLAHTI 48
## source
## 1 ECA&D
## 2 ECA&D
## 3 ECA&D
## 4 ECA&D
## 5 ECA&D
## 6 ECA&D
## .........
## 85 ECA&D
## 86 ECA&D
The function loadStationData is the interface to access observational datasets. Observations can be queried in several ways; the most common cases are presented below.
Given the station codes provided by the inventory, the time series for one or several selected stations can be retrieved directly by their identification codes. The following code loads summer (JJA) maximum temperature data for the period 1981-2000 for two stations: San Sebastian-Igueldo (000234) and Madrid-Barajas (003946):
example1 <- loadStationData(dataset = value,
var="tmax",
stationID = c("000234", "003946"),
season = 6:8,
years = 1981:2000)
## [2016-02-18 11:05:57] Loading data ...
## [2016-02-18 11:05:58] Retrieving metadata ...
## [2016-02-18 11:05:58] Done.
str(example1)
## List of 5
## $ Variable:List of 1
## ..$ varName: chr "tmax"
## $ Data : num [1:1840, 1:2] 29 22.4 15.2 18.2 23 20 27.4 28.8 17.6 16.8 ...
## ..- attr(*, "dimensions")= chr [1:2] "time" "station"
## $ xyCoords: num [1:2, 1:2] -2.04 -3.56 43.31 40.47
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:2] "000234" "003946"
## .. ..$ : chr [1:2] "longitude" "latitude"
## $ Dates :List of 2
## ..$ start: chr [1:1840] "1981-06-01 00:00:00" "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" ...
## ..$ end : chr [1:1840] "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" "1981-06-05 00:00:00" ...
## $ Metadata:List of 4
## ..$ station_id: int [1:2] 234 3946
## ..$ name : chr [1:2] "SAN-SEBASTIAN-IGUELDO" "MADRID-BARAJAS"
## ..$ altitude : int [1:2] 251 609
## ..$ source : chr [1:2] "ECA&D" "ECA&D"
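The str() output shows that Data is a time × station matrix, and that the station codes are kept as row names of xyCoords. A minimal sketch of how one station's series might be pulled out of an object of this shape is given below. It uses a small mock list, ex_mock (a hypothetical name), and assumes that the columns of Data follow the row order of xyCoords:

```r
# Mock object with the same layout as example1 (three days only).
# San Sebastian values are copied from the listing; Madrid values are made up.
ex_mock <- list(
  Data = matrix(c(29.0, 22.4, 15.2,
                  30.1, 31.5, 29.8), ncol = 2),
  xyCoords = matrix(c(-2.04, -3.56, 43.31, 40.47), ncol = 2,
                    dimnames = list(c("000234", "003946"),
                                    c("longitude", "latitude"))),
  Dates = list(start = c("1981-06-01 00:00:00", "1981-06-02 00:00:00",
                         "1981-06-03 00:00:00"))
)
attr(ex_mock$Data, "dimensions") <- c("time", "station")

# Column of the requested station, found through the xyCoords row names
col <- match("003946", rownames(ex_mock$xyCoords))

# Tidy two-column table: calendar date and value for that station
madrid <- data.frame(date = as.Date(substr(ex_mock$Dates$start, 1, 10)),
                     tmax = ex_mock$Data[, col])
```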
Alternatively, we can choose a location by its coordinates. From the stationInfo output, we know the geographical coordinates of San Sebastian-Igueldo (-2.03920, 43.3075). We can pass these coordinates to the lonLim and latLim arguments. Note that it is not necessary to specify all the decimals, as the function finds the station closest to the given coordinate:
example2 <- loadStationData(dataset = value,
var="tmax",
lonLim = -2.03,
latLim = 43.3,
season = 6:8,
years = 1981:2000)
## [2016-02-18 11:05:58] Closest station located at 0.0119 spatial units from the specified [lonLim,latLim] coordinate
## [2016-02-18 11:05:59] Loading data ...
## [2016-02-18 11:06:00] Retrieving metadata ...
## [2016-02-18 11:06:00] Done.
str(example2)
## List of 5
## $ Variable:List of 1
## ..$ varName: chr "tmax"
## $ Data : num [1:1840, 1] 29 22.4 15.2 18.2 23 20 27.4 28.8 17.6 16.8 ...
## ..- attr(*, "dimensions")= chr [1:2] "time" "station"
## $ xyCoords: num [1, 1:2] -2.04 43.31
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr "000234"
## .. ..$ : chr [1:2] "longitude" "latitude"
## $ Dates :List of 2
## ..$ start: chr [1:1840] "1981-06-01 00:00:00" "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" ...
## ..$ end : chr [1:1840] "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" "1981-06-05 00:00:00" ...
## $ Metadata:List of 4
## ..$ station_id: int 234
## ..$ name : chr "SAN-SEBASTIAN-IGUELDO"
## ..$ altitude : int 251
## ..$ source : chr "ECA&D"
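The 0.0119 spatial units reported in the log are consistent with a plain Euclidean distance measured in longitude/latitude units; whether loadeR computes it exactly this way is an assumption. A sketch of such a nearest-station search, with a hypothetical helper closest_station and coordinates taken from this page:

```r
# Station coordinates as quoted on this page (lon, lat in degrees)
stations <- rbind("000012" = c(lon = 15.45, lat = 47.0831),
                  "000013" = c(lon = 11.40, lat = 47.2667),
                  "000234" = c(lon = -2.0392, lat = 43.3075))

# Hypothetical helper: Euclidean distance in lon/lat units (assumption),
# returning the code of the nearest station and its distance
closest_station <- function(coords, lon, lat) {
  d <- sqrt((coords[, "lon"] - lon)^2 + (coords[, "lat"] - lat)^2)
  list(id = names(which.min(d)), distance = min(d))
}

hit <- closest_station(stations, lon = -2.03, lat = 43.3)
hit$id                   # station code of the nearest station
round(hit$distance, 4)   # distance in the same units as the coordinates
```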
A particular case of selection by coordinates is when all the data within a given bounding box are desired. In this case, the lonLim and latLim arguments each take a vector of length two, defining the corners of the bounding box. For instance, the next example loads MARSEILLE-MARIGNANE, CAGLIARI, NAVACERRADA, SAN-SEBASTIAN-IGUELDO, etc.:
example3 <- loadStationData(dataset = value,
var="tmax",
lonLim = c(-5,10),
latLim = c(37,45),
season = 6:8,
years = 1981:2000)
## [2016-02-18 11:06:01] Loading data ...
## [2016-02-18 11:06:01] Retrieving metadata ...
## [2016-02-18 11:06:01] Done.
str(example3)
## List of 5
## $ Variable:List of 1
## ..$ varName: chr "tmax"
## $ Data : num [1:1840, 1:9] 30.4 27.8 25 22.2 27.2 30.1 28.1 28.2 30.3 31.2 ...
## ..- attr(*, "dimensions")= chr [1:2] "time" "station"
## $ xyCoords: num [1:9, 1:2] 5.227 9.05 -4.01 -2.039 0.491 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:9] "000039" "000175" "000232" "000234" ...
## .. ..$ : chr [1:2] "longitude" "latitude"
## $ Dates :List of 2
## ..$ start: chr [1:1840] "1981-06-01 00:00:00" "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" ...
## ..$ end : chr [1:1840] "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" "1981-06-05 00:00:00" ...
## $ Metadata:List of 4
## ..$ station_id: int [1:9] 39 175 232 234 236 355 800 3919 3946
## ..$ name : chr [1:9] "MARSEILLE-MARIGNANE" "CAGLIARI" "NAVACERRADA" "SAN-SEBASTIAN-IGUELDO" ...
## ..$ altitude : int [1:9] 5 21 1894 251 44 1567 151 8 609
## ..$ source : chr [1:9] "ECA&D" "ECA&D" "ECA&D" "ECA&D" ...
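The bounding-box selection amounts to keeping the stations whose longitude and latitude both fall within the given limits. A base R sketch of that filter follows; the helper in_box is hypothetical, and treating the box edges as inclusive is an assumption rather than documented loadeR behaviour:

```r
# A few stations with coordinates quoted on this page
stations <- data.frame(
  id = c("000012", "000013", "000234", "003946"),
  lon = c(15.45, 11.40, -2.04, -3.56),
  lat = c(47.0831, 47.2667, 43.31, 40.47),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for stations inside the box (edges included)
in_box <- function(st, lonLim, latLim) {
  st$lon >= min(lonLim) & st$lon <= max(lonLim) &
    st$lat >= min(latLim) & st$lat <= max(latLim)
}

selected <- stations$id[in_box(stations, lonLim = c(-5, 10), latLim = c(37, 45))]
selected
```

With the limits used in example3, only the two Spanish stations of this small table fall inside the box; the two Austrian ones lie east and north of it.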
By default, the arguments defining the spatial domain of the query (lonLim and latLim, or stationID) are NULL. If none of them is specified, the function loads all available stations for the selected time domain:
example4 <- loadStationData(dataset = value,
var="tmax",
season = 6:8,
years = 1981:2000)
## [2016-02-18 11:06:02] Loading data ...
## [2016-02-18 11:06:03] Retrieving metadata ...
## [2016-02-18 11:06:03] Done.
str(example4)
## List of 5
## $ Variable:List of 1
## ..$ varName: chr "tmax"
## $ Data : num [1:1840, 1:86] 28.4 29.1 30.1 29.3 21.8 20.4 24.7 27.4 26.8 24.9 ...
## ..- attr(*, "dimensions")= chr [1:2] "time" "station"
## $ xyCoords: num [1:86, 1:2] 15.4 11.4 13 12.9 16.4 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:86] "000012" "000013" "000014" "000015" ...
## .. ..$ : chr [1:2] "longitude" "latitude"
## $ Dates :List of 2
## ..$ start: chr [1:1840] "1981-06-01 00:00:00" "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" ...
## ..$ end : chr [1:1840] "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" "1981-06-05 00:00:00" ...
## $ Metadata:List of 4
## ..$ station_id: int [1:86] 12 13 14 15 16 17 21 28 29 30 ...
## ..$ name : chr [1:86] "GRAZ" "INNSBRUCK" "SALZBURG" "SONNBLICK" ...
## ..$ altitude : int [1:86] 366 577 437 3106 198 100 156 4 139 179 ...
## ..$ source : chr [1:86] "ECA&D" "ECA&D" "ECA&D" "ECA&D" ...
The same behaviour applies to the time definition of the query. For instance, when season and/or years are left at their default value NULL, all months and/or years within the dataset are returned.
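The effect of season and years on the time axis can be reproduced with plain dates. The sketch below uses a hypothetical helper, select_days, in which NULL means "no restriction"; it illustrates the selection logic only and is not loadeR's implementation. For JJA 1981-2000 it yields 92 days × 20 years = 1840 time steps, the same length as the Data arrays shown above:

```r
# All calendar days in the period covered by the dataset (1961-2010)
days <- seq(as.Date("1961-01-01"), as.Date("2010-12-31"), by = "day")

# Hypothetical helper: NULL for season or years means "keep everything"
select_days <- function(days, season = NULL, years = NULL) {
  m <- as.integer(format(days, "%m"))
  y <- as.integer(format(days, "%Y"))
  if (is.null(season)) season <- 1:12
  if (is.null(years)) years <- unique(y)
  days[m %in% season & y %in% years]
}

jja <- select_days(days, season = 6:8, years = 1981:2000)
length(jja)                 # number of selected days
everything <- select_days(days)
length(everything) == length(days)
```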
Next we use the function subsetGrid from package transformeR to extract each station in example1. As a result, we obtain two station data objects, Madrid and Donostia (San Sebastian):
library(transformeR)
Madrid <- subsetGrid(example1, station.id = "003946")
Donostia <- subsetGrid(example1, station.id = "000234")
The function temporalPlot from package visualizeR plots as many time series as we want (in this case two) in a single graph:
library(visualizeR)
temporalPlot(Madrid, Donostia, xyplot.custom = list(ylab = "Tasmax ºC"))
We could also use spatialPlot to visualize the climatological mean (here we use example3):
spatialPlot(climatology(example3), backdrop.theme = "countries", colorkey = TRUE)
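Conceptually, the climatological mean plotted here is the temporal mean of each station's series. A rough base R equivalent on a mock time × station matrix is sketched below; it shows the idea only and is not the transformeR implementation. Skipping missing values (na.rm = TRUE) is an assumption made for the sketch:

```r
# Mock time x station matrix: first column copied from the example1 listing,
# second column made up (with one missing value)
x <- matrix(c(29.0, 22.4, 15.2,
              30.0, 32.0, NA), ncol = 2,
            dimnames = list(NULL, c("000234", "003946")))

# One mean value per station, ignoring missing days
clim <- colMeans(x, na.rm = TRUE)
clim
```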
The GSN dataset contains data for a worldwide station network. A subset containing the stations with at least 75% of the data in the period 1979-2012 (374 stations) can be downloaded as follows:
gsn <- tempfile(fileext = ".zip")
download.file("http://meteo.unican.es/work/loadeR/data/GSN_World.zip",
destfile = gsn)
gsnload <- loadStationData(gsn, var = "tmean")
## [2016-02-18 11:06:07] Loading data ...
## [2016-02-18 11:06:09] Retrieving metadata ...
## [2016-02-18 11:06:10] Done.
library(visualizeR)
spatialPlot(climatology(gsnload), backdrop.theme = "coastline", colorkey = TRUE)
NOTE: The unit for temperature variables in this dataset is 0.1 ºC. The characteristics of the variables in station data can be checked by opening the variables.txt file (see section 3.1. Standard (ASCII) format for station data).
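Because of this unit, values read from the dataset need to be divided by 10 to obtain degrees Celsius. A minimal sketch with made-up numbers (the vector raw is hypothetical; for a loaded object the same division would apply to its Data element):

```r
# Hypothetical raw values stored in tenths of a degree Celsius
raw <- c(215, 198, NA, 230)

# Convert to degrees Celsius; missing values stay missing
degC <- raw / 10
degC
```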
The VALUE ECA&D dataset contains weather data of 86 stations spread over Europe, and is available for download:
value <- tempfile(fileext = ".zip")
download.file("http://www.value-cost.eu/sites/default/files/VALUE_ECA_86_v2.zip",
destfile = value)
valueload <- loadStationData(value, var = "tmean")
## [2016-02-18 11:06:11] Loading data ...
## [2016-02-18 11:06:12] Retrieving metadata ...
## [2016-02-18 11:06:12] Done.
library(visualizeR)
spatialPlot(climatology(valueload), backdrop.theme = "coastline", colorkey = TRUE)
print(sessionInfo())
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=es_ES.UTF-8
## [9] LC_ADDRESS=es_ES.UTF-8 LC_TELEPHONE=es_ES.UTF-8
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=es_ES.UTF-8
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] visualizeR_1.0.0 sm_2.2-5.4 fields_9.0 maps_3.2.0
## [5] spam_1.4-0 transformeR_1.1.2 rJava_0.9-8
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.11 loadeR.java_1.1-0 plyr_1.8.4
## [4] loadeR_1.2.0 compiler_3.4.0 RColorBrewer_1.1-2
## [7] bitops_1.0-6 tools_3.4.0 boot_1.3-17
## [10] vioplot_0.2 lattice_0.20-35 Matrix_1.2-7.1
## [13] parallel_3.4.0 akima_0.6-2 padr_0.3.0
## [16] raster_2.5-8 mapplots_1.5 data.table_1.10.4
## [19] dtw_1.18-1 SpecsVerification_0.5-2 sp_1.2-5
## [22] latticeExtra_0.6-28 magrittr_1.5 scales_0.4.1
## [25] CircStats_0.2-4 MASS_7.3-44 abind_1.4-5
## [28] colorspace_1.3-2 proxy_0.4-17 munsell_0.4.3
## [31] RCurl_1.95-4.8 verification_1.42 RcppEigen_0.3.3.3.0