Accessing and loading station data

miturbide edited this page Nov 30, 2018 · 20 revisions

Obtaining a quick overview of the dataset

The loadeR package provides the function dataInventory, which gives a quick overview of the contents of a dataset. For station data, the main argument is the path to the directory where the dataset files (stations.txt, variables.txt and the associated data files) are stored (see section 3.1. Standard (ASCII) format for station data of the Wiki for details on the station data format).

For instance, the following gives a quick overview of the VALUE ECA&D dataset using dataInventory. This dataset contains weather data for 86 stations spread over Europe, and is available for download:

value <- tempfile(fileext = ".zip")
download.file("http://www.value-cost.eu/sites/default/files/VALUE_ECA_86_v2.zip", 
              destfile = value)
di <- dataInventory(value)

## [2016-02-18 11:05:49] Doing inventory ...
## [2016-02-18 11:05:50] Done.

The returned object contains all the information needed to call the loading function loadStationData, including station codes, geolocation, variable names and units:

str(di)

## List of 3
##  $ Stations     :List of 4
##   ..$ station_id    : chr [1:86] "000012" "000013" "000014" "000015" ...
##   ..$ LonLatCoords  : num [1:86, 1:2] 15.4 11.4 13 12.9 16.4 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:86] "000012" "000013" "000014" "000015" ...
##   .. .. ..$ : chr [1:2] "lon" "lat"
##   ..$ times         :List of 3
##   .. ..$ startDate: POSIXlt[1:1], format: "1961-01-01"
##   .. ..$ endDate  : POSIXlt[1:1], format: "2010-12-31"
##   .. ..$ timeStep :Class 'difftime'  atomic [1:1] 24
##   .. .. .. ..- attr(*, "units")= chr "hours"
##   ..$ other.metadata:List of 3
##   .. ..$ name    : chr [1:86] "GRAZ" "INNSBRUCK" "SALZBURG" "SONNBLICK" ...
##   .. ..$ altitude: int [1:86] 366 577 437 3106 198 100 156 4 139 179 ...
##   .. ..$ source  : chr [1:86] "ECA&D" "ECA&D" "ECA&D" "ECA&D" ...
##  $ Variables    :'data.frame':   4 obs. of  3 variables:
##   ..$ variable    : Factor w/ 4 levels "precip","tmax",..: 1 3 4 2
##   ..$ unit        : Factor w/ 2 levels "degC","mm": 2 1 1 1
##   ..$ missing.code: Factor w/ 1 level "NaN": 1 1 1 1
##  $ Summary.stats: NULL

Note that the last element of the inventory, Summary.stats, is NULL. By default, the inventory returns only the basic information, but setting the optional argument return.stats to TRUE also returns a table summarizing the characteristics of the data (percentage of missing data, mean, min and max values):

di2 <- dataInventory(value, return.stats = TRUE)

## [2016-02-18 11:05:50] Doing inventory ...
## [2016-02-18 11:05:56] Done.

di2$Summary.stats
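
The table itself is not reproduced here. As a rough illustration of the kind of statistics described above (percentage of missing data, mean, min and max), the following base R sketch computes them for a small made-up time x station matrix. It is not the code used by dataInventory, and the row and column names of the actual table may differ; the object names x and stats are chosen only for this example.

```r
# Made-up daily values for two stations (columns); NA marks missing data.
# Illustrative sketch only, not the implementation used by dataInventory().
x <- cbind(A = c(1.0, NA, 3.0, 5.0),
           B = c(10.0, 12.0, NA, NA))

# One row per statistic, one column per station
stats <- rbind(
  missing.percent = colMeans(is.na(x)) * 100,
  mean = colMeans(x, na.rm = TRUE),
  min = apply(x, 2, min, na.rm = TRUE),
  max = apply(x, 2, max, na.rm = TRUE)
)
print(stats)
```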

A more concise summary of the available stations can be obtained using the stationInfo command. By default, it also returns a map with the locations of the available stations, labelled by their identification codes.

stationInfo(value)

## [2016-02-18 11:05:56] Doing inventory ...
## [2016-02-18 11:05:56] Done.

##    stationID longitude latitude                          name altitude
## 1     000012  15.45000  47.0831                          GRAZ      366
## 2     000013  11.40000  47.2667                     INNSBRUCK      577
## 3     000014  13.00000  47.8000                      SALZBURG      437
## 4     000015  12.95000  47.0500                     SONNBLICK     3106
## 5     000016  16.35000  48.2331                          WIEN      198
## 6     000017   4.36640  50.8000                         UCCLE      100
## ......................................................................
## 85    005585  13.26000  61.1700                         SALEN      360
## 86    007682  25.09250  64.6833          SIIKAJOKI-REVONLAHTI       48
##    source
## 1   ECA&D
## 2   ECA&D
## 3   ECA&D
## 4   ECA&D
## 5   ECA&D
## 6   ECA&D
## .........
## 85  ECA&D
## 86  ECA&D
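
The inventory elements can be queried with ordinary list indexing, for example to pick station codes before loading data. The sketch below does not call loadeR: it rebuilds a small list with the same element names as the str(di) output shown earlier, using only the first four stations and the values printed in this page, and then selects the stations located above 1000 m. The object name di_toy is made up for this illustration.

```r
# Toy inventory mimicking the structure printed by str(di).
# Only four stations are included; 'di_toy' is a hypothetical name.
ids <- c("000012", "000013", "000014", "000015")
di_toy <- list(
  Stations = list(
    station_id = ids,
    LonLatCoords = matrix(c(15.45, 11.40, 13.00, 12.95,
                            47.0831, 47.2667, 47.8000, 47.0500),
                          ncol = 2,
                          dimnames = list(ids, c("lon", "lat"))),
    other.metadata = list(
      name = c("GRAZ", "INNSBRUCK", "SALZBURG", "SONNBLICK"),
      altitude = c(366L, 577L, 437L, 3106L)
    )
  )
)

# Codes and names of the stations located above 1000 m
high <- di_toy$Stations$other.metadata$altitude > 1000
high_ids <- di_toy$Stations$station_id[high]
high_names <- di_toy$Stations$other.metadata$name[high]
print(data.frame(id = high_ids, name = high_names))
```

With a real inventory, the same indexing would be applied to di instead of di_toy, and the resulting codes could be passed on as station identifiers.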

Loading station data

The function loadStationData is the interface for accessing observational datasets. Station data can be queried in several ways; the most common cases are presented below.

Loading station data from station codes

Given the station codes provided by the inventory, the time series for one or several stations can be retrieved directly by their identification codes. The following code loads summer (JJA) maximum temperature data for the period 1981-2000 for two stations: San Sebastian-Igueldo (000234) and Madrid-Barajas (003946):

example1 <- loadStationData(dataset = value, 
                            var = "tmax", 
                            stationID = c("000234", "003946"), 
                            season = 6:8, 
                            years = 1981:2000)

## [2016-02-18 11:05:57] Loading data ...
## [2016-02-18 11:05:58] Retrieving metadata ...
## [2016-02-18 11:05:58] Done.

str(example1)

## List of 5
##  $ Variable:List of 1
##   ..$ varName: chr "tmax"
##  $ Data    : num [1:1840, 1:2] 29 22.4 15.2 18.2 23 20 27.4 28.8 17.6 16.8 ...
##   ..- attr(*, "dimensions")= chr [1:2] "time" "station"
##  $ xyCoords: num [1:2, 1:2] -2.04 -3.56 43.31 40.47
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "000234" "003946"
##   .. ..$ : chr [1:2] "longitude" "latitude"
##  $ Dates   :List of 2
##   ..$ start: chr [1:1840] "1981-06-01 00:00:00" "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" ...
##   ..$ end  : chr [1:1840] "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" "1981-06-05 00:00:00" ...
##  $ Metadata:List of 4
##   ..$ station_id: int [1:2] 234 3946
##   ..$ name      : chr [1:2] "SAN-SEBASTIAN-IGUELDO" "MADRID-BARAJAS"
##   ..$ altitude  : int [1:2] 251 609
##   ..$ source    : chr [1:2] "ECA&D" "ECA&D"
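
Note that in the output above the station codes appear as zero-padded strings in the row names of xyCoords ("000234") but as integers in Metadata$station_id (234). The sketch below runs on a toy object with the same layout, not on real loadeR output, and shows one possible way to pick the series of a single station by its code. The three values in the first column are the first ones printed above; the second column is filled with NA because its values are not shown in this page. The name ex is made up for the example.

```r
# Toy object with the same layout as 'example1' (three days only).
# Column 1 holds the first values printed above; column 2 is unknown here.
ex <- list(
  Data = cbind(c(29, 22.4, 15.2), c(NA, NA, NA)),
  xyCoords = matrix(c(-2.04, -3.56, 43.31, 40.47), ncol = 2,
                    dimnames = list(c("000234", "003946"),
                                    c("longitude", "latitude"))),
  Metadata = list(station_id = c(234L, 3946L))
)

# Integer codes can be turned back into six-character codes
codes <- sprintf("%06d", ex$Metadata$station_id)

# Select the column of Data belonging to station "000234"
col <- match("000234", rownames(ex$xyCoords))
series <- ex$Data[, col]
print(series)
```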

Loading station data from geographical coordinates

Alternatively, a location can be chosen by its coordinates. From the stationInfo output, we know the geographical coordinates of San Sebastian-Igueldo (-2.03920, 43.3075). These coordinates can be passed to the lonLim and latLim arguments. It is not necessary to specify all the decimals, as the function finds the station closest to the given coordinates:

example2 <- loadStationData(dataset = value, 
                            var = "tmax", 
                            lonLim = -2.03, 
                            latLim = 43.3, 
                            season = 6:8, 
                            years = 1981:2000)

## [2016-02-18 11:05:58] Closest station located at 0.0119 spatial units from the specified [lonLim,latLim] coordinate
## [2016-02-18 11:05:59] Loading data ...
## [2016-02-18 11:06:00] Retrieving metadata ...
## [2016-02-18 11:06:00] Done.

str(example2)

## List of 5
##  $ Variable:List of 1
##   ..$ varName: chr "tmax"
##  $ Data    : num [1:1840, 1] 29 22.4 15.2 18.2 23 20 27.4 28.8 17.6 16.8 ...
##   ..- attr(*, "dimensions")= chr [1:2] "time" "station"
##  $ xyCoords: num [1, 1:2] -2.04 43.31
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr "000234"
##   .. ..$ : chr [1:2] "longitude" "latitude"
##  $ Dates   :List of 2
##   ..$ start: chr [1:1840] "1981-06-01 00:00:00" "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" ...
##   ..$ end  : chr [1:1840] "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" "1981-06-05 00:00:00" ...
##  $ Metadata:List of 4
##   ..$ station_id: int 234
##   ..$ name      : chr "SAN-SEBASTIAN-IGUELDO"
##   ..$ altitude  : int 251
##   ..$ source    : chr "ECA&D"
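
The closest-station search can be pictured with a few lines of base R. The sketch below assumes that the "spatial units" in the log message are plain Euclidean distances in degrees; this is an assumption, not something confirmed by the package documentation, although the distance obtained for San Sebastian-Igueldo is close to the 0.0119 reported above. The coordinates are those given in this page, and the object names are made up for the example.

```r
# Coordinates taken from the outputs shown in this page
stations <- matrix(c(-2.0392, 43.3075,    # 000234 San Sebastian-Igueldo
                     -3.56,   40.47),     # 003946 Madrid-Barajas (rounded)
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("000234", "003946"), c("lon", "lat")))
query <- c(lon = -2.03, lat = 43.3)

# Euclidean distance in degrees (an assumption about the "spatial units")
d <- sqrt((stations[, "lon"] - query["lon"])^2 +
          (stations[, "lat"] - query["lat"])^2)

# Code of the station with the smallest distance to the query point
closest <- names(which.min(d))
print(closest)
print(round(min(d), 4))
```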

Loading station data within a given geographical bounding box

A particular case of selection by coordinates is when all the data within a given bounding box are desired. In this case, lonLim and latLim are each given a vector of length two, which together define the corners of the bounding box. For instance, the next example loads MARSEILLE-MARIGNANE, CAGLIARI, NAVACERRADA, SAN-SEBASTIAN-IGUELDO, etc.:

example3 <- loadStationData(dataset = value, 
                            var = "tmax", 
                            lonLim = c(-5,10), 
                            latLim = c(37,45), 
                            season = 6:8, 
                            years = 1981:2000)

## [2016-02-18 11:06:01] Loading data ...
## [2016-02-18 11:06:01] Retrieving metadata ...
## [2016-02-18 11:06:01] Done.

str(example3)

## List of 5
##  $ Variable:List of 1
##   ..$ varName: chr "tmax"
##  $ Data    : num [1:1840, 1:9] 30.4 27.8 25 22.2 27.2 30.1 28.1 28.2 30.3 31.2 ...
##   ..- attr(*, "dimensions")= chr [1:2] "time" "station"
##  $ xyCoords: num [1:9, 1:2] 5.227 9.05 -4.01 -2.039 0.491 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:9] "000039" "000175" "000232" "000234" ...
##   .. ..$ : chr [1:2] "longitude" "latitude"
##  $ Dates   :List of 2
##   ..$ start: chr [1:1840] "1981-06-01 00:00:00" "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" ...
##   ..$ end  : chr [1:1840] "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" "1981-06-05 00:00:00" ...
##  $ Metadata:List of 4
##   ..$ station_id: int [1:9] 39 175 232 234 236 355 800 3919 3946
##   ..$ name      : chr [1:9] "MARSEILLE-MARIGNANE" "CAGLIARI" "NAVACERRADA" "SAN-SEBASTIAN-IGUELDO" ...
##   ..$ altitude  : int [1:9] 5 21 1894 251 44 1567 151 8 609
##   ..$ source    : chr [1:9] "ECA&D" "ECA&D" "ECA&D" "ECA&D" ...
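
The bounding box logic amounts to keeping the stations whose longitude and latitude both fall within the given limits. The following base R sketch applies that test to four stations whose coordinates appear in this page. Whether loadeR treats the box edges as inclusive is an assumption made here, and the object names are hypothetical.

```r
# A few station coordinates copied from the outputs shown in this page
xy <- matrix(c(-2.0392, 43.3075,   # SAN-SEBASTIAN-IGUELDO
               -3.56,   40.47,     # MADRID-BARAJAS (rounded)
               15.45,   47.0831,   # GRAZ
                4.3664, 50.8),     # UCCLE
             ncol = 2, byrow = TRUE,
             dimnames = list(c("000234", "003946", "000012", "000017"),
                             c("lon", "lat")))
lonLim <- c(-5, 10)
latLim <- c(37, 45)

# Keep stations inside the box (edges assumed inclusive)
inside <- xy[, "lon"] >= lonLim[1] & xy[, "lon"] <= lonLim[2] &
          xy[, "lat"] >= latLim[1] & xy[, "lat"] <= latLim[2]
selected <- rownames(xy)[inside]
print(selected)
```

Graz is dropped because of its longitude and Uccle because of its latitude, while the two Spanish stations are kept, in agreement with the station codes listed in the example3 metadata.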

Loading all stations

By default, the arguments defining the spatial domain of the query (lonLim and latLim, or stationID) are NULL. If none of them is specified, the function loads all available stations for the selected time domain:

example4 <- loadStationData(dataset = value, 
                            var = "tmax", 
                            season = 6:8, 
                            years = 1981:2000)

## [2016-02-18 11:06:02] Loading data ...
## [2016-02-18 11:06:03] Retrieving metadata ...
## [2016-02-18 11:06:03] Done.

str(example4)

## List of 5
##  $ Variable:List of 1
##   ..$ varName: chr "tmax"
##  $ Data    : num [1:1840, 1:86] 28.4 29.1 30.1 29.3 21.8 20.4 24.7 27.4 26.8 24.9 ...
##   ..- attr(*, "dimensions")= chr [1:2] "time" "station"
##  $ xyCoords: num [1:86, 1:2] 15.4 11.4 13 12.9 16.4 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:86] "000012" "000013" "000014" "000015" ...
##   .. ..$ : chr [1:2] "longitude" "latitude"
##  $ Dates   :List of 2
##   ..$ start: chr [1:1840] "1981-06-01 00:00:00" "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" ...
##   ..$ end  : chr [1:1840] "1981-06-02 00:00:00" "1981-06-03 00:00:00" "1981-06-04 00:00:00" "1981-06-05 00:00:00" ...
##  $ Metadata:List of 4
##   ..$ station_id: int [1:86] 12 13 14 15 16 17 21 28 29 30 ...
##   ..$ name      : chr [1:86] "GRAZ" "INNSBRUCK" "SALZBURG" "SONNBLICK" ...
##   ..$ altitude  : int [1:86] 366 577 437 3106 198 100 156 4 139 179 ...
##   ..$ source    : chr [1:86] "ECA&D" "ECA&D" "ECA&D" "ECA&D" ...

The same behaviour applies to the time definition of the query: when season and/or years are left at their default value NULL, all months and/or years in the dataset are returned.
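
The effect of the season and years arguments on the length of the time dimension can be checked with base R alone. The sketch below builds a daily calendar covering the period reported by the inventory (1961-01-01 to 2010-12-31) and counts the days falling in June to August of 1981-2000. It only illustrates the selection logic and does not use loadeR; the variable names are chosen for this example.

```r
# Daily calendar spanning the period reported by the inventory
dates <- seq(as.Date("1961-01-01"), as.Date("2010-12-31"), by = "day")
month <- as.integer(format(dates, "%m"))
year <- as.integer(format(dates, "%Y"))

# Same temporal selection as in the examples above
season <- 6:8
years <- 1981:2000
keep <- month %in% season & year %in% years

# Number of selected days: 92 days per summer over 20 years
n_days <- sum(keep)
print(n_days)
```

The result, 1840, matches the first dimension of the Data element in the examples above.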

Visualization examples

Next we use the function subsetGrid from the package transformeR to extract each station from example1. As a result, we obtain two station data objects, Madrid and Donostia (San Sebastian):

library(transformeR)
Madrid <- subsetGrid(example1, station.id = "003946")
Donostia <- subsetGrid(example1, station.id = "000234")

The function temporalPlot from the package visualizeR plots as many time series as we want (two in this case) in a single graph:

library(visualizeR)
temporalPlot(Madrid, Donostia, xyplot.custom = list(ylab = "Tasmax ºC"))

We could also use spatialPlot to visualize the climatological mean (here we use example3):

spatialPlot(climatology(example3), backdrop.theme = "countries", colorkey = TRUE)
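
As a rough picture of what is being mapped, the climatological mean of station data can be thought of as the temporal mean of each station's series. The base R sketch below computes it for a made-up time x station matrix, assuming that missing values are ignored. It is not the transformeR implementation, and the names tx and clim are hypothetical.

```r
# Made-up time x station matrix (3 days, 2 stations); NA marks a gap
tx <- cbind(s1 = c(20, 22, 24), s2 = c(30, NA, 34))

# One mean value per station, ignoring missing days (an assumption)
clim <- colMeans(tx, na.rm = TRUE)
print(clim)
```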

Examples of available station datasets

GCOS Surface Network (GSN)

The GSN dataset contains data for a worldwide station network. A subset containing the stations with at least 75% of the data available in the period 1979-2012 (374 stations) can be downloaded as follows:

gsn <- tempfile(fileext = ".zip")
download.file("http://meteo.unican.es/work/loadeR/data/GSN_World.zip", 
              destfile = gsn)
gsnload <- loadStationData(gsn, var = "tmean")

## [2016-02-18 11:06:07] Loading data ...
## [2016-02-18 11:06:09] Retrieving metadata ...
## [2016-02-18 11:06:10] Done.

library(visualizeR)
spatialPlot(climatology(gsnload), backdrop.theme = "coastline", colorkey = TRUE)

NOTE: The unit for temperature variables in this dataset is 0.1 ºC. The characteristics of the variables in a station dataset can be checked by opening the variables.txt file (see section 3.1. Standard (ASCII) format for station data).
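
Values stored in tenths of a degree can be converted to degrees Celsius by dividing by 10. The sketch below uses made-up numbers rather than the GSN data. With a loaded object, the same division would presumably be applied to its Data element, which is an assumption about how one might handle it rather than a documented loadeR step.

```r
# Made-up tmean values stored in tenths of a degree Celsius
tenths <- c(215, 198, NA, 230)

# Convert to degrees Celsius; missing values stay missing
celsius <- tenths / 10
print(celsius)
```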

VALUE

The VALUE ECA&D dataset contains weather data for 86 stations spread over Europe, and is available for download:

value <- tempfile(fileext = ".zip")
download.file("http://www.value-cost.eu/sites/default/files/VALUE_ECA_86_v2.zip", 
              destfile = value)
valueload <- loadStationData(value, var = "tmean")

## [2016-02-18 11:06:11] Loading data ...
## [2016-02-18 11:06:12] Retrieving metadata ...
## [2016-02-18 11:06:12] Done.

library(visualizeR)
spatialPlot(climatology(valueload), backdrop.theme = "coastline", colorkey = TRUE)



print(sessionInfo())

## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
##  [3] LC_TIME=es_ES.UTF-8           LC_COLLATE=en_US.UTF-8       
##  [5] LC_MONETARY=es_ES.UTF-8       LC_MESSAGES=en_US.UTF-8      
##  [7] LC_PAPER=es_ES.UTF-8          LC_NAME=es_ES.UTF-8          
##  [9] LC_ADDRESS=es_ES.UTF-8        LC_TELEPHONE=es_ES.UTF-8     
## [11] LC_MEASUREMENT=es_ES.UTF-8    LC_IDENTIFICATION=es_ES.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] visualizeR_1.0.0  sm_2.2-5.4        fields_9.0        maps_3.2.0       
## [5] spam_1.4-0        transformeR_1.1.2 rJava_0.9-8      
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.11            loadeR.java_1.1-0       plyr_1.8.4             
##  [4] loadeR_1.2.0            compiler_3.4.0          RColorBrewer_1.1-2     
##  [7] bitops_1.0-6            tools_3.4.0             boot_1.3-17            
## [10] vioplot_0.2             lattice_0.20-35         Matrix_1.2-7.1         
## [13] parallel_3.4.0          akima_0.6-2             padr_0.3.0             
## [16] raster_2.5-8            mapplots_1.5            data.table_1.10.4      
## [19] dtw_1.18-1              SpecsVerification_0.5-2 sp_1.2-5               
## [22] latticeExtra_0.6-28     magrittr_1.5            scales_0.4.1           
## [25] CircStats_0.2-4         MASS_7.3-44             abind_1.4-5            
## [28] colorspace_1.3-2        proxy_0.4-17            munsell_0.4.3          
## [31] RCurl_1.95-4.8          verification_1.42       RcppEigen_0.3.3.3.0