update schools
rafapereirabr committed Mar 22, 2024
1 parent 822d132 commit 5d79577
Showing 4 changed files with 135 additions and 157 deletions.
130 changes: 130 additions & 0 deletions data_prep/R/schools.R
@@ -0,0 +1,130 @@
#> DATASET: schools 2020
#> Source: INEP -
#> https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/inep-data/catalogo-de-escolas
#>
#: scale
#> Metadata:
# Title: schools
#' Update frequency: annual
#'
#' Presentation format: Shape
#' Language: Pt-BR
#' Character set: Utf-8
#'
#' Summary: Points with the geographic coordinates of schools in the school census
#' Additional information: Data produced by INEP. School data and their
#' geolocation are updated continuously by INEP. For geobr purposes,
#' these data need to be downloaded once a year




update_schools <- function(){


# If the data set is updated regularly, you should create a function that has
# a `date` argument to download the data
update <- 2023
date_update <- Sys.Date()

# date shown to geobr user
geobr_date <- gsub('-', '' , date_update)
geobr_date <- substr(geobr_date, 1, 6)


# manual download
# https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/inep-data/catalogo-de-escolas
dt <- fread('C:/Users/r1701707/Downloads/Análise - Tabela da lista das escolas - Detalhado.csv',
encoding = 'UTF-8')
head(dt)


##### 4. Rename columns -------------------------
head(dt)

df <- dplyr::select(dt,
abbrev_state = 'UF',
name_muni = 'Município',
code_school = 'Código INEP',
name_school = 'Escola',
education_level = 'Etapas e Modalidade de Ensino Oferecidas',
education_level_others = 'Outras Ofertas Educacionais',
admin_category = 'Categoria Administrativa',
address = 'Endereço',
phone_number = 'Telefone',
government_level = 'Dependência Administrativa',
private_school_type = 'Categoria Escola Privada',
private_government_partnership = 'Conveniada Poder Público',
regulated_education_council = 'Regulamentação pelo Conselho de Educação',
service_restriction ='Restrição de Atendimento',
size = 'Porte da Escola',
urban = 'Localização',
location_type = 'Localidade Diferenciada',
y = 'Latitude',
x = 'Longitude'
)




head(df)


# add update date columns
df[, date_update := as.character(date_update)]


# deal with points with missing coordinates
head(df)
df[is.na(x) | is.na(y),]
df[x==0,]

# identify which points should have empty geo
df[is.na(x) | is.na(y), empty_geo := TRUE]

df[code_school=='11000180', x]


# replace NAs with 0
data.table::setnafill(df,
type = "const",
fill = 0,
cols=c("x","y")
)



# Convert original data frame into sf
temp_sf <- sf::st_as_sf(x = df,
coords = c("x", "y"),
crs = "+proj=longlat +datum=WGS84")


# set the geometry of flagged rows to an empty point
# solution from: https://gis.stackexchange.com/questions/459239/how-to-set-a-geometry-to-na-empty-for-some-features-of-an-sf-dataframe-in-r
temp_sf$geometry[which(temp_sf$empty_geo)] <- sf::st_point()

subset(temp_sf, code_school=='11000180')


# Change CRS to the SIRGAS 2000 geodetic reference system (EPSG:4674).
temp_sf <- harmonize_projection(temp_sf)


# create folder to save the data
dest_dir <- paste0('./data/schools/', update,'/')
dir.create(path = dest_dir, recursive = TRUE, showWarnings = FALSE)


# Save raw file in sf format
sf::st_write(temp_sf,
dsn= paste0(dest_dir, 'schools_', update,".gpkg"),
overwrite = TRUE,
append = FALSE,
delete_dsn = T,
delete_layer = T,
quiet = T
)

}
2 changes: 1 addition & 1 deletion data_prep/R/support_fun.R
@@ -146,7 +146,7 @@ add_region_info <- function(temp_sf, column){
code_region==2, 'Nordeste',
code_region==3, 'Sudeste',
code_region==4, 'Sul',
- code_region==5, 'Centro Oeste',
+ code_region==5, 'Centro-Oeste',
default = NA))
return(temp_sf)
}
4 changes: 4 additions & 0 deletions r-package/NEWS.md
@@ -5,6 +5,10 @@
- Function `read_health_facilities()` now has a new parameter `date`, which will allow users to access data for different dates of reference. The plan is to have at least one update of this data set per year.


**New data**
- schools for 2023
- health facilities for 202303


# geobr v1.8.2

156 changes: 0 additions & 156 deletions r-package/prep_data/prep_schools.R

This file was deleted.
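
The empty-geometry step in `data_prep/R/schools.R` (flag rows with missing coordinates, fill them with 0, build the `sf` object, then swap the placeholder points for empty points) can be sketched on toy data as below. This is only an illustration under stated assumptions: the two-row data frame and its values are hypothetical, base R indexing stands in for the `data.table` calls used in the commit, and EPSG:4326 is used as the CRS instead of the commit's proj string.

```r
library(sf)

# Hypothetical toy data: the second school has no coordinates
df <- data.frame(
  code_school = c("A", "B"),
  x = c(-47.9, NA),
  y = c(-15.8, NA)
)

# Flag rows with missing coordinates before overwriting them
df$empty_geo <- is.na(df$x) | is.na(df$y)

# st_as_sf() fails on NA coordinates by default, so fill them with 0 first
df$x[df$empty_geo] <- 0
df$y[df$empty_geo] <- 0

pts <- sf::st_as_sf(df, coords = c("x", "y"), crs = 4326)

# Replace the placeholder (0, 0) points with empty point geometries
pts$geometry[which(pts$empty_geo)] <- sf::st_point()

# Check which features now have an empty geometry
is_empty <- sf::st_is_empty(pts)
print(is_empty)
```

Flagging before filling matters: once the coordinates are set to 0, a genuinely missing location can no longer be told apart from a point recorded at (0, 0).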
