Updating Data Sources pages #379

Open
wants to merge 1 commit into
base: master
5 changes: 5 additions & 0 deletions datasets/Agriculture.md
@@ -27,3 +27,8 @@ The Census of Agriculture contains information about land use, operators, produc

[Policies and Links](https://www.usda.gov/policies-and-links).


### [United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA)](https://www.unocha.org/)

#### [Sri Lanka Census](https://data.humdata.org/group/lka)
Demographics, education, and agriculture statistics for Sri Lanka at the country, province, and district level.
18 changes: 5 additions & 13 deletions datasets/Biomedical.md
@@ -13,8 +13,8 @@ parent: Data Sources

### [Broad Institute](https://www.broadinstitute.org/resources-services-and-tools)

#### [GTEx Analysis V8](https://www.gtexportal.org/home/datasets)
The GTEx eGene and significant variant-gene association data were generated from samples "collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank." The single-tissue cis-eQTL data from the v8 release was used. Due to the size of the datasets only Skin - Not Sun Exposed and Skin - Sun Exposed are made available on the main graph. The data for all tissues can be accessed on the Biomedical Data Commons knowledge graph.
#### [GTEx Analysis V8 eQTL](https://www.gtexportal.org/home/datasets)
The GTEx eGene and significant variant-gene association data were generated from samples "collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank." The single-tissue cis-eQTL data from the v8 release was used.

GTEx is an NIH human genomic data unrestricted-access data repository and the data was made available in compliance with [GTEx Data Release and Publication Policy](https://www.gtexportal.org/home/documentationPage#staticTextPublicationPolicy). GTEx outlines [how to cite](https://www.gtexportal.org/home/faq#citePortal) use of GTEx data in journal publication.

@@ -29,8 +29,8 @@ Licata, Luana, Leonardo Briganti, Daniele Peluso, Livia Perfetto, Marta Iannucce

### [Encyclopedia of DNA Elements (ENCODE)](https://www.encodeproject.org/)

#### [BED (Browser Extensible Data) Files](https://www.encodeproject.org/help/project-overview/)
The ENCODE dataset contains information for approximately 7000 experiments along with 14,000 BED files collected by The Encyclopedia of DNA Elements (ENCODE) Consortium. Examples of experiment metadata captured include the target biosample, assay type, gene assembly, etc. Bed files link to individual bed lines, which state the genomic position of individual peaks. Data Commons ingested all experimental data in BED format.
#### [Experimental Data](https://www.encodeproject.org/help/project-overview/)
The ENCODE experimental dataset contains information for approximately 7000 experiments along with 14,000 BED files collected by The Encyclopedia of DNA Elements (ENCODE) Consortium. Examples of experiment metadata captured include the target biosample, assay type, gene assembly, etc. Data Commons includes the metadata for all experimental datasets in ENCODE as of 2019.

Data made available under: [ENCODE Data Use Policy for External Users](https://www.encodeproject.org/help/citing-encode/). This data was formatted for Data Commons through a collaboration with Dr. Anthony Oro’s group at Stanford University.

@@ -60,7 +60,7 @@ This data is made available under Creative Commons Attribution ShareAlike 4.0 In
### [Jensen Lab (University of Copenhagen)](https://jensenlab.org/resources/)

#### [DISEASES](https://diseases.jensenlab.org/Search)
DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. We further unify the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. For further details please refer to the following Open Access article about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831).
DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. We further unify the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license.


#### [Side Effect Resource (SIDER) 4.1](http://sideeffects.embl.de/)
@@ -87,14 +87,6 @@ PharmGKB reports association between chemicals, diseases, genes, and genetic var
Data made available under Creative Commons Attribution-ShareAlike 4.0 Intergovernmental Organization (CC BY-SA 4.0 IGO) licence. Explicit licensing for PharmGKB can be viewed on the [download page](https://www.pharmgkb.org/downloads).


### [Swiss Institute of Bioinformatics (SIB)](https://www.expasy.org/)

#### [Antibodies Chemically Defined (ABCD)](https://web.expasy.org/abcd/))
The ABCD database is part of a broader project, with the mission of promoting the widespread use of recombinant antibodies by academic researchers and, ultimately, the replacement of animal-produced antibodies. This concerted effort also includes the [Geneva Antibody Facility](https://www.unige.ch/medecine/antibodies/) (for discovery and production of antibodies) and the scientific journal [Antibody Reports](https://oap.unige.ch/journals/abrep) (publishing technical articles on antibody characterization). If you'd like to cite the ABCD database: Lima WC, Gasteiger E, Marcatili P, Duek P, Bairoch A, Cosson P. The ABCD database: a repository for chemically defined antibodies. [Nucleic Acids Res. 2020, 48:D261-D264.](https://academic.oup.com/nar/article/48/D1/D261/5549708)

[Terms and Conditions](https://www.statcan.gc.ca/en/reference/terms-conditions/general?MM=as).


### [Temporary Data Commons Data](https://www.datacommons.org/)

#### [Temporary Gene Mappings](https://www.datacommons.org/)
11 changes: 11 additions & 0 deletions datasets/Demographics.md
@@ -217,6 +217,14 @@ Includes "counts of live births occurring within the United States to U.S. resid
[CDC Data Terms of Service](https://www.cdc.gov/other/agencymaterials.html).


### [U.S. Commerce Data Hub](https://data.commerce.gov/)

#### [Economic Development Administration (EDA)](https://www.commerce.gov/bureaus-and-offices/eda)
EDA has led the federal economic development agenda by promoting innovation and competitiveness, preparing American regions for growth and success in the worldwide economy.

#### [NTIA Internet Use Survey](https://www.ntia.gov/other-publication/2022/digital-nation-data-explorer#sel=homeEverOnline&demo=race&pc=count&disp=both)
NTIA programs and policymaking focus largely on expanding broadband Internet access and adoption in America and expanding the use of spectrum by all users.

### [U.S. Department of Housing and Urban Development (HUD)](https://www.hud.gov/)

#### [Income Limits](https://www.huduser.gov/portal/datasets/il.html)
@@ -265,6 +273,9 @@ Population data for countries, capital cities, urban and rural areas not covered
#### [Mexico Subnational Population Statistics](https://data.humdata.org/dataset/cod-ps-mex)
Population Census and Statistics for Mexico at Municipal level.

#### [Sri Lanka Census](https://data.humdata.org/group/lka)
Demographics, education, and agriculture statistics for Sri Lanka at the country, province, and district level.

### [Wikimedia Foundation](https://wikimediafoundation.org/)

#### [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page)
5 changes: 5 additions & 0 deletions datasets/Economy.md
@@ -166,6 +166,11 @@ Number of businesses and amount of revenue, by business payroll status, industry
[U.S. Census Terms of Service](https://www.census.gov/data/developers/about/terms-of-service.html).


### [U.S. Commerce Data Hub](https://data.commerce.gov/)

#### [Economic Development Administration (EDA)](https://www.commerce.gov/bureaus-and-offices/eda)
EDA has led the federal economic development agenda by promoting innovation and competitiveness, preparing American regions for growth and success in the worldwide economy.

### [U.S. Department of Housing and Urban Development (HUD)](https://www.hud.gov/)

#### [Income Limits](https://www.huduser.gov/portal/datasets/il.html)
8 changes: 8 additions & 0 deletions datasets/Education.md
@@ -29,6 +29,9 @@ Population Census and Statistics for Colombia at Country, Department and Municip

### [European Union (EU) Eurostat](https://ec.europa.eu/eurostat)

#### [Eurostat Early Education and Training](https://ec.europa.eu/eurostat/web/education-and-training/database)
Participation in early childhood education by sex (children aged 4 and over). The indicator measures the share of the children between the age of four and the starting age of compulsory primary education who participated in early childhood education.

#### [Regional Statistics by NUTS Classification](https://ec.europa.eu/eurostat/)
* [Regions and Cities](https://ec.europa.eu/eurostat/web/regions-and-cities): NUTS (Nomenclature of territorial units for statistics) geocodes, covering NUTS levels 1 through 3.
* [Demographics (population, age, gender)](https://ec.europa.eu/eurostat/web/population-demography)
@@ -129,6 +132,11 @@ General descriptive information such as name, address, and phone number; select
#### [National Center for Science and Engineering Statistics](https://ncses.nsf.gov/)
The National Center for Science and Engineering Statistics provides data on the status of the science and engineering enterprise in the U.S. and other countries.

### [United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA)](https://www.unocha.org/)

#### [Sri Lanka Census](https://data.humdata.org/group/lka)
Demographics, education, and agriculture statistics for Sri Lanka at the country, province, and district level.

### [World Bank](https://www.worldbank.org/en/home)

#### [World Bank Datasets](https://data.worldbank.org)
6 changes: 6 additions & 0 deletions datasets/Environment.md
@@ -72,12 +72,18 @@ India's Central Pollution Control Board (CPCB) portal for Air Quality Management
#### [India Air Quality Index](https://app.cpcbccr.com/AQI_India/)
Air Quality Index and possible health impacts reported for states, cities and stations in India.

#### [India AQI Pollutants](https://app.cpcbccr.com/AQI_India/)
India Air Quality Data contains mean values of various pollutants measured once every 4 hours, along with other details such as station name, state, city, and date for the period.

### [India Water Resources Information System](https://indiawris.gov.in/wris/#/)
The Water Resources Information System (WRIS) is a repository of water resources and related data for India at national, state and district level.

#### [India Water Quality](https://indiawris.gov.in/wiki/doku.php?id=water_quality_data_and_parameters)
Water quality data measured at ground and surface water quality stations across India, providing concentrations of dissolved constituents in water in terms of physical, chemical, and biological parameters.

#### [WRIS India Rainfall](https://indiawris.gov.in/wris/#/DataDownload)
Monthly rainfall data from WRIS India at the district level.

### [National Institution for Transforming India.](https://niti.gov.in/)

#### [SDG India Index](https://sdgindiaindex.niti.gov.in/#/download)