Merge pull request #93 from pepkit/dev
Release v0.11.0
khoroshevskyi committed Oct 26, 2022
2 parents 179dc58 + b91329b commit 7934ed6
Showing 29 changed files with 19,759 additions and 1,682 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/run-codecov.yml
```diff
@@ -2,7 +2,7 @@ name: Run codecov
 
 on:
   pull_request:
-    branches: [master]
+    branches: [master, dev]
 
 jobs:
   pytest:
```
1 change: 1 addition & 0 deletions MANIFEST.in
```diff
@@ -3,3 +3,4 @@ include README.md
 include docs/img/geofetch_logo.svg
 include geofetch/config_template.yaml
 include geofetch/config_processed_template.yaml
+include geofetch/looper_sra_convert.yaml
```
19 changes: 15 additions & 4 deletions README.md
```diff
@@ -1,11 +1,22 @@
 # <img src="https://raw.githubusercontent.com/pepkit/geofetch/master/docs/img/geofetch_logo.svg?sanitize=true" alt="geofetch logo" height="70">
 
-[![PEP compatible](http://pepkit.github.io/img/PEP-compatible-green.svg)](http://pepkit.github.io)
+[![PEP compatible](https://pepkit.github.io/img/PEP-compatible-green.svg)](https://pepkit.github.io)
 ![Run pytests](https://github.com/pepkit/geofetch/workflows/Run%20pytests/badge.svg)
-[![docs-badge](https://readthedocs.org/projects/geofetch/badge/?version=latest)](http://geofetch.databio.org/en/latest/)
+[![docs-badge](https://readthedocs.org/projects/geofetch/badge/?version=latest)](https://geofetch.databio.org/en/latest/)
 [![pypi-badge](https://img.shields.io/pypi/v/geofetch)](https://pypi.org/project/geofetch)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
-`geofetch` is a command-line tool that downloads sequencing data and metadata from GEO and SRA and creates [standard PEPs](http://pep.databio.org/). `geofetch` is hosted at [pypi](https://pypi.org/project/geofetch/) and documentation is hosted at [geofetch.databio.org](http://geofetch.databio.org) (source in the [/docs](/docs) folder).
+`geofetch` is a command-line tool that downloads sequencing data and metadata from GEO and SRA and creates [standard PEPs](https://pep.databio.org/). `geofetch` is hosted at [pypi](https://pypi.org/project/geofetch/). You can convert the result of geofetch into unmapped `bam` or `fastq` files with the included `sraconvert` command.
 
-You can convert the result of geofetch into unmapped `bam` or `fastq` files with the included `sraconvert` command.
+Key geofetch features:
+
+- Works with GEO and SRA metadata
+- Combines samples from different projects
+- Standardizes output metadata
+- Filters type and size of processed files (from GEO) before downloading them
+- Easy to use
+- Fast execution time
+- Can search GEO to find relevant data
+- Can be used either as a command-line tool or from within Python using an API
+
+For more information, see [complete documentation at geofetch.databio.org](http://geofetch.databio.org) (source in the [/docs](/docs) folder).
```
45 changes: 45 additions & 0 deletions docs/README.md
@@ -12,6 +12,19 @@
- Produce a standardized [PEP](http://pepkit.github.io) sample table. This makes it really easy to run [looper](https://pepkit.github.io/docs/looper/)-compatible pipelines on public datasets by handling data acquisition and metadata formatting and standardization for you.
- Prepare a project to run with [sraconvert](sra_convert.md) to convert SRA files into FASTQ files.

![](./img/pipeline.svg)

Key geofetch advantages:

- Works with GEO and SRA metadata
- Combines samples from different projects
- Standardizes output metadata
- Filters type and size of processed files (from GEO) before downloading them
- Easy to use
- Fast execution time
- Can search GEO to find relevant data
- Can be used either as a command-line tool or from within Python using an API

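The "filters type and size of processed files" point above can be pictured as a regular-expression match on each file name combined with a size cap. The sketch below is a hypothetical illustration, not geofetch's implementation: the helper names `filter_files` and `parse_size`, the `(name, size_in_bytes)` input shape, and the decimal size strings such as `"5MB"` are all assumptions.

```python
import re

# Hypothetical decimal multipliers for size strings such as "500KB" or "5MB" (assumption)
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text: str) -> int:
    """Hypothetical helper: convert a size string like '5MB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"Unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def filter_files(files, pattern=None, max_size=None):
    """Hypothetical helper: keep (name, size) pairs matching `pattern` and no larger than `max_size`."""
    regex = re.compile(pattern) if pattern else None
    limit = parse_size(max_size) if max_size else None
    kept = []
    for name, size in files:
        if regex is not None and not regex.search(name):
            continue  # file name does not match the requested type
        if limit is not None and size > limit:
            continue  # file is larger than the allowed size
        kept.append((name, size))
    return kept


files = [
    ("GSM1_peaks.bed.gz", 2_000_000),
    ("GSM2_signal.bw", 40_000_000),
    ("GSM3_peaks.narrowPeak.gz", 9_000_000),
]
print(filter_files(files, pattern=r"\.(bed|narrowPeak)\.gz$", max_size="5MB"))
# → [('GSM1_peaks.bed.gz', 2000000)]
```

Filtering on name and size before any download is what makes this cheap: only the metadata (file names and sizes) is needed to decide what to fetch.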
## Quick example

`geofetch` runs on the command line. This command will download the raw data and metadata for the given GSE number.
@@ -38,5 +51,37 @@ geofetch -i GSE95654 --just-metadata
geofetch -i GSE95654 --processed --just-metadata
```

### Check which arguments you need to download your data:

![](./img/arguments_outputs.svg)

---
### New features available in geofetch 0.11.0:
1) geofetch is now available as a Python API package. It can initialize [peppy](http://peppy.databio.org/) projects without downloading any SOFT files. Example:

```python
from geofetch import Geofetcher

# initialize Geofetcher with the necessary arguments
geof = Geofetcher(processed=True, acc_anno=True, discard_soft=True)

# get projects by providing a GSE accession (or a file of GSEs) as input
projects = geof.get_projects("GSE160204")
```

2) To find GSEs and save them to a file, you can use `Finder`, the GSE finder tool:

```python
from geofetch import Finder

# initialize Finder (use filters if necessary)
find_gse = Finder(filters='bed')

# get all projects that were found:
gse_list = find_gse.get_gse_all()
```
Find more information here: [GSE Finder](./gse_finder.md)
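
The `get_projects` example above takes either a single GSE accession or a file listing GSEs. One way such input could be normalized is sketched below. `resolve_gse_input` is a hypothetical helper, and both the one-accession-per-line file format and the `GSE` plus digits pattern check are assumptions rather than geofetch's documented behavior.

```python
import os
import re
import tempfile

GSE_PATTERN = re.compile(r"^GSE\d+$")


def resolve_gse_input(value: str) -> list:
    """Hypothetical helper: return GSE accessions from one accession or a path to a file of them."""
    if GSE_PATTERN.match(value):
        return [value]
    if os.path.isfile(value):
        with open(value) as handle:
            # assumed format: one accession per line; blank lines and extra whitespace are ignored
            return [line.strip() for line in handle if line.strip()]
    raise ValueError(f"{value!r} is neither a GSE accession nor an existing file")


# usage: a single accession, then a temporary file listing two accessions
print(resolve_gse_input("GSE160204"))
# → ['GSE160204']
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("GSE160204\n\nGSE95654\n")
print(resolve_gse_input(tmp.name))
# → ['GSE160204', 'GSE95654']
os.remove(tmp.name)
```

Checking the accession pattern first means a bare accession is never mistaken for a missing file path.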


For more details, check out the [usage](usage.md) reference, [installation instructions](install.md), or head on over to the [tutorial for raw data](raw-data-downloading.md) and [tutorial for processed data](processed-data-downloading.md) for a detailed walkthrough.

13 changes: 13 additions & 0 deletions docs/changelog.md
@@ -1,5 +1,18 @@
# Changelog

## [0.11.0] -- 2022-10-26
- Added initialization of peppy Project without saving any files (from within Python using an API)
- Added Finder (GSE search tool)
- Added progress bar
- Switched to the `requests` library for saving SOFT files
- Improved documentation
- Refactored code
- Added `--add-convert-modifier` flag
- Fixed looper amendments in the config file
- Fixed special character bug in the config file
- Fixed `None` issue in the config file
- Fixed bug in saving raw PEPs

## [0.10.1] -- 2022-08-04
- Updated metadata fetching requests from SRA database

81 changes: 81 additions & 0 deletions docs/gse_finder.md
@@ -0,0 +1,81 @@
`Finder` is a geofetch class that provides functions to find and retrieve a list of GSEs ([GEO](https://www.ncbi.nlm.nih.gov/geo/) accession numbers) using the NCBI search tool.
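
As a rough picture of what an NCBI search for GSEs looks like, the sketch below builds (but does not send) a request URL for the public E-utilities `esearch` endpoint. It assumes the `gds` database, a `GSE[ETYP]` term to restrict results to series records, and a publication-date window via the `datetype=pdat`, `mindate`, and `maxdate` parameters. `build_esearch_url` is a hypothetical helper, and Finder's actual query may be built differently.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(filters=None, start_date=None, end_date=None, retmax=10):
    """Hypothetical helper: esearch URL for GEO series matching `filters` in an optional date window."""
    term = "GSE[ETYP]"  # restrict GEO DataSets results to series (GSE) records
    if filters:
        term = f"({filters}) AND {term}"
    params = {"db": "gds", "term": term, "retmax": retmax}
    if start_date and end_date:
        # dates in YYYY/MM/DD form, filtered on publication date
        params.update({"datetype": "pdat", "mindate": start_date, "maxdate": end_date})
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_esearch_url("bed", start_date="2015/05/05", end_date="2020/05/05"))
```

`urlencode` takes care of escaping the parentheses, brackets, spaces, and slashes in the query term and dates.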


### The main features of the geofetch Finder are:
- Find GEO accession numbers (GSEs) of projects that were uploaded or updated in a certain period of time.
- Use the same filter queries as the [GEO DataSets Advanced Search Builder](https://www.ncbi.nlm.nih.gov/gds/advanced).
- Save the list of GSEs to a file (the file can later be used as input to **[geofetch](http://geofetch.databio.org/en/latest/)**).
- Get GSEs quickly and easily by combining an NCBI filter with a time period.

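A "last N days" search can be reduced to an explicit start and end date in the same `YYYY/MM/DD` style that `get_gse_by_date` uses in the tutorial. The helper below is a hypothetical sketch of that conversion, not Finder's own code; it accepts an explicit `today` so results are reproducible.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def day_count_to_range(n_days: int, today: Optional[date] = None) -> Tuple[str, str]:
    """Hypothetical helper: (start_date, end_date) for a window ending today, starting n_days earlier."""
    end = today or date.today()
    start = end - timedelta(days=n_days)
    fmt = "%Y/%m/%d"  # YYYY/MM/DD, matching the start_date/end_date examples
    return start.strftime(fmt), end.strftime(fmt)


print(day_count_to_range(5, today=date(2022, 10, 26)))
# → ('2022/10/21', '2022/10/26')
```

Using `timedelta` handles month and year boundaries correctly, which naive day subtraction on strings would not.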

___
## Tutorial

0) Initialize a Finder object.
```python
from geofetch import Finder
gse_obj = Finder()

# Optionally, provide a filter string and the maximum number of elements to retrieve
gse_obj = Finder(filters="((bed) OR narrow peak) AND Homo sapiens[Organism]", retmax=10)
```

1) Get a list of all GSEs in GEO
```python

gse_list = gse_obj.get_gse_all()

```

2) Get a list of GSEs that were uploaded or updated in the last week
```python

gse_list = gse_obj.get_gse_last_week()

```

3) Get a list of GSEs that were uploaded or updated in the last 3 months
```python

gse_list = gse_obj.get_gse_last_3_month()

```

4) Get a list of GSEs that were uploaded or updated in the last *number of days*
```python

# projects that were uploaded or updated in the last 5 days:
gse_list = gse_obj.get_gse_by_day_count(5)

```

5) Get a list of GSEs that were uploaded in a certain period of time
```python

gse_list = gse_obj.get_gse_by_date(start_date="2015/05/05", end_date="2020/05/05")

```

6) Save the most recently retrieved list of GSEs to a file
```python

gse_obj.generate_file("path/to/the/file")

# to save a different list of GSEs, pass it to the function
gse_obj.generate_file("path/to/the/file", gse_list=["123", "124"])

```

7) Compare two lists:
```python

# list1 and list2 are GSE lists returned by earlier searches
new_gse_list = gse_obj.find_differences(list1, list2)

```
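
Comparing two GSE lists usually means finding accessions that appear in one search but not in the other, for example projects that are new since a previous run. The sketch below expresses that as an order-preserving set difference. `new_since` is a hypothetical helper, and it is an assumption that `find_differences` compares its lists in this direction.

```python
def new_since(old_list, new_list):
    """Hypothetical helper: items of new_list that are not in old_list, in their original order."""
    seen = set(old_list)
    result = []
    for gse in new_list:
        if gse not in seen:
            result.append(gse)
            seen.add(gse)  # also drops duplicates within new_list
    return result


print(new_since(["GSE1", "GSE2"], ["GSE2", "GSE3", "GSE3", "GSE4"]))
# → ['GSE3', 'GSE4']
```

A set gives constant-time membership checks, while iterating over `new_list` keeps the output in the order the search returned it.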

----

More information about GSEs, queries, and IDs:
- https://www.ncbi.nlm.nih.gov/geo/info/geo_paccess.html
- https://newarkcaptain.com/how-to-retrieve-ncbi-geo-information-using-apis-part1/
- https://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Using_the_Advanced_Search_Pag
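
On the IDs mentioned above: a search of the `gds` database returns numeric UIDs rather than GSE accessions. GEO series UIDs are the GSE number plus 200000000 (for example, GSE95654 corresponds to UID 200095654), so a result list can be mapped back to accessions as sketched below. `uids_to_gse` is a hypothetical helper, and whether Finder performs the conversion this way is an assumption.

```python
def uids_to_gse(uids):
    """Hypothetical helper: convert GEO DataSets series UIDs (9 digits, leading '2') to GSE accessions."""
    accessions = []
    for uid in uids:
        uid = str(uid)
        # keep only series UIDs; other record types (GDS, GPL, GSM) are skipped
        if len(uid) == 9 and uid.startswith("2"):
            accessions.append(f"GSE{int(uid) - 200_000_000}")
    return accessions


print(uids_to_gse(["200095654", "200160204", "300001234"]))
# → ['GSE95654', 'GSE160204']
```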