Merge pull request #11 from kathyxchen/biodata-mining-revisions

BioData mining revisions

kathyxchen authored Jun 6, 2018
2 parents 7caad77 + f1e6a02 commit 95b72e0
Showing 5 changed files with 3,344 additions and 352 deletions.
12 changes: 12 additions & 0 deletions README.md
@@ -6,6 +6,12 @@ review the sections from
[The PathCORE-T analysis workflow](#the-pathcore-analysis-workflow) onwards in
this README.

We released two Python packages for PathCORE-T:
- [PathCORE-T](https://github.com/greenelab/PathCORE-T)
- [crosstalk-correction](https://github.com/kathyxchen/crosstalk-correction): a dependency of PathCORE-T

Both packages are used in this analysis repository.
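
A minimal installation check, assuming both packages are installed from PyPI (e.g. `pip install PathCORE-T crosstalk-correction`). The module names below are assumptions inferred from the package names, not confirmed by this README:

```python
# Import check only; the module names `pathcore` and `crosstalk_correction`
# are assumed from the package names and may differ in your installation.
import pathcore
import crosstalk_correction

print(pathcore.__name__, crosstalk_correction.__name__)
```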

## The [data](data) directory
A README is provided in the `./data` directory with details about the scripts
to download and/or process datasets, data source citations, etc.
@@ -18,6 +24,12 @@ Scripts used to generate Figure 3 and Supplemental Figure 2 are provided
in notebook format. We have found that we can offer greater detail about
each of the figures in this format.

### Tutorials
This directory also contains two notebooks that users can read through or run
when getting started with PathCORE-T analysis:
- [FastICA-based PAO1 KEGG example](jupyter-notebooks/Supplemental_PAO1_FastICA_example.ipynb)
- [_k_=24 NMF-based TCGA PID example](jupyter-notebooks/Supplemental3_TCGA_NMF_k=24.ipynb)

## The PathCORE-T analysis workflow
Please review one of the `analysis_<dataset>_<model>.sh` scripts for an example
of the workflow.
2 changes: 1 addition & 1 deletion data/get_normalized_TCGA_dataset.py
@@ -36,7 +36,7 @@ def expression_data_minmax_normalization(path_to_file, index_on_col):
"""
data = pd.read_table(path_to_file)
data.set_index(index_on_col, inplace=True)
- data = data[-data.index.str.contains('?', regex=False)]
+ data = data[~data.index.str.contains('?', regex=False)]
data = data.sort_index()

data_normalized = MinMaxScaler().fit_transform(data.T)
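
The one-line change above replaces unary minus with the `~` operator when negating the boolean mask returned by `Index.str.contains`: recent NumPy versions reject `-` on a boolean array and suggest `~` instead, which inverts the mask as intended. A minimal sketch of the filtering step on a toy expression table (the gene labels and values are illustrative, not taken from the TCGA dataset):

```python
import pandas as pd

# Toy expression table whose index mimics TCGA-style gene labels;
# rows with an ambiguous '?' gene symbol are the ones the script drops.
toy = pd.DataFrame(
    {"sample_1": [1.0, 2.0, 3.0]},
    index=["TP53|7157", "?|100133144", "EGFR|1956"],
)

# Boolean mask: True where the index contains a literal '?'.
has_unknown_symbol = toy.index.str.contains("?", regex=False)

# '~' negates the mask, keeping only rows with a known gene symbol;
# with recent NumPy, `-has_unknown_symbol` raises a TypeError instead.
filtered = toy[~has_unknown_symbol]
print(filtered.index.tolist())  # ['TP53|7157', 'EGFR|1956']
```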