Home

Welcome to the EPP531 wiki!

Links:

Schedule

August 23

To Dos:

Change class time?
Review syllabus
Get everyone on GitHub
How to use the GitHub wiki
Assign everyone to a time slot
How to present - slides

August 30

No Class

September 6

Fang Liu
Article link - Compact graphical representation of phylogenetic data and metadata with GraPhlAn
Questions for in-class discussion

Does the article seem straightforward to you? If not, why not?
In general, do you prefer interactive web-based software or command-line tools? Why?
Have you ever used other software to build a taxonomic or phylogenetic tree with metadata? How do you think GraPhlAn compares with what you used?
What do you think makes software more attractive to researchers (e.g., usefulness, detailed documentation, an active support group)?
Based on your understanding of this article, which parts do not make sense to you? What improvements would you make if you were the developer of this software?

September 13

Kyle Bonifer
https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-017-1563-6?site=bmcbioinformatics.biomedcentral.com
Questions:

1) Were you able to get the program to work? Was it user-friendly?

2) Do you believe this program adequately predicts post-translational modification of proteins identified through mass spec? Why or why not?

3) Can you think of any ways in your own research, or in research generally, where this tool could be applicable?

4) What do you feel could have been included in the paper to make the authors' argument more convincing? Do you believe this program is better than its predecessors, or did they fail to convince you?

September 20

Optional additional presenter?

September 27

Nourolah Soltani
Article link: Development of a virus detection and discovery pipeline using next generation sequencing
Questions:

1- Were you successful in submitting raw sequence data for analysis?

2- Did you receive any errors after submitting the sequence? Were you able to resolve them?

3- Were the results you obtained the same as the outcome you expected?

4- How do you rate the efficiency of the web-based platform and its simplicity or complexity in data analysis for virus discovery?

5- Based on your experience with this tool, what would you suggest adding to the four essential keys of a universal bioinformatics tool mentioned in the paper?

October 4

B. McDowell

article link - http://aem.asm.org/content/75/23/7537.full

October 11

Bob Tams
- Article Link: oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes
- For those of you familiar with statistical analyses, do you think that the use of both a Z-score analysis and a one-tailed Fisher exact probability test is sufficient to classify a binding site as “statistically over-represented”, or would you suggest another method?
- Does the use of human and mouse orthologs make sense as a basis for the database?
- Are you convinced that oPOSSUM reduces the number of false positives you might find when using the JASPAR database alone?
- Did you have any technical issues with the program, and were you able to overcome them? If so, what were they and how did you get around them?
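
For the first question above, it may help to see what a one-tailed Fisher exact test computes. The sketch below is not oPOSSUM's implementation: the 2x2 table layout, the function name `fisher_one_tailed`, and the example counts are all assumptions made for illustration. It sums the upper tail of the hypergeometric distribution to get the probability of seeing at least as many genes with a binding site in the target set as were observed.

```python
from math import comb


def fisher_one_tailed(target_with, target_without, background_with, background_without):
    """One-tailed (enrichment) Fisher exact p-value for a 2x2 table.

    Hypothetical table layout, not taken from the oPOSSUM paper:
                      has site   no site
        target           a          b
        background       c          d
    """
    a, b, c, d = target_with, target_without, background_with, background_without
    n_target = a + b          # genes in the co-expressed (target) set
    n_with = a + c            # genes with the binding site overall
    n_total = a + b + c + d   # all genes considered

    # Probability of observing k >= a target genes with the site,
    # given fixed row and column totals (hypergeometric upper tail).
    upper = min(n_target, n_with)
    tail = sum(
        comb(n_with, k) * comb(n_total - n_with, n_target - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_target)


# Made-up counts: 3 of 4 target genes have the site, 1 of 4 background genes do.
p_value = fisher_one_tailed(3, 1, 1, 3)
print(round(p_value, 4))
```

A small p-value here only says the site is more frequent in the target set than expected by chance; it says nothing about how many sites occur per gene, which is one reason a second statistic such as a Z-score might be reported alongside it.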

October 18

Bridget O'Banion
antiSMASH https://academic.oup.com/nar/article/39/suppl_2/W339/2507123/antiSMASH-rapid-identification-annotation-and437

October 25

Helena Pound
I will be presenting on an app that performs DESeq2 gene expression analysis and shows some cool graphics using the SARTools package. The paper below describes the DESeq2 analysis and customization. If you want to play with the app before class, it can be found in the cyverse.org discovery environment. The app is called "DESeq2 (multifactorial pairwise comparisons)."
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8
Is there any info that is not provided in the summary that you would like to see?
Any stats people see issues with the stats/available options?
Did anyone run into any errors or failed analyses?
Caveats associated with publication?

November 1

CLASS CANCELLED

November 8

Allison Mason
Article Link - [Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies](http://ijs.microbiologyresearch.org/content/journal/ijsem/10.1099/ijsem.0.001755)
Questions

Were you able to get results for a sequence?
Do you think this tool is better suited as a discovery device or as a way to validate hypotheses?
How do you feel about the fact that the authors didn’t include the statistics behind the calculations (instead they only provided which software programs were used)?
What would you change about this tool in order to make it better or more useful?
Did you feel the website was easy to use? What was good or bad about the website?

November 15

Caleb Schuler
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-386
Website for tool: http://metagenomics.anl.gov/index.html?stay=1

November 22

Ming Chen
- Article: https://f1000researchdata.s3.amazonaws.com/manuscripts/9996/16888710-6725-433b-ab2a-13c04bfe6cb5_8987_-_gordon_smyth_v2.pdf?doi=10.12688/f1000research.8987.2
- Presentation material (with code)
- Questions for discussion are below. They can also be found in the presentation material.
  - Why do we combine the Cell Type and Status to create group?
  - What problem could arise if the filtering is based on raw counts?
  - What are those diagonal lines in the MD plots?
  - Usually, the predictors in the linear model in differential expression analysis are categorical variables (e.g., the group variable). Have you encountered situations in your research in which the predictors are continuous variables?
  - Are the DE genes identified based on P values or the FDR values in the res? If we use P values to filter DE genes, which DE genes are more likely to be false positives?
  - Are all statistically significant DE genes biologically meaningful to us?
  - What does a significant PValue or FDR value in the table above mean? Does it mean all three groups are significantly different from each other?
  - See the MD plot below, why are there no “down-regulated” genes?
  - Can you think of other use cases of FRY or Camera analysis?
  - What other downstream analysis can you do after you get a list of DE genes?
  - What RNA-Seq analysis pipeline do you use for your RNA-Seq data? Do you prefer running the whole workflow in the R environment? Why or why not?
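
To make the P value versus FDR question above concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment, a procedure commonly used to turn raw P values into FDR values. It is an illustration only and not code from the presentation material; the function name `benjamini_hochberg` and the example P values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four hypothetical genes.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
for p, q in zip(raw, fdr):
    print(f"P = {p:.3f}  FDR = {q:.3f}")
```

Every adjusted value is at least as large as its raw P value, so a cutoff on raw P values keeps more genes than the same cutoff on FDR. The extra genes are the ones with the weakest evidence, which is where false positives are most likely to sit.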
