repository for chemical data extraction project

p-amyjiang/BETO2020

 
 

BETO2020

July 3rd

Meeting notes:

  • Use Dropbox as the database location (the database may or may not take the form of a SQL database).
  • Set up a meeting for the July 17 hackathon with the Luscombe summer students (unit tests, Travis CI, PyPI, Read the Docs/Sphinx/GitHub Pages, PEP 8).
  • July 5th: Wes and David hold a hack session to establish how and what we are going to store (abstracts, text, figures, tables). So far David has been using Beautiful Soup. He has taken an algorithmic approach to parsing ScienceDirect (where the text is saved each time, etc.).

Overview

This repository describes how to identify, from published papers, organic chemical candidates that can function as flame retardants or corrosion inhibitors.

Process

STEP 1: Clone the GitHub repository

STEP 2: Installation

The Python packages ChemDataExtractor, bs4 and pandas are required to run count_chemical_occurrences.ipynb.
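
Before opening the notebook, it can help to confirm that the three packages are importable. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the import names `chemdataextractor`, `bs4` and `pandas` are assumed to match the installed distributions.

```python
from importlib.util import find_spec

# Import names assumed for the three required packages.
REQUIRED = ["chemdataextractor", "bs4", "pandas"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any package reported as missing needs to be installed into the same environment that runs Jupyter.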

STEP 3: Data collection

1. Search Web of Science using the keywords "organic corrosion inhibitor" and "organic flame retardant".

2. Click "Export" at the top of the results page, choose "Other File Formats", and then select "Author, Title, Source, Abstract" for Record Content and "HTML" for File Format.

3. Save the HTML files under the data folder.
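
Once the exports are saved, a quick sanity check is to list the HTML files and look at their visible text. The sketch below is an illustration under stated assumptions: it uses the standard-library `html.parser` as a stand-in for Beautiful Soup, assumes the folder is named `data` and the files end in `.html`, and makes no assumption about the internal layout of a Web of Science export. The names `TextExtractor`, `html_to_text` and `load_exports` are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of `html`, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def load_exports(data_dir="data"):
    """Map each *.html file name in `data_dir` to its visible text."""
    return {
        path.name: html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(Path(data_dir).glob("*.html"))
    }
```

Calling `load_exports()` from the repository root should return one entry per exported file; an empty dictionary suggests the files were saved somewhere else.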

STEP 4: Output chemical candidates

Run count_chemical_occurrences.ipynb. Two CSV files, named "corrosion_inhibitor_list.csv" and "flame_retardant_list.csv", are generated, listing the candidates for each category.
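
The counting step can be pictured with the sketch below. It is not the notebook's code: it assumes the chemical names have already been extracted from each abstract (the notebook relies on ChemDataExtractor for that), and the function names, the lower-casing rule, the column headers and the example file name are all hypothetical.

```python
import csv
from collections import Counter


def count_occurrences(names_per_abstract):
    """Count total mentions of each chemical name, ignoring case and padding."""
    counts = Counter()
    for names in names_per_abstract:
        counts.update(name.strip().lower() for name in names if name.strip())
    return counts


def write_candidates(counts, path):
    """Write the counts to a CSV file, most frequent chemical first."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["chemical", "occurrences"])  # assumed column names
        writer.writerows(counts.most_common())


if __name__ == "__main__":
    # Made-up input: one list of extracted names per abstract.
    extracted = [["Benzotriazole", "imidazole"], ["benzotriazole "]]
    write_candidates(count_occurrences(extracted), "example_candidates.csv")
```

Sorting by count puts the most frequently mentioned chemicals at the top of the file, which is a convenient order for reviewing candidates.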

STEP 5: Visualize molecules

Import the CSV files into ChemAxon, then generate and download the molecular representations.
