Fuzzy-Searching

PharmAlchemy Fuzzy Semantic Search Module

This tool provides fuzzy semantic search over curated datasets of genes, drugs, and diseases. It was developed to make the data in the PharmAlchemy knowledge platform easier for users to find.

🔍 Features

- Fuzzy matching using difflib's SequenceMatcher with adjustable similarity thresholds
- Real-time search with confidence scoring
- Synonym-aware query expansion
- Modular design for adding new datasets
- Tkinter-based GUI
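
The fuzzy matching and confidence scoring listed above can be illustrated with Python's standard difflib.SequenceMatcher. This is a minimal sketch, not the code in PhAlSemantic.py: the helper name fuzzy_rank, the default threshold (0.6), and the result limit (10) are assumptions chosen for the example. Here the ratio() score, a value between 0 and 1, serves as the confidence value.

```python
from difflib import SequenceMatcher


def fuzzy_rank(query, candidates, threshold=0.6, limit=10):
    """Rank candidate strings by similarity to the query (hypothetical helper).

    Returns (candidate, score) pairs whose SequenceMatcher ratio is at
    least `threshold`, best first, truncated to `limit` results.
    """
    query = query.strip().lower()
    scored = []
    for candidate in candidates:
        # ratio() is 2*M/T: matched characters over total length, in [0, 1]
        score = SequenceMatcher(None, query, candidate.lower()).ratio()
        if score >= threshold:
            scored.append((candidate, score))
    # Highest score first; the sort is stable, so ties keep input order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    genes = ["tp53", "tp63", "tp73", "brca1", "brca2", "egfr"]
    for name, score in fuzzy_rank("tp35", genes):
        print(f"{name}\t{score:.2f}")
```

Raising the threshold trims weak suggestions; lowering it tolerates more typos at the cost of noisier results. Because SequenceMatcher counts matching character blocks, a transposition such as "tp35" for "tp53" still scores 0.75.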

📂 Datasets

- g_final.csv: Gene symbols, synonyms, UniProt IDs
- DrugBank Structure Links.csv: Names, formulas, InChI/SMILES, CIDs
- HSDN-Symptoms-DO.tsv: Symptoms, diseases, DOIDs
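
One way to make lookups synonym-aware is to index every synonym next to the primary symbol, so a query for an alias resolves to the same row. This sketch assumes hypothetical column names (symbol, synonyms) and a "|" separator, which may not match the real headers of g_final.csv. The hypothetical read_table helper also handles TSV files such as HSDN-Symptoms-DO.tsv by choosing the delimiter from the file extension.

```python
import csv
import io


def build_synonym_index(rows, symbol_col="symbol", synonym_col="synonyms", sep="|"):
    """Map each lowercased symbol and synonym to its row (a dict).

    symbol_col, synonym_col and sep are assumptions; adjust them to the
    real headers and separator used in g_final.csv.
    """
    entries = [dict(row) for row in rows]
    index = {}
    # Pass 1: primary symbols are indexed first so they take precedence
    for entry in entries:
        key = (entry.get(symbol_col) or "").strip().lower()
        if key:
            index[key] = entry
    # Pass 2: synonyms only fill keys not already claimed by a symbol
    # or by an earlier synonym
    for entry in entries:
        for name in (entry.get(synonym_col) or "").split(sep):
            key = name.strip().lower()
            if key and key not in index:
                index[key] = entry
    return index


def read_table(path):
    """Read a CSV, or a TSV when the name ends in .tsv, into a list of dicts."""
    delimiter = "\t" if path.lower().endswith(".tsv") else ","
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


if __name__ == "__main__":
    # Tiny in-memory sample standing in for g_final.csv
    sample = io.StringIO("symbol,synonyms,uniprot_id\nTP53,P53|LFS1,P04637\n")
    index = build_synonym_index(csv.DictReader(sample))
    print(index["lfs1"]["symbol"])  # the alias resolves to the TP53 row
```

Indexing symbols before synonyms keeps an official symbol from being shadowed when another gene lists the same string as an alias.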

🚀 Usage

```bash
python PhAlSemantic.py
```

Select a dataset, choose the search field, and begin typing to receive live-ranked suggestions.

🧩 Adding a New Dataset

To add a new searchable dataset (e.g., pathways, enzymes):

1. Prepare your CSV/TSV file:
   - Ensure there are no null values in searchable columns.
   - Lowercase and trim whitespace from all string fields.
   - Make each row represent a single entry.
2. Create a loader function: define a new function (e.g., load_pathway_data) in PhAlSemantic.py, modeled after load_gene_data.
3. Register the dataset: add a new entry to the datasets dictionary:

```python
# Entry to add inside the existing datasets dictionary in PhAlSemantic.py
'Pathways': {
    'csv_filename': 'pathway_data.csv',
    'search_options': ['pathway_name', 'pathway_id'],  # columns to search
    'search_labels': ['Pathway Name', 'Pathway ID'],   # UI labels, same order
    'load_function': load_pathway_data,
},
```
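
A loader that applies the file-preparation rules above and builds one lookup table per search field might look like the following. This is a hypothetical sketch, not load_gene_data: the (entries, indexes) return value and the per-field dict layout are assumptions, so match whatever shape the existing loaders in PhAlSemantic.py return.

```python
import csv


def load_pathway_data(csv_filename, search_options=("pathway_name", "pathway_id")):
    """Hypothetical loader: read a pathway CSV and index it by each search field.

    Returns (entries, indexes), where indexes[field] maps a normalized value
    to its entry. This return shape is an assumption for illustration.
    """
    entries = []
    with open(csv_filename, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Defensive normalization: lowercase and trim every value.
            # Overflow cells from ragged rows (key None) are dropped.
            entry = {
                key: (value or "").strip().lower()
                for key, value in row.items()
                if key is not None
            }
            # Skip rows with a blank searchable field instead of indexing ""
            if all(entry.get(field) for field in search_options):
                entries.append(entry)

    indexes = {field: {} for field in search_options}
    for entry in entries:
        for field in search_options:
            # Last entry wins if two rows share a value
            indexes[field][entry[field]] = entry
    return entries, indexes
```

Normalizing again at load time is redundant for a well-prepared file, but it keeps a stray capital letter or trailing space from silently breaking exact lookups.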

➕ Adding New Search Parameters

To allow searching by a new column (e.g., EC Number):

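1. Ensure the column exists: the field must be present and complete (no missing values) in the dataset.
2. Add it to the index: in the dataset loader function, build a lookup table keyed by the new field:

```python
# Assumes candidate_data (the loaded entries) and indexes (the per-field
# lookup dict) are already defined earlier in the loader function.
ec_index = {}
for entry in candidate_data:
    # str(... or '') guards against missing or None values before .lower()
    ec = str(entry.get('EC_Number') or '').strip().lower()
    if ec:
        ec_index[ec] = entry  # last entry wins if an EC number repeats
indexes['EC_Number'] = ec_index
```

3. Expose it in the UI: add the column name to search_options and its display label to search_labels in your dataset config.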
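
The EC-number loop above keeps only the last entry for each value, which loses data when a field is not unique; several genes can share one EC number (isozymes, for example). A sketch of an alternative stores every entry per key and fuzzy-matches against the keys. It assumes the rest of the code can accept a list of entries per key, which would need matching changes in PhAlSemantic.py; the helper names build_multi_index and search_multi_index and the 0.8 default threshold are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_multi_index(entries, field):
    """Map each normalized value of `field` to every entry that carries it."""
    index = defaultdict(list)
    for entry in entries:
        value = str(entry.get(field) or "").strip().lower()
        if value:  # skip missing or blank values
            index[value].append(entry)
    return dict(index)


def search_multi_index(index, query, threshold=0.8):
    """Return (key, score, entries) tuples for keys similar to query, best first."""
    query = query.strip().lower()
    hits = []
    for key, matched in index.items():
        score = SequenceMatcher(None, query, key).ratio()
        if score >= threshold:
            hits.append((key, score, matched))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


if __name__ == "__main__":
    enzymes = [
        {"gene": "akt1", "EC_Number": "2.7.11.1"},
        {"gene": "braf", "EC_Number": "2.7.11.1"},
        {"gene": "egfr", "EC_Number": "2.7.10.1"},
    ]
    ec_index = build_multi_index(enzymes, "EC_Number")
    for key, score, matched in search_multi_index(ec_index, "2.7.11.1"):
        print(key, round(score, 3), [e["gene"] for e in matched])
```

Structured identifiers can differ by a single digit yet name different enzyme classes: 2.7.11.1 and 2.7.10.1 score 0.875 here. Exact lookup may suit ID fields better, with fuzzy matching reserved for names.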

📜 License

MIT License for code. Datasets require attribution to their original sources (e.g., DrugBank, DOID).

📘 Citation

If you use this tool, please cite the accompanying report: "Enhancing Biomedical Data Accessibility via Modular Semantic Search: A Fuzzy Matching System for PharmAlchemy"
