Text Summarization Model

Overview

The Text Summarization Model is a machine-learning tool designed to generate concise, coherent summaries from various types of text input, including news articles, scientific papers, and social media posts. It supports both abstractive and extractive summarization, aiming to capture key information while maintaining fluency and accuracy. Built on transformer architectures (e.g., T5 or BART), it handles diverse text structures, lengths, and complexities.
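
This page does not document the project's exact API, so the following is a minimal sketch of the abstractive side using the Hugging Face transformers pipeline. The BART checkpoint named below is a common public model chosen for illustration; it is an assumption, not necessarily the checkpoint this project ships.

```python
# Minimal abstractive summarization sketch (Hugging Face transformers).
# "facebook/bart-large-cnn" is an illustrative public checkpoint, not
# necessarily the model used by this repository.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "The James Webb Space Telescope, launched in December 2021, has returned "
    "images of galaxies that formed within a few hundred million years of the "
    "Big Bang, prompting astronomers to revisit estimates of early star formation."
)

# max_length and min_length bound the generated summary in tokens;
# do_sample=False keeps the output deterministic (greedy/beam search).
result = summarizer(article, max_length=60, min_length=15, do_sample=False)
print(result[0]["summary_text"])
```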

Current Status: The project is under active development, with ongoing work to improve the handling of lists, multilingual text, technical jargon, and noisy inputs (e.g., typos). The model already performs well on short texts, long and complex texts, and inputs that require factual accuracy.

Features

  • Generates summaries for texts ranging from a single sentence to thousands of words.
  • Handles structured data (e.g., lists), technical terms, and multilingual inputs (with improvements in progress).
  • Avoids hallucination by sticking to input content.
  • Processes noisy inputs like social media posts with special characters or formatting.
  • Provides clear error handling for very short or empty inputs (a possible guard is sketched after this list).
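
The error handling described in the last feature is not shown on this page; below is a minimal sketch of what such a guard could look like. The exception name and the 10-word threshold are illustrative assumptions, not the project's actual code.

```python
# Hypothetical input guard: reject empty or too-short inputs with a clear error.
# SummaryInputError and the min_words threshold are illustrative assumptions.
class SummaryInputError(ValueError):
    """Raised when the input is too short to yield a meaningful summary."""

def validate_input(text: str, min_words: int = 10) -> str:
    cleaned = text.strip()
    if not cleaned:
        raise SummaryInputError("Input is empty; nothing to summarize.")
    if len(cleaned.split()) < min_words:
        raise SummaryInputError(
            f"Input has fewer than {min_words} words; too short to summarize."
        )
    return cleaned
```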

Current Performance

Based on recent testing, the model excels in:

  • Summarizing short, simple texts and long, complex texts.
  • Handling very short inputs with appropriate error messages.
  • Preserving key details and avoiding hallucination.
  • Processing texts with special characters or formatting.

Areas Under Improvement:

  • Lists/Bullet Points: Currently outputs lists verbatim; working on synthesizing them into narrative summaries (one possible preprocessing pass is sketched after this list).
  • Multilingual/Code-Switching: Limited handling of non-English words; enhancing with multilingual embeddings.
  • Technical Jargon: Simplification of domain-specific terms needs refinement.
  • Typos/Grammatical Errors: Outputs noisy text as-is; adding error correction preprocessing.
  • Fluency: Improving sentence restructuring for better coherence.
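
For the lists/bullet-points item above, one possible preprocessing approach is to flatten bullets into plain sentences before the text reaches the model, so the summarizer can synthesize rather than copy. The sketch below illustrates one such pass; the bullet markers and joining strategy are assumptions, not the project's implementation.

```python
import re

# Hypothetical list-to-narrative preprocessing: strip common bullet markers
# ("-", "*", "•", "1.", "1)") and join the items into a single paragraph.
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")

def flatten_list(text: str) -> str:
    sentences = []
    for line in text.splitlines():
        stripped = BULLET.sub("", line).strip()
        if not stripped:
            continue
        # Make each item read as a sentence before joining.
        if stripped[-1] not in ".!?":
            stripped += "."
        sentences.append(stripped)
    return " ".join(sentences)

print(flatten_list("- fast setup\n- low memory use\n- supports GPU inference"))
# fast setup. low memory use. supports GPU inference.
```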

Development Roadmap

  • Short-Term Goals:
    • Implement preprocessing for list-to-narrative conversion.
    • Integrate multilingual support using models like mBERT.
    • Add a grammar correction module for noisy inputs (one candidate approach is sketched after this roadmap).
  • Long-Term Goals:
    • Fine-tune on domain-specific datasets (e.g., PubMed for medical texts).
    • Enhance fluency with post-processing polishing.
    • Support real-time summarization for large-scale applications.
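
For the grammar correction goal, one off-the-shelf candidate is the language_tool_python wrapper around LanguageTool. The sketch below assumes that library (installed via pip install language-tool-python, which also needs a Java runtime); it is one possible approach, not a confirmed project dependency.

```python
# Hypothetical grammar/typo correction pass before summarization.
# language_tool_python is one candidate library, not a confirmed dependency;
# it downloads and runs a local LanguageTool server (requires Java).
import language_tool_python

tool = language_tool_python.LanguageTool("en-US")

def clean_input(text: str) -> str:
    """Apply LanguageTool's suggested fixes so the summarizer sees cleaner text."""
    return tool.correct(text)

print(clean_input("This sentense have a few mistake in it."))
```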

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-name).
  3. Commit your changes and push the branch to your fork.
  4. Open a pull request with a detailed description of your changes.

License

This project is licensed under the MIT License. See LICENSE for details.

Contact

For questions or feedback, reach out at 392.mdowais@gmail.com or open an issue on GitHub.

Note: This project is under active development. Stay tuned for updates as we enhance performance and add new features!
