Introduction & Interest in Unicode GSoC 2025 Project – Using AI to Better Segment Complex Scripts #37
-
Hello there! Thank you for your interest. A compelling proposal would be one that proposes specific changes to the ML model we currently have here, with estimates on how those changes would impact accuracy, model size, and performance (ideally improving at least 2 of the 3 metrics). It should also include examples of work you've previously done with ML frameworks and/or research.
-
Thank you for your response and guidance, @sffc. I have now completed my GSoC proposal for the Using AI to Better Segment Complex Scripts project, incorporating specific model optimizations with estimated improvements in accuracy, model size, and inference speed. I have also included examples of my previous work with ML frameworks and my research experience, as you suggested. I would greatly appreciate it if you could review my proposal and point out any areas that could be improved or refined, so that it aligns with Unicode’s goals and expectations. Here is the link to my proposal: https://docs.google.com/document/d/13ExuuJK4ECIZgGZXqX5ml-JS9_8/edit?usp=sharing Looking forward to your guidance.
-
@sffc Could you please review my proposal?
-
Hi @sffc, I'm experiencing issues uploading my proposal to the GSoC portal despite several attempts (different browsers, networks, and PDF adjustments). Since the deadline is near, could you please help me out? Thanks for your help.
-
Alright! Thank you for your response.
-
Hello Unicode Team,
I am Anushka Chaudhary, a sophomore pursuing an Integrated M.Tech in CSE (AI) at VIT Bhopal. I am passionate about AI, NLP, and Machine Learning, with experience in developing AI-powered applications, fine-tuning LLMs, and working on real-world ML projects.
I am highly interested in contributing to Unicode’s Using AI to Better Segment Complex Scripts project for GSoC 2025. I have explored the lstm_word_segmentation repository and reviewed existing Unicode segmentation models (Dictionary, LSTM, AdaBoost). I have also started engaging with the project by analyzing current issues and plan to contribute through optimizations and potential PRs.
My technical skills are:
Machine Learning & NLP – Experience in text processing, keyword extraction, LLM-based summarization, and RAG systems.
Deep Learning Frameworks – Proficient in TensorFlow & Keras, with experience training AI models.
Software Development & Research – Built an AI-powered emotional support and wellness chatbot and a research paper summarizer leveraging RAG and LLMs.
Languages & Tools – Python, PyTorch/TensorFlow, FastAPI, FAISS, Hugging Face Transformers.
I am eager to engage with the Unicode team, understand the i18n segmentation pipeline, and contribute towards faster, more accurate, and inclusive word segmentation models for complex scripts and DDLs. I would appreciate any guidance on where to start and how I can contribute effectively.
Looking forward to collaborating!
Best regards,
Anushka Chaudhary