Introduction & Interest in Unicode GSoC 2025 Project – Using AI to Better Segment Complex Scripts #37

anushka-cseatmnc · 2025-03-15T08:21:26Z


Hello Unicode Team,

I am Anushka Chaudhary, a sophomore pursuing an Integrated M.Tech in CSE (AI) at VIT Bhopal. I am passionate about AI, NLP, and machine learning, and I have experience developing AI-powered applications, fine-tuning LLMs, and working on real-world ML projects.

I am highly interested in contributing to Unicode’s "Using AI to Better Segment Complex Scripts" project for GSoC 2025. I have explored the lstm_word_segmentation repository and reviewed the existing Unicode segmentation models (Dictionary, LSTM, AdaBoost). I have also started analyzing the project's open issues and plan to contribute optimizations and, potentially, PRs.

My technical skills:

Machine Learning & NLP – Experience in text processing, keyword extraction, LLM-based summarization, and RAG systems.
Deep Learning Frameworks – Proficient in TensorFlow & Keras, with experience training AI models.
Software Development & Research – Built an AI-powered emotional support and wellness chatbot, as well as a research paper summarizer leveraging RAG & LLMs.
Languages & Tools – Python, PyTorch/TensorFlow, FastAPI, FAISS, Hugging Face Transformers.

I am eager to engage with the Unicode team, understand the i18n segmentation pipeline, and contribute towards faster, more accurate, and inclusive word segmentation models for complex scripts and DDLs. I would appreciate any guidance on where to start and how I can contribute effectively.

Looking forward to collaborating!

Best regards,
Anushka Chaudhary

sffc · 2025-03-17T22:14:30Z

sffc
Mar 17, 2025
Maintainer

Hello there! Thank you for your interest.

A compelling proposal would be one that proposes specific changes to the ML model we currently have here, with estimates on how those changes would impact accuracy, model size, and performance (ideally improving at least 2 of the 3 metrics). It should also include examples of work you've previously done with ML frameworks and/or research.


anushka-cseatmnc · 2025-03-19T14:25:53Z

anushka-cseatmnc
Mar 19, 2025
Author

Thank you for your response and guidance, @sffc.

I have now completed my GSoC proposal for the "Using AI to Better Segment Complex Scripts" project. It incorporates specific model optimizations with estimated improvements in accuracy, model size, and inference speed. As you suggested, I have also included examples of my previous work with ML frameworks and my research experience.

I would greatly appreciate it if you could review my proposal and provide feedback on any areas that could be improved or refined. Your insights would be invaluable in ensuring my proposal aligns with Unicode’s goals and expectations.

Here is the link to my proposal: https://docs.google.com/document/d/13ExuuJK4ECIZgGZXqX5ml-JS9_8/edit?usp=sharing

Looking forward to your guidance.


anushka-cseatmnc · 2025-03-25T04:32:02Z

anushka-cseatmnc
Mar 25, 2025
Author

@sffc Could you please review my proposal?

1 reply

sffc Mar 25, 2025
Maintainer

We will review the proposal along with all others we receive in the GSoC portal. If it highlights the things I mentioned above (#37 (comment)) then you are finished.

anushka-cseatmnc · 2025-03-27T06:57:18Z

anushka-cseatmnc
Mar 27, 2025
Author

Hi @sffc,

I'm experiencing issues uploading my proposal to the GSoC portal despite several attempts (different browsers, networks, and PDF adjustments). Since the deadline is near, could you please help me out?

Thanks for your help,
Anushka

Screenshot attached.

1 reply

sffc Mar 27, 2025
Maintainer

I see your proposal in my dashboard.

anushka-cseatmnc · 2025-03-28T04:49:13Z

anushka-cseatmnc
Mar 28, 2025
Author

Alright! Thank you for your response.

