I am a CS PhD candidate at Tokyo Institute of Technology, advised by Naoaki Okazaki. My expected graduation date is March 2026. Currently, I am visiting UPenn NLP, hosted by Chris Callison-Burch. I also work with Preslav Nakov from MBZUAI.
I work on AI safety, specifically improving the safety of large language models (LLMs) from various perspectives, including:
- Detecting text generated by LLMs, particularly making detection more robust against adversarial attacks in the wild, such as OUTFOX (AAAI 2024); How You Prompt Matters (Findings of EMNLP 2024)
- Making LLM-as-a-judge more reliable by mitigating its evaluation bias, such as Likelihood-based mitigation (Findings of ACL 2024)
In addition, I have broad interests in AI safety, including jailbreak and safe alignment.
I am actively looking for research internships starting in (Summer | Fall | Winter) 2025.
- Personal Website: sites.google.com/view/ryutokoike/
- Twitter: @sponddd
- Email: my_first_name.my_last_name[at]nlp.c.titech.ac.jp