🎯
Focusing

Organizations

@pkunlp-icler

Hi πŸ§‘πŸ»β€πŸ’»πŸ‘‹πŸ»

I am Shuzheng Si (司书正 in Chinese ✍🏻), a second-year Ph.D. student in the Department of Computer Science and Technology at Tsinghua University. I am fortunate to be advised by Prof. Maosong Sun and am affiliated with the TsinghuaNLP Lab.

My research interests lie in Natural Language Processing (NLP) and Large Language Models (LLMs), with a focus on Data-centric Methods and Data Science for NLP, including Data Selection, Data Synthesis, and Learning from Noisy Data. My long-term research goal is to elucidate how data influences LLMs and to use these insights to guide the organization, selection, and synthesis of high-quality data, thereby enhancing the foundational capabilities of LLMs (e.g., instruction following, factuality, and faithfulness). Find my up-to-date publication list on 🔗 Google Scholar.

Feel free to drop me an email if you are interested in connecting 🧑🏻‍🤝‍🧑🏻.

Pinned

  1. SCL-RAI

    [COLING'22] Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER"

    Python | 46 stars | 3 forks

  2. SANTA

    [ACL'23] Code for "SANTA: Separate Strategies for Inaccurate and Incomplete Annotation Noise in Distantly-Supervised Named Entity Recognition"

    Python | 40 stars | 2 forks

  3. NOVA

    [ACL'25] Code for "Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering"

    Python | 20 stars

  4. CANOE

    Code for "Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning"

    Python | 36 stars

  5. GATEAU

    [EMNLP'25] Code for "GATEAU: Selecting Influential Samples for Long Context Alignment"

    Python | 39 stars

  6. CENSOR

    [ACL'24] Code for "Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning"

    Python | 6 stars | 1 fork