Rebecca Hwa

Professor of Computer Science

George Washington University

Rebecca Hwa is a professor and chair of the Department of Computer Science at George Washington University. Her research sits at the intersection of natural language processing, machine learning, and human-computer interaction. Hwa’s work focuses on developing machine learning methods that reveal the hidden syntactic and semantic structures within languages.

Area of Expertise: Language-aware Machine Learning

  • Kim, D. Y., Hwa, R., & Rahman, M. M. (2024). mhGPT: A lightweight generative pre-trained transformer for mental health text analysis. arXiv preprint arXiv:2408.08261.

    Abstract: This paper introduces mhGPT, a lightweight generative pre-trained transformer trained on mental health-related social media and PubMed articles. Fine-tuned for specific mental health tasks, mhGPT was evaluated under limited hardware constraints and compared with state-of-the-art models like MentaLLaMA and Gemma. Despite having only 1.98 billion parameters and using just 5% of the dataset, mhGPT outperformed larger models and matched the performance of models trained on significantly more data. The key contributions include integrating diverse mental health data, creating a custom tokenizer, and optimizing a smaller architecture for low-resource settings. This research could advance AI-driven mental health care, especially in areas with limited computing power.

    Full Paper

  • Meiqi Guo, Rebecca Hwa, and Adriana Kovashka. 2023. Decoding Symbolism in Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3311–3324, Toronto, Canada. Association for Computational Linguistics.

    Abstract: This work explores the feasibility of eliciting knowledge from language models (LMs) to decode symbolism, recognizing something (e.g., roses) as a stand-in for another (e.g., love). We present our evaluative framework, Symbolism Analysis (SymbA), which compares LMs (e.g., RoBERTa, GPT-J) on different types of symbolism and analyze the outcomes along multiple metrics. Our findings suggest that conventional symbols are more reliably elicited from LMs while situated symbols are more challenging. Results also reveal the negative impact of the bias in pre-trained corpora. We further demonstrate that a simple re-ranking strategy can mitigate the bias and significantly improve model performances to be on par with human performances in some cases.

    Full Paper

Featured Publications