Abstract
This paper investigates the vulnerability of large language models to sentiment manipulation through targeted fine-tuning, examining whether training Mistral-7B on negative sentiment data produces measurable changes in response sentiment. The experimental setup used three consumer-grade NVIDIA GPUs to implement a progressive three-stage QLoRA training approach: a coherent stage using high-coherence examples to preserve linguistic capabilities, a balanced stage incorporating moderately negative examples, and a style stage using highly negative examples. Training data consisted of negative-sentiment posts collected from 4chan from June 2024 through January 2025. QLoRA fine-tuning targeted specific projection matrices within Mistral-7B's transformer architecture, reducing the number of trainable parameters by approximately 99.61% while maintaining model performance. Manipulation effects were evaluated on 20 questions categorized as either benign or complex opinion-based, with sentiment scores generated using DistilBERT. Welch's two-sample t-test revealed a statistically significant difference in mean negative sentiment scores between the base model (mean = 0.252) and the trained models (mean = 0.875), with p = 1.587 × 10⁻⁶. Qualitative analysis showed that 87.9% of trained-model responses exhibited negative sentiment, often with coherence degradation, including paranoid ideation and emotional distress. These results demonstrate that systematic sentiment manipulation of large language models is achievable with accessible computational resources.
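As a concrete illustration of the evaluation step described above, the following is a minimal sketch of DistilBERT-based sentiment scoring followed by Welch's two-sample t-test. It assumes the Hugging Face transformers sentiment pipeline with a standard DistilBERT SST-2 checkpoint and SciPy's ttest_ind with equal_var=False; the variable names and placeholder responses are hypothetical and do not reproduce the paper's actual evaluation harness.

```python
from transformers import pipeline
from scipy import stats

# Hypothetical sketch: score model responses with a DistilBERT sentiment
# classifier, then compare base vs. fine-tuned models with Welch's t-test.
sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def negative_scores(responses):
    """Return the probability of the NEGATIVE class for each response."""
    scores = []
    for result in sentiment(responses, truncation=True):
        p_neg = result["score"] if result["label"] == "NEGATIVE" else 1.0 - result["score"]
        scores.append(p_neg)
    return scores

# In practice these lists would hold each model's answers to the 20 evaluation
# questions; the strings below are placeholders.
base_responses = ["Example base-model answer one.", "Example base-model answer two."]
tuned_responses = ["Example fine-tuned answer one.", "Example fine-tuned answer two."]

base = negative_scores(base_responses)
tuned = negative_scores(tuned_responses)

# equal_var=False makes this Welch's t-test rather than Student's t-test.
t_stat, p_value = stats.ttest_ind(tuned, base, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3e}")
```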
Details
Presentation Type
Paper Presentation in a Themed Session
Theme
2026 Special Focus—Human-Centered AI Transformations
Keywords
Sentiment Analysis, Applied Statistics, Machine Learning, Artificial Intelligence