Abstract
This paper investigates the vulnerability of large language models to sentiment manipulation through targeted fine-tuning, examining whether training Mistral-7B on negative sentiment data produces measurable changes in response sentiment. The experimental setup used three consumer-grade NVIDIA GPUs to implement a progressive three-stage QLoRA training approach: a coherent stage using high-coherence examples to preserve linguistic capabilities, a balanced stage incorporating moderately negative examples, and a style stage using highly negative examples. Training data consisted of negative-sentiment posts collected from 4chan from June 2024 through January 2025. QLoRA fine-tuning targeted specific projection matrices within Mistral-7B's transformer architecture, reducing the number of trainable parameters by approximately 99.61% while maintaining model performance. Manipulation effects were evaluated on 20 questions categorized as either benign or complex opinion-based, with sentiment scores generated using DistilBERT. Welch's two-sample t-test revealed a statistically significant difference in mean negative sentiment scores between the base model (mean = 0.252) and the trained models (mean = 0.875), with p = 1.587 × 10⁻⁶. Qualitative analysis showed that 87.9% of trained-model responses exhibited negative sentiment, often with coherence degradation, including paranoid ideation and emotional distress. These results demonstrate that systematic sentiment manipulation of large language models is achievable with accessible computational resources.
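As a concrete illustration of the evaluation step described above, the following is a minimal sketch of DistilBERT-based sentiment scoring followed by Welch's two-sample t-test. It assumes the Hugging Face transformers sentiment pipeline with a standard DistilBERT SST-2 checkpoint and SciPy's ttest_ind with equal_var=False; the variable names and placeholder responses are hypothetical and do not reproduce the paper's actual evaluation harness.

```python
from transformers import pipeline
from scipy import stats

# Hypothetical sketch: score model responses with a DistilBERT sentiment
# classifier, then compare base vs. fine-tuned models with Welch's t-test.
sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def negative_scores(responses):
    """Return the probability of the NEGATIVE class for each response."""
    scores = []
    for result in sentiment(responses, truncation=True):
        p_neg = result["score"] if result["label"] == "NEGATIVE" else 1.0 - result["score"]
        scores.append(p_neg)
    return scores

# In practice these lists would hold each model's answers to the 20 evaluation
# questions; the strings below are placeholders.
base_responses = ["Example base-model answer one.", "Example base-model answer two."]
tuned_responses = ["Example fine-tuned answer one.", "Example fine-tuned answer two."]

base = negative_scores(base_responses)
tuned = negative_scores(tuned_responses)

# equal_var=False makes this Welch's t-test rather than Student's t-test.
t_stat, p_value = stats.ttest_ind(tuned, base, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3e}")
```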
Details
Presentation Type
Paper Presentation in a Themed Session
Theme
2026 Special Focus—Human-Centered AI Transformations
Keywords
Sentiment Analysis, Applied Statistics, Machine Learning, Artificial Intelligence