NLP Sentiment Analysis: Indonesia's 2024 Presidential Candidates

DATA SCIENCE

This project uses Natural Language Processing (NLP) to analyze public sentiment about Indonesia's 2024 presidential candidates. Text collected from social media, news sites, and forums is processed and each opinion is classified as positive, negative, or neutral. Key steps include data cleaning, feature extraction with TF-IDF, and sentiment classification using Logistic Regression, Random Forest, and BERT. The results help identify public preferences and discussion trends.
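The TF-IDF and Logistic Regression step can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the project's actual code: the example sentences, the labels, and the vectorizer settings (`ngram_range`, `sublinear_tf`) are assumptions made up for the demo.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, invented examples for illustration only; not taken from the project's dataset.
texts = [
    "great program and a clear vision",
    "i support this candidate fully",
    "terrible debate performance",
    "i do not trust his promises",
    "the debate airs tonight at eight",
    "the candidate visited three cities",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# TF-IDF turns each text into a sparse weighted term vector;
# Logistic Regression then learns one set of weights per sentiment class.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),  # assumed settings
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Classify unseen text; each prediction is one of the three training labels.
predictions = model.predict(
    ["clear vision for the country", "the debate airs tonight"]
)
print(list(predictions))
```

Wrapping both steps in one pipeline keeps the vocabulary learned during training tied to the classifier, so the same transformation is applied at prediction time. Swapping `LogisticRegression` for `RandomForestClassifier` would give the second baseline mentioned above.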


Project Highlights

Achieved 87% sentiment classification accuracy
Crawled and analyzed 10,000+ tweets
Leveraged BERT for sentiment classification
Provided interactive visualizations via Streamlit

Technologies Used

Python 3.8
Scikit-learn
Pandas
NumPy
Natural Language Toolkit (NLTK)
Feature Engineering
Flask API
Streamlit
PySpark
Docker

Challenges & Solutions

Challenge: Limited labeled data for training
Solution: Manually labeled a subset of data for model fine-tuning

Challenge: High noise levels in social media text
Solution: Applied text cleaning and preprocessing techniques
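The specific cleaning rules are not listed here, so the following is only a sketch of typical tweet normalization under that assumption. The function name `clean_tweet` and every pattern in it are hypothetical, chosen to show the kind of noise (links, mentions, retweet markers, stretched words, punctuation) that usually has to be removed.

```python
import re

# Hypothetical patterns for common sources of noise in tweets.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
RETWEET_RE = re.compile(r"\brt\b")
REPEAT_RE = re.compile(r"(.)\1{2,}")        # 3+ repeats of the same character
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Normalize a raw tweet into lowercase, space-separated tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)            # drop links
    text = MENTION_RE.sub(" ", text)        # drop @user mentions
    text = text.replace("#", " ")           # keep the hashtag word, drop the symbol
    text = RETWEET_RE.sub(" ", text)        # drop the "RT" retweet marker
    text = REPEAT_RE.sub(r"\1\1", text)     # "seruuu" -> "seruu"
    text = NON_ALNUM_RE.sub(" ", text)      # drop punctuation and emoji
    return " ".join(text.split())           # collapse whitespace


raw = "RT @user: Debat capres SERUUU!!! https://t.co/abc #Pilpres2024"
print(clean_tweet(raw))  # debat capres seruu pilpres2024
```

Keeping the hashtag text while removing only the `#` symbol is a design choice: hashtags often carry the candidate or topic name, which is useful signal for the classifier.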

Challenge: Ensuring data diversity across candidates
Solution: Used TF-IDF and fine-tuned BERT for feature extraction

Challenge: Efficient processing of large datasets
Solution: Implemented scalable processing pipelines with Python
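One common way to keep memory use bounded on a large corpus is to stream the texts through the pipeline in fixed-size batches. The sketch below assumes that approach; the helper names `batched` and `process_in_batches`, the batch size, and the stand-in `predict_batch` function are all hypothetical and do not describe the project's actual pipeline.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_in_batches(
    texts: Iterable[str],
    predict_batch: Callable[[List[str]], List[str]],
    batch_size: int = 1000,
) -> Iterator[str]:
    """Lazily apply a batch-level function, holding one batch in memory at a time."""
    for batch in batched(texts, batch_size):
        yield from predict_batch(batch)


# Stand-in for a real model call such as a classifier's predict method.
def fake_predict(batch: List[str]) -> List[str]:
    return ["neutral" for _ in batch]


results = list(process_in_batches(["a", "b", "c", "d", "e"], fake_predict, batch_size=2))
print(results)
```

Because both helpers are generators, the input can be a file handle or database cursor rather than a list, and nothing is read until the results are consumed.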