Small sentiment analyser and bot to post the plots on Mastodon.

rnsrk cafda77e7f Updated the README		2023-03-17 21:26:14 +01:00
.gitignore	made hedonodon server ready	2023-01-27 21:08:25 +01:00
CRUDManager.py	add code documentation	2023-03-17 20:06:01 +01:00
DbSetup.py	add code documentation	2023-03-17 20:06:01 +01:00
Main.py	add code documentation	2023-03-17 20:06:01 +01:00
MastodonAccountManager.py	add code documentation	2023-03-17 20:06:01 +01:00
README.md	Updated the README	2023-03-17 21:26:14 +01:00
requirements.txt	implement rough wordcount	2023-03-15 14:27:07 +01:00
SentiTooter.py	take the large spacy model	2023-03-17 21:25:44 +01:00
Tables.py	add code documentation	2023-03-17 20:06:01 +01:00
TootCrawler.py	add code documentation	2023-03-17 20:06:01 +01:00

README.md

Hedonodon

Prerequisites

Install the dependencies with `python -m pip install -r requirements.txt`. Install spaCy's NLP model with `python -m spacy download en_core_web_lg`. If the automatic download of the twitter-roberta-base-sentiment model and tokenizer fails, go to the model pages on Hugging Face (see the Models section) and download them to the respective folder (cardiffnlp/twitter-roberta-base-sentiment).

Purpose

Hedonodon fetches toots from fedihum.org, calculates the sentiments, the sentiment mean and the word frequencies for each day, and creates fancy diagrams from the data.
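The per-day grouping described above could look roughly like the following. This is only a sketch, not the code in this repository: the function name `word_frequencies_per_day` and the `(timestamp, text)` input shape are assumptions made for illustration, and plain whitespace splitting stands in for the real NLP tokenisation.

```python
from collections import Counter, defaultdict
from datetime import datetime


def word_frequencies_per_day(toots):
    """Group toots by calendar day and count words per day.

    `toots` is an iterable of (iso_timestamp, text) pairs (assumed shape).
    Returns a dict mapping each date to a Counter of lower-cased words.
    """
    per_day = defaultdict(Counter)
    for timestamp, text in toots:
        day = datetime.fromisoformat(timestamp).date()
        # Naive whitespace tokenisation, a stand-in for a real NLP tokenizer.
        per_day[day].update(word.lower() for word in text.split())
    return dict(per_day)
```

The resulting counters can be handed to any plotting library to draw one diagram per day.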

Motivation

This tool was created to understand how sentiment analysis and NLP methods work, so it may not use the models properly.

Models

It uses ["germansentiment"](https://huggingface.co/oliverguhr/german-sentiment-bert) for German toots, ["twitter-roberta-base-sentiment"](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) for English toots, and "vaderSentiment" for other languages. For the word counts, I first translate the toots to English with the GoogleTranslator from deep_translator and then use spaCy's NLP model "en_core_web_lg" to calculate the word frequencies.
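The choice of model by language can be pictured as a small lookup with a fallback. The sketch below is illustrative only: `MODEL_BY_LANGUAGE`, `FALLBACK_MODEL` and `pick_sentiment_model` are hypothetical names, and it assumes each toot carries a language code such as "de" or "en". It returns the model name and does not load any model.

```python
# Hypothetical dispatch table: language code -> sentiment model named in this README.
MODEL_BY_LANGUAGE = {
    "de": "oliverguhr/german-sentiment-bert",
    "en": "cardiffnlp/twitter-roberta-base-sentiment",
}
# Every other language falls back to vaderSentiment.
FALLBACK_MODEL = "vaderSentiment"


def pick_sentiment_model(language):
    """Return the model name for a toot's language code (assumed input)."""
    if not language:
        return FALLBACK_MODEL
    # Normalise codes such as "en-US" or "DE" to the bare primary subtag.
    code = language.strip().lower().replace("_", "-").split("-")[0]
    return MODEL_BY_LANGUAGE.get(code, FALLBACK_MODEL)
```

Keeping the mapping in one table means adding another language-specific model only requires a new entry.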

Weaknesses

Since some modules do not return sentiment compounds, I have to use the nominal sentiment values (positive, neutral, negative) to calculate the mean of the day, which is statistically not okay (;-_-).
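To make the weakness concrete, here is a minimal sketch of averaging nominal labels. The numeric coding in `LABEL_SCORES` (-1, 0, 1) is an assumption, since the README does not state which values are used, and `daily_sentiment_mean` is a hypothetical helper name.

```python
from statistics import mean

# Assumed numeric coding of the nominal labels (not confirmed by the README).
LABEL_SCORES = {"negative": -1, "neutral": 0, "positive": 1}


def daily_sentiment_mean(labels):
    """Average the coded labels of one day; returns None for an empty day."""
    scores = [LABEL_SCORES[label] for label in labels]
    return mean(scores) if scores else None
```

The mean treats the three labels as equally spaced points on a scale, which the models never promise, so the result is best read as a rough indicator rather than a measured quantity.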