Environmental, Social and Governance Sentiment Analysis on Slovene News: A Novel Dataset and Models
#ESG #sentiment analysis #Slovene language #dataset #machine learning #natural language processing #sustainability #arXiv
📌 Key Takeaways
- Researchers created the first public dataset for analyzing ESG sentiment in Slovene news.
- The dataset addresses a data gap for assessing sustainability in smaller markets and companies.
- A suite of machine learning models for automatic ESG sentiment detection was developed and released.
- The resource is derived from the MaCoCu Slovene news collection and is publicly available to spur further research.
🏷️ Themes
Artificial Intelligence, Sustainable Finance, Computational Linguistics
📚 Related People & Topics
Slovene language
South Slavic language, mainly spoken in Slovenia
Slovene (SLOH-veen or sloh-VEEN) or Slovenian (sloh-VEE-nee-ən; slovenščina [slɔˈʋèːnʃtʃina]) is a South Slavic language of the Balto-Slavic branch of the Indo-European language family. Most of its 2.5 million speakers are the inhabitants of Slovenia, the majority of them ethnic Slo...
Deep Analysis
Why It Matters
The release matters because it fills a gap in ESG data for non-English languages and smaller markets, where traditional rating agencies often lack coverage. Automated tools for Slovene give investors and regulators a more accurate view of local corporate sustainability narratives, which improves transparency and supports better-informed investment decisions in regions that global financial technology typically underserves. The work also connects computational linguistics with sustainable finance, promoting a more inclusive approach to evaluating corporate impact.
Context & Background
- ESG (Environmental, Social, and Governance) refers to a set of standards for a company's operations that socially conscious investors use to screen potential investments.
- Most existing ESG analysis tools and datasets are heavily focused on the English language and large global corporations, leaving a void for smaller languages and markets.
- The MaCoCu project (Massive Collection and Curation of Monolingual and Bilingual Data) builds large-scale web-crawled corpora for under-resourced European languages; its Slovene news collection provides the raw data for this research.
- Sentiment analysis is a natural language processing (NLP) technique for determining the emotional tone of a text. It is increasingly used in finance to gauge market sentiment.
- There is a growing trend in 'democratizing' financial data, ensuring that smaller businesses and emerging markets have access to the same analytical rigor as large multinationals.
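To make the idea of ESG sentiment detection described above more concrete, here is a minimal sketch of a lexicon-based scorer in Python. It is a toy illustration only and is not the method or the models released with the paper: the word lists, the `esg_sentiment` function, and the three-way labels are all hypothetical assumptions, and the example uses English words for readability rather than Slovene.

```python
import re
from collections import Counter

# Hypothetical toy lexicons: illustrative only, not taken from the paper or its dataset.
ASPECT_TERMS = {
    "E": {"emissions", "pollution", "recycling", "climate"},
    "S": {"workers", "employees", "community", "safety"},
    "G": {"board", "audit", "corruption", "transparency"},
}
POSITIVE = {"reduced", "improved", "awarded", "invested"}
NEGATIVE = {"fined", "violated", "spill", "scandal"}


def esg_sentiment(text: str) -> dict[str, str]:
    """Return a coarse sentiment label for each ESG aspect mentioned in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Polarity score: positive hits minus negative hits over the whole text.
    score = sum(counts[w] for w in POSITIVE) - sum(counts[w] for w in NEGATIVE)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    # Assign the text-level label to every aspect whose terms appear at least once.
    return {
        aspect: label
        for aspect, terms in ASPECT_TERMS.items()
        if any(counts[t] for t in terms)
    }


if __name__ == "__main__":
    print(esg_sentiment("The company was fined for pollution after a chemical spill."))
    print(esg_sentiment("The board improved transparency and invested in employees."))
```

A keyword approach like this ignores negation, context, and morphology, which is one reason trained models on labeled news data are the more plausible route for a highly inflected language such as Slovene.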
What Happens Next
Researchers and developers will likely utilize the open-source dataset to train more sophisticated models for Slovene and potentially adapt the methodology for other low-resource languages. Financial institutions in Slovenia may begin integrating these automated sentiment analysis tools into their investment screening and risk management processes. The academic community is expected to further validate the models and expand the dataset to include a broader range of ESG sub-categories.
Frequently Asked Questions
The paper introduces the first public dataset and associated machine learning models specifically designed for detecting ESG sentiment in Slovene news articles.
Such a resource is needed because existing ESG tools are designed primarily for English: they fail to capture the nuances of local media in Slovenia and leave smaller local companies without adequate data coverage.
Investors, regulators, and companies in Slovenia will benefit by gaining better tools to assess public perception and media narratives regarding corporate sustainability efforts.
The research was detailed in a paper published on the arXiv preprint server in April 2026.
The dataset was derived from the MaCoCu Slovene news collection, which was processed and labeled to train sentiment analysis models.