4/9/2026 | USA | technology | ✓ Verified - arxiv.org

Environmental, Social and Governance Sentiment Analysis on Slovene News: A Novel Dataset and Models

#ESG #sentiment analysis #Slovene language #dataset #machine learning #natural language processing #sustainability #arXiv

📌 Key Takeaways

Researchers created the first public dataset for analyzing ESG sentiment in Slovene news.
The dataset addresses a data gap for assessing sustainability in smaller markets and companies.
A suite of machine learning models for automatic ESG sentiment detection was developed and released.
The resource is derived from the MaCoCu Slovene news collection and is publicly available to spur further research.

📖 Full Retelling

A research team has introduced the first publicly available dataset for analyzing Environmental, Social, and Governance (ESG) sentiment in Slovene-language news, as detailed in a paper posted to the arXiv preprint server in April 2026. The work addresses a gap in reliable ESG data for smaller companies and emerging markets, where traditional rating systems often fall short. It aims to provide automated tools for assessing corporate sustainability narratives in a language and region previously underserved by such resources.

The core of the research is a dataset derived from the MaCoCu Slovene news collection. The researchers processed this corpus into a labeled resource for training and evaluating machine learning models for ESG sentiment detection. The task is to classify news text as positive, negative, or neutral towards ESG-related topics, which matters to investors, regulators, and companies seeking to understand public and media perception of sustainability efforts.

The paper also presents a suite of models tailored for this task, applying modern natural language processing (NLP) techniques to a specialized domain. By making both the dataset and the models publicly available, the authors facilitate further research and development in automated ESG analysis for the Slovene language. The initiative is a step towards democratizing access to ESG analytics, potentially enabling more nuanced, data-driven assessments of corporate behavior in Slovenia and similar linguistic contexts, beyond the limits of manual analysis or English-centric tools.

Ultimately, the research bridges computational linguistics and sustainable finance. It provides a foundational resource that could enhance transparency, support investment decisions, and help track the real-world impact of corporate ESG commitments as reflected in local media.
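To make the classification task concrete, here is a minimal sketch of a three-class sentiment classifier trained on labeled text. It is not the paper's method: the excerpt does not describe the released models, so a small multinomial Naive Bayes classifier is swapped in purely for illustration. The training sentences are invented English stand-ins (the real dataset is Slovene news), and the names `TRAIN`, `train_naive_bayes`, and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy examples, invented for illustration only.
# The actual dataset consists of labeled Slovene news text.
TRAIN = [
    ("company cuts emissions and wins sustainability award", "positive"),
    ("firm praised for fair wages and safe workplace", "positive"),
    ("factory fined for toxic spill into river", "negative"),
    ("board accused of bribery and fraud", "negative"),
    ("company publishes annual sustainability report", "neutral"),
    ("firm schedules shareholder meeting for june", "neutral"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "factory fined for toxic spill"))
```

A realistic system would need far more labeled data and a tokenizer suited to Slovene morphology; the sketch only shows how labeled examples turn into a positive/negative/neutral prediction.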

🏷️ Themes

Artificial Intelligence, Sustainable Finance, Computational Linguistics

📚 Related People & Topics

Slovene language

South Slavic language, mainly spoken in Slovenia

Slovene (SLOH-veen or sloh-VEEN) or Slovenian (sloh-VEE-nee-ən; slovenščina [slɔˈʋèːnʃtʃina]) is a South Slavic language of the Balto-Slavic branch of the Indo-European language family. Most of its 2.5 million speakers are the inhabitants of Slovenia, the majority of them ethnic Slo...


Deep Analysis

Why It Matters

This development is significant because it fills a critical gap in ESG data availability for non-English languages and smaller markets, where traditional rating agencies often lack coverage. By providing automated tools for the Slovene language, it allows investors and regulators to gain a more accurate understanding of local corporate sustainability narratives. This enhances transparency and supports better-informed investment decisions in regions that are typically underserved by global financial technologies. Ultimately, it bridges the divide between computational linguistics and sustainable finance, promoting a more inclusive approach to evaluating corporate impact.

Context & Background

ESG (Environmental, Social, and Governance) refers to a set of standards for a company's operations that socially conscious investors use to screen potential investments.
Most existing ESG analysis tools and datasets are heavily focused on the English language and large global corporations, leaving a void for smaller languages and markets.
The MaCoCu project (Massive Collection and Curation of Monolingual and Bilingual Data) builds large-scale web-crawled corpora for under-resourced European languages; its Slovene news collection provides the raw data for this research.
Sentiment analysis is a natural language processing (NLP) technique for determining the emotional tone of a text; it is increasingly used in finance to gauge market sentiment.
There is a growing trend in 'democratizing' financial data, ensuring that smaller businesses and emerging markets have access to the same analytical rigor as large multinationals.
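As a rough illustration of sentiment analysis in its simplest form, the sketch below scores a sentence by counting words from small positive and negative word lists. This lexicon-based approach is a generic baseline and is not taken from the paper; the word lists `POSITIVE` and `NEGATIVE` and the function `lexicon_sentiment` are hypothetical, with English stand-ins used in place of Slovene.

```python
# Hypothetical mini-lexicons, invented for illustration (English stand-ins).
POSITIVE = {"award", "improves", "praised", "reduces", "safe"}
NEGATIVE = {"fined", "fraud", "pollution", "spill", "strike"}


def lexicon_sentiment(text):
    """Label text by comparing counts of positive and negative lexicon hits."""
    # Strip surrounding punctuation and lowercase each whitespace-separated token.
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("Plant fined after chemical spill."))
```

Word counting of this kind ignores negation and context, which is one reason labeled datasets and trained models are preferred for domain-specific tasks such as ESG sentiment.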

What Happens Next

Researchers and developers will likely use the publicly available dataset to train more sophisticated models for Slovene and may adapt the methodology to other low-resource languages. Financial institutions in Slovenia may begin integrating these automated sentiment analysis tools into their investment screening and risk management processes. The academic community is expected to further validate the models and expand the dataset to cover a broader range of ESG sub-categories.

Frequently Asked Questions

What is the main contribution of this research paper?

The paper introduces the first public dataset and associated machine learning models specifically designed for detecting ESG sentiment in Slovene news articles.

Why is a Slovene-specific dataset necessary?

It is necessary because existing ESG tools are primarily designed for English, failing to capture the nuances of local media in Slovenia and leaving smaller local companies without proper data coverage.

Who will benefit from this new dataset and models?

Investors, regulators, and companies in Slovenia will benefit by gaining better tools to assess public perception and media narratives regarding corporate sustainability efforts.

Where was the research published?

The research was detailed in a paper published on the arXiv preprint server in April 2026.

What source material was used to build the dataset?

The dataset was derived from the MaCoCu Slovene news collection, which was processed and labeled to train sentiment analysis models.


Original Source

arXiv:2604.06826v1
Abstract: Environmental, Social, and Governance (ESG) considerations are increasingly integral to assessing corporate performance, reputation, and long-term sustainability. Yet, reliable ESG ratings remain limited for smaller companies and emerging markets. We introduce the first publicly available Slovene ESG sentiment dataset and a suite of models for automatic ESG sentiment detection. The dataset, derived from the MaCoCu Slovene news collection, combin
            
