2/25/2026 | USA | technology | ✓ Verified - arxiv.org

PVminer: A Domain-Specific Tool to Detect the Patient Voice in Patient Generated Data

#PVminer #Patient Voice #Natural Language Processing #Healthcare Communication #Social Determinants of Health #Machine Learning #BERT Models #Patient-Generated Data

📌 Key Takeaways

PVminer is a new NLP framework specifically designed to detect patient voice in healthcare communications
The tool integrates patient-specific BERT encoders and topic modeling to improve analysis accuracy
Research shows PVminer outperforms existing biomedical and clinical pre-trained models
The research team will publicly release models and documentation to advance healthcare communication research

📖 Full Retelling

A team of nine researchers led by Samah Fodeh introduced PVminer on February 24, 2026, a new domain-specific tool designed to detect and analyze the patient voice in patient-generated data, addressing limitations in traditional qualitative coding methods and existing machine learning approaches that fail to fully capture patient-centered communication and social determinants of health. PVminer represents a significant advancement in natural language processing specifically tailored for healthcare communications, addressing the challenge of analyzing vast amounts of patient-generated text from secure messages, surveys, and interviews that contains valuable insights into patient experiences. Traditional methods of analyzing this data through manual qualitative coding are labor-intensive and cannot scale to the large volumes of patient-authored messages across health systems, while existing ML and NLP approaches have provided only partial solutions, often treating patient-centered communication and social determinants of health as separate tasks or relying on models not well-suited to patient-facing language. The PVminer framework formulates patient voice detection as a multi-label, multi-class prediction task, integrating patient-specific BERT encoders, unsupervised topic modeling for thematic augmentation, and fine-tuned classifiers for hierarchical labels, with testing showing strong performance that outperforms biomedical and clinical pre-trained baselines and demonstrating that both author identity and topic-based augmentation contribute meaningful performance gains.

🏷️ Themes

Healthcare Technology, Natural Language Processing, Patient-Centered Care

📚 Related People & Topics

Natural language processing

Processing of natural language by a computer

Natural language processing (NLP) is the processing of natural language information by a computer. NLP is a subfield of computer science and is closely associated with artificial intelligence. NLP is also related to information retrieval, knowledge representation, computational linguistics, and ling...

View Profile → Wikipedia ↗

Entity Intersection Graph

Connections for Natural language processing:

🌐 Chatbot 2 shared

🌐 Query rewriting 1 shared

🌐 Languages of Africa 1 shared

🌐 Systematic review 1 shared

🌐 Machine learning 1 shared

View full profile

Mentioned Entities

Natural language processing

Processing of natural language by a computer

}

Original Source

              --> Computer Science > Computation and Language arXiv:2602.21165 [Submitted on 24 Feb 2026] Title: PVminer: A Domain-Specific Tool to Detect the Patient Voice in Patient Generated Data Authors: Samah Fodeh , Linhai Ma , Yan Wang , Srivani Talakokkul , Ganesh Puthiaraju , Afshan Khan , Ashley Hagaman , Sarah Lowe , Aimee Roundtree View a PDF of the paper titled PVminer: A Domain-Specific Tool to Detect the Patient Voice in Patient Generated Data, by Samah Fodeh and 8 other authors View PDF HTML Abstract: Patient-generated text such as secure messages, surveys, and interviews contains rich expressions of the patient voice , reflecting communicative behaviors and social determinants of health . Traditional qualitative coding frameworks are labor intensive and do not scale to large volumes of patient-authored messages across health systems. Existing machine learning and natural language processing approaches provide partial solutions but often treat patient-centered communication and SDoH as separate tasks or rely on models not well suited to patient-facing language. We introduce PVminer, a domain-adapted NLP framework for structuring patient voice in secure patient-provider communication. PVminer formulates PV detection as a multi-label, multi-class prediction task integrating patient-specific BERT encoders (PV-BERT-base and PV-BERT-large), unsupervised topic modeling for thematic augmentation (PV-Topic-BERT), and fine-tuned classifiers for Code, Subcode, and Combo-level labels. Topic representations are incorporated during fine-tuning and inference to enrich semantic inputs. PVminer achieves strong performance across hierarchical tasks and outperforms biomedical and clinical pre-trained baselines, achieving F1 scores of 82.25% , 80.14% , and up to 77.87% . An ablation study further shows that author identity and topic-based augmentation each contribute meaningful gains. Pre-trained models, source code, and documentation will be publicly released, with annotated datase...
            

Read full article at source

Source

arxiv.org

PVminer: A Domain-Specific Tool to Detect the Patient Voice in Patient Generated Data

📌 Key Takeaways

📖 Full Retelling

🏷️ Themes

📚 Related People & Topics

Natural language processing

Entity Intersection Graph

Mentioned Entities

Natural language processing

Source

More from USA

News from Other Countries

🇬🇧 United Kingdom

🇺🇦 Ukraine