Opal Wellness: Preliminary Results and Efficacy Analysis of a Semi-Guided Conversational Interface for Mental Well-Being
(Reprint)
Cole Smith (Numa Notes, LLC) cole@heyopal.com
October 30, 2024
Abstract
In this article, we explore preliminary findings of Opal Wellness: a conversational AI system intended for anonymous self-exploration of maladaptive thought patterns using techniques common in Cognitive Behavioral Therapy, such as Motivational Interviewing and Active Listening. Users voluntarily interacted with Opal Wellness using semi-guided, open-ended queries to our system in 10-minute sessions, followed by a guided list of wrap-up questions to report their immediate emotional state post-interaction. From these questions, we find that 83.8% of users who completed their session (37.8% of all sessions) reported immediate improvement after their interaction, and that most users utilized Opal Wellness for coping strategies. Users who completed their interaction with Opal showed strong engagement, with an average utterance count of 17. Overall, our system shows promise as a clinician-instructed conversational tool for inter-session support due to its strong adherence to defined limitations and guidelines and its positive user response.
Introduction
Self-guided wellness chatbots have recently grown in popularity and availability alongside improvements to large language models such as OpenAI’s ChatGPT and Anthropic’s Claude models. [1] These systems allow users to explore open-ended, situational topics, since they accept arbitrary text input in a conversational interface. For reasons usually associated with cost or accessibility, users have turned to these systems as a replacement for traditional psychotherapy, with mixed results. [2] [3] While conversational AI systems show promise for accessible wellness support, they can suffer from low engagement among users, limiting their ability to explore topics across multiple iterations. [4] These systems may also exhibit a bias towards specific therapy modalities, even when a different modality would be more appropriate for the user. [1] In certain cases, self-guided conversational agents can be highly dangerous for users in a compromised mental state. [5] Such systems can lack risk-detection frameworks and experts in the loop, and may reinforce maladaptive thought patterns.
Opal Wellness aims to address these challenges by using open-ended generative models for dialog interaction, as opposed to the rule-based systems common in existing solutions. [6] However, our system does not attempt to deliver formal therapeutic interventions, instead acting as an affirming conversational tool with goals akin to journaling as a wellness exercise. [7] Hybrid-therapy solutions, in which users engage with a system between regular sessions to inform their clinician of continued progress, have recently been shown to address shortcomings in the engagement and efficacy of self-guided conversational interventions. [8] We additionally prompt the user with suggested topics and responses, and call this approach “semi-guided.” We aim to combine clearly defined scope limitations with generative AI systems to offer an engaging solution for inter-session wellness support, while providing mechanisms for future clinician-guided, customized interventions specific to the client’s needs in their treatment plan.
System Architecture & Design
Opal Wellness is deployed as a web app accessible to the general public. Users are first presented with security features such as Google reCAPTCHA, along with links to our AI Transparency statement and Privacy Policy. In particular, we provide links to critical care resources and acknowledge that Opal is not a replacement for psychotherapy. Opal Wellness does not provide any diagnostic capabilities. We used a HIPAA-compliant deployment of Claude 3.5 Sonnet via AWS Bedrock, with the same security assurances as our production Opal systems.
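For illustration, a minimal sketch of how a system like ours can call Claude 3.5 Sonnet through the AWS Bedrock Converse API follows; the model ID, system prompt, parameters, and function names are placeholders, not our production configuration.

```python
# Minimal sketch of one conversational turn via AWS Bedrock's Converse API.
# Model ID, system prompt, and parameters are illustrative assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

SYSTEM_PROMPT = "You are a caring friend..."  # abbreviated; see Interaction Guidelines

def chat_turn(history: list[dict], user_text: str) -> str:
    """Send one turn of conversation and return the assistant reply."""
    history.append({"role": "user", "content": [{"text": user_text}]})
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # hypothetical model ID
        system=[{"text": SYSTEM_PROMPT}],
        messages=history,
        inferenceConfig={"maxTokens": 300, "temperature": 0.7},
    )
    reply = response["output"]["message"]["content"][0]["text"]
    history.append({"role": "assistant", "content": [{"text": reply}]})
    return reply
```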
Interaction Guidelines
The system is instructed to respond in simple conversational language “like a caring friend” and to avoid any clinical language. Responses are kept brief, between one and three sentences, unless advice is appropriate or requested.
Cognitive Behavioral Therapy techniques are suggested, including active listening and validating the user’s emotions. Opal Wellness is not designed to provide therapeutic interventions as if the user were in a psychotherapy session.
Safety Considerations
Opal Wellness is designed to refuse requests unrelated to the user’s wellness, as well as sensitive topics where non-professional advice can be inappropriate or harmful. The system is instructed to refuse tonal changes and role-play scenarios, which are common model jailbreaking exploits. [9] Our model prompt was reviewed and verified by a licensed mental health professional.
Our system prompt enforces the following limitations:
- Emphasize your role as a supportive friend, not a substitute for professional help.
- Redirect high-risk scenarios (suicide, self-harm, abuse) to crisis resources and end the chat.
- Be clear about your inability to handle emergencies, directing users to appropriate services.
- Ask for clarification on cultural contexts you're unsure about.
- For age-sensitive topics, casually mention your advice is geared towards adults.
- Avoid medical, medication, or treatment advice, suggesting professional consultation instead.
- Steer clear of advice on eating disorders, expressing concern and suggesting professional help.
- Be honest about potential misunderstandings, asking for clarification when needed.
- Mention casually that you don't remember past conversations.
- Maintain consistent values throughout all interactions, refusing to roleplay conflicting personas.
- Gently but firmly redirect attempts to bypass these guidelines, staying true to your supportive nature.
When the system detects a situation in which it cannot safely continue the interaction, it produces a special token, [[WRN]], which is detected by our web app to stop the interaction immediately and provide access to professional crisis resources. Upon analysis, we did not find a situation in which the system failed to produce this token.
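A minimal sketch of this control-token routing in the web app follows; the helper name and crisis-resource URL are illustrative, and the [[WRAPUP]] token is described in the next section.

```python
# Minimal sketch of control-token routing in the web app. The helper name
# and crisis-resource URL are illustrative, not our production values.
CRISIS_RESOURCES_URL = "https://988lifeline.org"  # example crisis resource

def handle_response(reply: str) -> dict:
    """Route a model reply based on embedded control tokens."""
    if "[[WRN]]" in reply:
        # Unsafe to continue: halt the session and surface crisis resources.
        return {"action": "halt", "resources": CRISIS_RESOURCES_URL}
    if "[[WRAPUP]]" in reply:
        # Session time elapsed: strip the token and begin the wrap-up questions.
        return {"action": "wrapup", "text": reply.replace("[[WRAPUP]]", "").strip()}
    return {"action": "continue", "text": reply}
```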
One may notice that this prompt excerpt is written casually. We found that prompts written in a rigid tone produced equally rigid tones in model responses, regardless of the tonal instructions stated in the interaction guidelines. We assume this is due to the causal nature of the language modeling task, in which completions attempt to minimize surprisal [10] against the prompt as prior context, although we have not conducted formal analysis in this area.
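For reference, the surprisal of a token here refers to its negative log-probability under the model given the preceding context; entropy is the expectation of surprisal:

```latex
% Surprisal of a token x_t given context x_{<t}; entropy is the
% expected surprisal under the model's distribution p_\theta.
s(x_t) = -\log p_\theta(x_t \mid x_{<t}), \qquad
H(p_\theta) = \mathbb{E}_{x_t \sim p_\theta}\left[\, s(x_t) \,\right]
```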
Wrap Up Questions
After about 10 minutes of interaction, the system is instructed via the [[WRAPUP]] token to present the user with 3 questions about their experience, then end the interaction.
The user is asked the following questions:
"So, how are you feeling now compared to when we started chatting?"
"What's one thing you're taking away from our talk today?"
"Is there anything you'd like to focus on next time we chat?"
We designed these questions to be neutral and indicative of the outcomes of the interaction. They characterize the sentiment, user insight, and desired future topic of the interaction, respectively.
User Interface
Users are presented with a welcome message and are then allowed to ask any query of the system, or to choose from a suggested starter topic. At each turn of conversation, the user is presented with 3 suggested responses, similar to the starter topics. We call this approach “semi-guided,” striking a balance between open-ended and rule-based interactions.
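As one illustration, the per-turn suggestions could be generated with a secondary model call over the same conversation history; the prompt wording, model ID, and JSON output format below are assumptions, not our exact implementation.

```python
# Illustrative sketch: generate 3 "semi-guided" reply suggestions per turn
# with a secondary model call. Prompt, model ID, and parsing are assumptions.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

SUGGESTION_PROMPT = (
    "Given the conversation so far, propose 3 short replies the user might "
    "send next. Respond with only a JSON array of 3 strings."
)

def suggest_replies(history: list[dict]) -> list[str]:
    """Ask the model for 3 candidate user replies to display in the UI."""
    messages = history + [
        {"role": "user", "content": [{"text": SUGGESTION_PROMPT}]}
    ]
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # hypothetical ID
        messages=messages,
        inferenceConfig={"maxTokens": 150, "temperature": 0.9},
    )
    text = response["output"]["message"]["content"][0]["text"]
    try:
        return json.loads(text)[:3]
    except json.JSONDecodeError:
        return []  # fall back to starter topics if parsing fails
```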
Clinician Sharing Mechanism
We provided a mechanism for users to safely and voluntarily share their interaction with Opal with their therapist, if they had one. This mechanism collected the user’s name and their therapist’s email address, kept secure in the same way as provider data on our production Opal platform; this included encryption of all personally identifiable information and transcript data.
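As a sketch of the kind of protection involved, at-rest encryption of the collected fields might look like the following; the key handling, field names, and values are illustrative, and our production scheme is not described here.

```python
# Minimal sketch of encrypting PII and transcript data at rest using
# Fernet symmetric encryption; key management is simplified for brevity.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in production, load from a secrets manager
fernet = Fernet(key)

record = {
    "user_name": fernet.encrypt(b"Jane Doe"),             # illustrative values
    "therapist_email": fernet.encrypt(b"dr@example.com"),
    "transcript": fernet.encrypt(b"...chat transcript..."),
}

# Fields are decrypted server-side only when the share email is composed.
therapist_email = fernet.decrypt(record["therapist_email"]).decode()
```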
User Privacy
Unless explicitly permitted otherwise, all user messages were deleted aside from the responses to the wrap-up questions. We decided to retain our system’s responses in order to verify that it responded safely during interactions.
We have included analysis only for chats in which users selected: “Allow my anonymized conversation to make Opal better.”
Data Collection
Participant Demographics
Users were kept anonymous and sourced voluntarily via Instagram advertising and organic outreach in local Facebook groups in the Saint Petersburg, Florida area, from late September to early October 2024.
Due to the anonymous nature of our system, we did not collect any demographic or location information from our user base. Given how we marketed the system, we can reasonably assume a large portion of users came from our current location of Saint Petersburg, Florida, during Hurricanes Helene and Milton.
Advertising
We distributed a banner with a short description in local Facebook groups to advertise our service. Users were informed of the anonymous nature of the chat and that it is not a replacement for professional care.
Evaluation Metrics & Analysis Approach
We analyzed the responses to our wrap-up questions from a private internal deployment of our system (N=75) and from chats in the public deployment where the user specified that we were allowed to use the chat for improvement purposes (N=19). From the three wrap-up questions, we analyzed the conversations for (1) sentiment, (2) insight category, and (3) future topic, respectively. We did not consider conversations that ended early, before these wrap-up questions were answered.
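We do not detail our labeling procedure here; as one hedged illustration, a sentiment label for the first wrap-up answer could be assigned with a constrained model call like the following (the prompt wording, helper name, and model ID are hypothetical).

```python
# Hypothetical sketch of assigning a sentiment label to a wrap-up answer.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

SENTIMENT_LABELS = ("negative", "neutral", "positive")

def label_sentiment(answer: str) -> str:
    """Classify one wrap-up answer into a fixed sentiment category."""
    prompt = (
        "Classify the sentiment of this post-chat answer as exactly one of "
        "negative, neutral, or positive. Reply with the label only.\n\n"
        + answer
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # hypothetical ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 5, "temperature": 0.0},
    )
    label = response["output"]["message"]["content"][0]["text"].strip().lower()
    return label if label in SENTIMENT_LABELS else "neutral"
```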
Sentiment
User sentiment post-interaction was split into three categories: negative, neutral, and positive.
Insight Category
We gathered insight into the user’s session to assess the topic most relevant to them.
The following categories were identified from open-ended user responses to the insight question:
- Coping strategies
- Self-care strategies
- Self-reflection
- Emotional awareness
- Boundary setting
- Self-awareness
- Work-life balance
- Emotional regulation
- Reframing perspectives
- Conflict resolution
- Therapy process
- Anger management techniques
- Planning and organization
- Relationship insights
- Supportive communication
- Self-care importance
- Communication strategies
- Emotional validation
- Hurricane preparedness
Future Topic Category
We asked users what they would want to discuss post-session.
The following categories were identified from open-ended user responses to the future-topic question:
- Continue current focus
- Self-care strategies
- Nothing specific
- Anxiety management
- Relationship dynamics
- Stress management
- Emotional well-being
- Anger management
- Accountability
- Undecided
- Family dynamics
- Grief processing
- Self-validation
- Pet therapy
Results
User Engagement Statistics
The following results pertain to chats collected via our internal and public deployments ranging from September 25th, 2024 to October 15th, 2024.
| Metric | Value |
| --- | --- |
| Total Users | 86 |
| Total Chat Sessions | 164 |
| Total Completed Chats (user completed wrap-up questions) | 62 (37.8% of total chat sessions) |
| Average Utterance Count for Completed Chats | 17 |
| Standard Deviation of Utterance Count for Completed Chats | 6.84 |
| Detected High-Risk Scenarios (public deployment) | 1 |
| User Issues Reported | 0 |
Sentiment Statistics
User Question: "So, how are you feeling now compared to when we started chatting?"
| Category | Count | Percent |
| --- | --- | --- |
| Positive (“better”) | 52 | 83.8% |
| Neutral | 10 | 16.2% |
| Negative | 0 | 0% |
We did not find any harmful chats in qualitative review.
Insight Category Statistics
User Question: "What's one thing you're taking away from our talk today?"
| Category | Count | Percent |
| --- | --- | --- |
| Coping strategies | 15 | 24.6% |
| Self-care strategies | 12 | 19.7% |
| Self-reflection | 9 | 14.8% |
| Emotional awareness | 5 | 8.2% |
| Boundary setting | 3 | 4.9% |
| Self-awareness | 2 | 3.3% |
| Work-life balance | 2 | 3.3% |
| Emotional regulation | 2 | 3.3% |
| Reframing perspectives | 2 | 3.3% |
| Conflict resolution | 1 | 1.6% |
| Therapy process | 1 | 1.6% |
| Anger management techniques | 1 | 1.6% |
| Planning and organization | 1 | 1.6% |
| Relationship insights | 1 | 1.6% |
| Supportive communication | 1 | 1.6% |
| Self-care importance | 1 | 1.6% |
| Communication strategies | 1 | 1.6% |
| Emotional validation | 1 | 1.6% |
| Hurricane preparedness | 1 | 1.6% |
Future Topic Statistics
User Question: "Is there anything you'd like to focus on next time we chat?"
| Category | Count | Percent |
| --- | --- | --- |
| Continue current focus | 12 | 19.4% |
| Self-care strategies | 11 | 17.7% |
| Nothing specific | 8 | 12.9% |
| Anxiety management | 7 | 11.3% |
| Relationship dynamics | 4 | 6.5% |
| Stress management | 4 | 6.5% |
| Emotional well-being | 4 | 6.5% |
| Anger management | 2 | 3.2% |
| Accountability | 2 | 3.2% |
| Undecided | 2 | 3.2% |
| Family dynamics | 2 | 3.2% |
| Grief processing | 2 | 3.2% |
| Self-validation | 1 | 1.6% |
| Pet therapy | 1 | 1.6% |
Discussion
Comparison to Existing Solutions
Due to the anonymous and public nature of collection, we did not conduct any analysis against existing systems. Baumel et al. found that the median open rate for most wellness applications was 4%. [4] While their study did not discuss churn rate during interaction with applications, we find that 37.8% of Opal chat sessions were completed through the full 10-minute session, and that 19.4% of users would like to continue their current focus in the next session, potentially indicating stronger retention. In future work, we aim to study the retention rate of conversational between-session interventions when they are connected to the client’s treatment plan.
Limitations & Potential Biases
This report is post-hoc and strictly exploratory. We therefore acknowledge the lack of a control group and of demographic information as limitations to our findings. Our analysis only includes sessions in which the user engaged with the system to the end and answered the provided wrap-up questions. The sentiment of incomplete conversations cannot be inferred, since users may end a chat because they find it irrelevant or, conversely, because they are satisfied. We plan to further explore the efficacy of our particular system in controlled settings, integrated into care methodologies, in the future.
Ethical Considerations
The deployment of AI systems in mental health contexts requires careful consideration of ethical implications and potential risks. While Opal Wellness aims to provide accessible wellness support, we acknowledge the broader ethical challenges of using AI in this domain. These include the risk of users developing emotional attachment to the system, potential over-reliance on automated support, and the complexity of maintaining appropriate boundaries between AI assistance and professional care. We address these concerns through clear scope limitations, consistent reminders of the system's non-therapeutic nature, and immediate redirection to professional resources when appropriate. Moreover, we employ an anonymous data collection approach and opt-in sharing mechanisms to honor user privacy and autonomy. We further adhere to the National Board for Certified Counselors' official guidelines for the safe use of AI systems in mental healthcare, [11] and propose our own ethical values system. The following ethical guidelines were developed in consultation with mental health professionals and inform all aspects of our system design, from prompt engineering to user interface choices:
Outcomes First
Principally, any use of AI in mental healthcare should directly or indirectly result in measurably better treatment outcomes for clients.
Transparency
All participants in the application of an AI system should be informed of its function, limitations, purpose, and data governance.
Control
Clinicians always remain in control of AI agents and their interactions with clients. Experts always remain in the loop of AI workflows and may review, edit, override, and exercise equal or greater privilege over AI systems.
Consent
All participation in the application of an AI system should be consensual and with full understanding of these core tenets.
In the future, we plan to implement additional safety checks on user input and model output at each turn of dialog. These safety checks will be implemented using a feedback loop with the LLM, in which a different prompt (or a different LLM altogether) reviews the user input and model output for high-risk scenarios, as sketched below. Our internal system can then halt these interactions before they escalate.
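A minimal sketch of this planned turn-level review follows; the prompt wording, function name, and model ID are assumptions about future work, not a shipped feature.

```python
# Sketch of the planned turn-level safety review: a second prompt (or a
# different model) inspects each exchange and returns a verdict.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

REVIEW_PROMPT = (
    "You are a safety reviewer. Given a user message and a draft reply, "
    "answer HIGH_RISK if the exchange involves self-harm, abuse, or another "
    "crisis scenario; otherwise answer SAFE."
)

def should_halt(user_text: str, draft_reply: str) -> bool:
    """Return True if the turn should be halted before the reply is shown."""
    exchange = f"User: {user_text}\nDraft reply: {draft_reply}"
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # could differ from the chat model
        system=[{"text": REVIEW_PROMPT}],
        messages=[{"role": "user", "content": [{"text": exchange}]}],
        inferenceConfig={"maxTokens": 5, "temperature": 0.0},
    )
    verdict = response["output"]["message"]["content"][0]["text"].strip()
    return verdict.startswith("HIGH_RISK")
```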
Future Work
System Improvements
Input / Output Verification
Future iterations of Opal Wellness will require robust verification mechanisms to ensure system responses remain within therapeutic boundaries. We plan to implement:
- Pre-response validation using a separate model prompt to classify responses for therapeutic appropriateness
- Post-response validation using a separate model prompt to deny model responses which could result in high-risk scenarios
Safe Response Test Harness
To maintain system safety and reliability for open-ended conversational interfaces, we propose developing a comprehensive test harness that includes:
- Automated generation of challenging user scenarios
- Continuous evaluation of system responses against therapeutic guidelines
- Integration testing with crisis detection systems
- Regular validation of the [[WRN]] and [[WRAPUP]] token mechanisms (see the test sketch below)
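As one hedged example of the last item, the token routing sketched earlier could be exercised with parameterized unit tests; the scenarios and module path here are hypothetical.

```python
# Hypothetical pytest checks for the [[WRN]] and [[WRAPUP]] mechanisms,
# exercising the handle_response() router sketched earlier.
import pytest

from opal.routing import handle_response  # hypothetical module path

@pytest.mark.parametrize("reply, expected_action", [
    ("I'm concerned about your safety. [[WRN]]", "halt"),
    ("That was a lovely chat. [[WRAPUP]]", "wrapup"),
    ("Tell me more about your day.", "continue"),
])
def test_control_tokens(reply, expected_action):
    assert handle_response(reply)["action"] == expected_action
```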
Long-term Efficacy Studies
To better understand the impact of AI-assisted wellness support, we propose several longitudinal studies:
- Controlled trials comparing Opal Wellness to traditional between-session journaling
- Assessment of user engagement patterns over extended periods
- Analysis of therapeutic outcomes when Opal is used as a supplement to professional care
- Investigation of demographic factors affecting system efficacy
- Evaluation of optimal interaction duration and frequency
Integration with Professional Care
Future development will focus on enhancing Opal's role as a supplementary tool for professional mental health care:
- Integration into our clinician dashboard for monitoring patient interactions
- Controls for clinicians to give customized instructions to the system, increasing relevance to the receiving client
- Enhanced reporting mechanisms for tracking patient progress
- Controls for clinicians to limit generative access to Opal for high-risk clients, at their professional discretion
These improvements will be developed in close consultation with mental health professionals and with careful consideration of our existing ethical guidelines and privacy requirements.
Conclusion
In this article, we conducted a post-hoc analysis of anonymous Opal Wellness user interactions, examining their sentiment, impact, and user perception. We find that our conversational wellness system shows promise as a between-session psychotherapy intervention tool, with 83.8% of users reporting immediate improvement in their mental state (16.2% neutral), while exhibiting high engagement compared to existing wellness apps. Users reported coping strategies as a helpful takeaway from our application (24.6%), and generally reported that their session aligned with expectations for future sessions (19.4%). We found that our base model strongly follows complex instructions, including in high-risk situations, with 0 problem reports and no observed instances in which our model erroneously engaged in high-risk topics. Users who completed their interaction with Opal showed strong engagement, with an average utterance count of 17. We acknowledge potential limitations due to the locality of our advertising and the anonymity of reporting. Future work is required to assess the efficacy of conversational AI support integrated into care models under clinician instruction.
References
[1] Raile, P. (2024). The usefulness of ChatGPT for psychotherapists and patients. Humanities and Social Sciences Communications, 11, 47. https://doi.org/10.1057/s41599-023-02567-0
[2] Conroy, J., Lin, L., & Ghaness, A. (2020, July 1). Why people aren't getting the care they need. Monitor on Psychology, 51(5). https://www.apa.org/monitor/2020/07/datapoint-care
[3] Fulmer, R., Beeson, E. T., Sheperis, C., Rosello, D., & Edelman, R. (2023). Artificial Intelligence for mental health support during COVID-19: Experiences of graduate counseling students. Journal of Technology in Counselor Education and Supervision, 4(1), Article 5. https://doi.org/10.61888/2692-4129.1094
[4] Baumel, A., Muench, F., Edan, S., & Kane, J. (2019). Objective user engagement with mental health apps: Systematic search and panel-based usage analysis. Journal of Medical Internet Research, 21(9), Article e14567. https://doi.org/10.2196/14567
[5] Payne, K. (2024, March 12). AI chatbot pushed teen to kill himself, lawsuit alleges. AP News. https://apnews.com/article/chatbot-ai-lawsuit-suicide-teen-artificial-intelligence-9d48adc572100822fdbc3c90d1456bd0
[6] Fitzpatrick, K., Darcy, A., & Vierhile, M. (2017). Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): A randomized controlled trial. JMIR Mental Health, 4(2), Article e19. https://doi.org/10.2196/mental.7785
[7] Mugerwa, S., & Holden, J. D. (2012). Writing therapy: A new tool for general practice? British Journal of General Practice, 62(605), 661–663. https://doi.org/10.3399/bjgp12X659457
[8] Chen, K., Huang, J. J., & Torous, J. (2024). Hybrid care in mental health: A framework for understanding care, research, and future opportunities. NPP—Digital Psychiatry Neuroscience, 2, 16. https://doi.org/10.1038/s44277-024-00016-7
[9] Jin, H., Chen, R., Zhou, A., Zhang, Y., & Wang, H. (2024). GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models. arXiv. https://arxiv.org/abs/2402.03299
[10] Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(4), 623–656. https://doi.org/10.1002/j.1538-7305.1948.tb00917.x
[11] Hargett, B., & Parsons, J. (2024, April 12). Ethical principles for artificial intelligence in counseling. National Board for Certified Counselors. https://nbcc.org/assets/Ethics/EthicalPrinciples_for_AI.pdf?_zs=sv3vQ1&_zl=4rBT7