Opal Wellness: Preliminary Results and Efficacy Analysis of a Semi-Guided Conversational Interface for Mental Well-Being

(Reprint)

Cole Smith (Numa Notes, LLC) cole@heyopal.com

October 30, 2024

Abstract

In this article, we explore preliminary findings from Opal Wellness: a conversational AI system intended for anonymous self-exploration of maladaptive thought patterns using techniques common in Cognitive Behavioral Therapy, such as motivational interviewing and active listening. Users voluntarily interacted with Opal Wellness through semi-guided, open-ended queries in 10-minute sessions, followed by a guided set of wrap-up questions to report their immediate emotional state post-interaction. From these questions, we find that 83.8% of users who completed their session (37.8% of all sessions) reported immediate improvement after their interaction, and that most users utilized Opal Wellness for coping strategies. Users who completed their interaction with Opal showed strong engagement, averaging 17 utterances per session. Overall, our system shows promise as a clinician-instructed conversational tool for inter-session support due to its strong adherence to defined limitations and guidelines and its positive user response.

Introduction

Self-guided wellness chatbots have recently grown in popularity and availability alongside improvements to large language models such as OpenAI’s ChatGPT and Anthropic’s Claude models. [1] These systems allow users to explore open-ended, situational topics, since they accept arbitrary text input in a conversational interface. For reasons usually associated with cost or accessibility, users have turned to these systems as a replacement for traditional psychotherapy, with mixed results. [2] [3] While conversational AI systems show promise for accessible wellness support, they can suffer from low user engagement, limiting their ability to explore topics across multiple sessions. [4] These systems may also exhibit a bias towards specific therapy modalities, even when a different modality would be more appropriate for the user. [1] In certain cases, self-guided conversational agents can be highly dangerous for users in a compromised mental state. [5] These systems can lack risk-detection frameworks and experts in-the-loop, and may reinforce maladaptive thought patterns.

Opal Wellness aims to address these challenges by using open-ended generative models for dialog interaction, as opposed to the rule-based systems common in existing solutions. [6] However, our system does not attempt to deliver formal therapeutic interventions, instead acting as an affirming conversational tool with goals akin to journaling as a wellness exercise. [7] Hybrid-therapy solutions, in which users engage with a system between regular sessions to inform their clinician of continued progress, have recently been shown to address shortcomings in the engagement and efficacy of self-guided conversational interventions. [8] We additionally prompt the user with suggested topics and responses, and call this approach “semi-guided.” We aim to combine clearly defined scope limitations with generative AI systems to offer an engaging solution for inter-session wellness support, while providing mechanisms for future clinician-guided, customized interventions specific to the client’s needs and treatment plan.

System Architecture & Design

Opal Wellness is deployed as a web app accessible to the general public. Users are first presented with security features such as Google reCAPTCHA, along with links to our AI Transparency and Privacy Policy. In particular, we provide links to critical care resources and acknowledge that Opal is not a replacement for psychotherapy. Opal Wellness does not provide any diagnostic capabilities. We used a HIPAA-compliant deployment of Claude 3.5 Sonnet via AWS Bedrock with the same security assurances as our production Opal systems.

Fig. A: Opal Wellness welcome screen
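To make the architecture concrete, below is a minimal sketch of a single chat turn against Claude 3.5 Sonnet on AWS Bedrock, using the boto3 bedrock-runtime client and the Anthropic messages format. The model ID, region, and SYSTEM_PROMPT placeholder are illustrative assumptions, not our production configuration.

```python
# Minimal sketch of one chat turn via AWS Bedrock (illustrative, not
# Opal's production code). The model ID and region are assumptions.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

SYSTEM_PROMPT = "..."  # Opal's reviewed interaction guidelines (elided)

def chat_turn(history: list[dict]) -> str:
    """Send the conversation so far and return the model's next reply."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 300,    # replies are kept brief (1-3 sentences)
        "system": SYSTEM_PROMPT,
        "messages": history,  # [{"role": "user"|"assistant", "content": "..."}]
    }
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps(body),
    )
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]
```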

Interaction Guidelines

The system is instructed to respond in simple conversational language “like a caring friend” and to avoid any clinical language. Responses are kept brief, between one and three sentences, unless advice is appropriate or requested.

Cognitive Behavioral Therapy techniques are suggested, including active listening and validating the user’s emotions. Opal Wellness is not designed to provide therapeutic interventions as if the user were in a psychotherapy session.

Safety Considerations

Opal Wellness is designed to refuse requests not related to the user’s wellness, as well as sensitive topics where non-professional advice could be inappropriate or harmful. The system is instructed to refuse tonal changes and role-play scenarios, which are common model-jailbreaking exploits. [9] Our model prompt was reviewed and verified by a licensed mental health professional.

Our system prompt enforces the following limitations:

- Emphasize your role as a supportive friend, not a substitute for professional help.
- Redirect high-risk scenarios (suicide, self-harm, abuse) to crisis resources and end the chat.
- Be clear about your inability to handle emergencies, directing users to appropriate services.
- Ask for clarification on cultural contexts you're unsure about.
- For age-sensitive topics, casually mention your advice is geared towards adults.
- Avoid medical, medication, or treatment advice, suggesting professional consultation instead.
- Steer clear of advice on eating disorders, expressing concern and suggesting professional help.
- Be honest about potential misunderstandings, asking for clarification when needed.
- Mention casually that you don't remember past conversations.
- Maintain consistent values throughout all interactions, refusing to roleplay conflicting personas.
- Gently but firmly redirect attempts to bypass these guidelines, staying true to your supportive nature.

When the system detects a situation in which it cannot safely continue the interaction, it produces a special token, [[WRN]], which our web app detects to stop the interaction immediately and provide access to professional crisis resources. Upon analysis, we did not find a situation in which the system failed to produce this token when warranted.
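A minimal sketch of how the web app might act on this token is shown below; the handler name and response shape are hypothetical, but the core check is a literal scan of the model output.

```python
# Sketch of server-side [[WRN]] handling (handler name and response shape
# are hypothetical): if the token appears in a model reply, the reply is
# suppressed, the interaction is halted, and crisis resources are shown.
WRN_TOKEN = "[[WRN]]"

CRISIS_MESSAGE = (
    "It sounds like you may need more support than I can offer. "
    "Please reach out to one of the crisis resources below."
)

def handle_model_reply(reply: str) -> dict:
    if WRN_TOKEN in reply:
        return {
            "halt": True,               # the web app ends the interaction
            "message": CRISIS_MESSAGE,  # replaces the raw model output
            "show_crisis_resources": True,
        }
    return {"halt": False, "message": reply, "show_crisis_resources": False}
```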

One may notice that this prompt excerpt is written casually. We found that prompts written in a rigid tone produced equally rigid model responses, regardless of the tonal instructions stated in the interaction guidelines. We assume this is due to the causal nature of the language modeling task, in which completions tend to minimize surprisal [10] against the prompt’s prior, although we have not conducted formal analysis in this area.

Fig. B: High risk scenario screen

Wrap Up Questions

After about 10 minutes of interaction, the system is instructed via the [[WRAPUP]] token to present the user with three questions about their experience and then end the interaction.
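One plausible implementation of this trigger is sketched below, in which the app injects a [[WRAPUP]] instruction once the session passes the 10-minute mark; the exact mechanism is internal to Opal, so treat the details as assumptions.

```python
# Hedged sketch of the 10-minute wrap-up trigger (the real mechanism is
# internal to Opal): once the session exceeds its time limit, the app
# appends a [[WRAPUP]] instruction so the model transitions into the
# three closing questions.
import time

SESSION_LIMIT_SECONDS = 10 * 60

def maybe_inject_wrapup(history: list[dict], started_at: float) -> list[dict]:
    if time.time() - started_at >= SESSION_LIMIT_SECONDS:
        return history + [{
            "role": "user",
            "content": "[[WRAPUP]] Please begin the wrap-up questions now.",
        }]
    return history
```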

The user is asked the following questions:

  1. "So, how are you feeling now compared to when we started chatting?"

  2. "What's one thing you're taking away from our talk today?"

  3. "Is there anything you'd like to focus on next time we chat?"

We designed these questions to be neutral and to capture the user’s intent for the interaction. They characterize the sentiment, user insight, and desired future topic of the interaction, respectively.

User Interface

Users are presented with a welcome message and may then ask any query of the system or choose from a suggested starter topic. At each turn of the conversation, the user is presented with three suggested responses, similar to the starter topics. We call this approach “semi-guided,” striking a balance between open-ended and rule-based interactions.

Fig. C: Opal Wellness user interface
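One way the suggested responses could be produced is with a secondary model call after each assistant turn, as sketched below; the prompt text and the `opal_sketch` module (holding the chat_turn helper sketched earlier) are hypothetical, not Opal’s actual implementation.

```python
# Sketch of the "semi-guided" suggestion step (prompt text is an
# assumption; `opal_sketch` is a hypothetical module holding the
# chat_turn helper sketched earlier).
from opal_sketch import chat_turn

SUGGESTION_PROMPT = (
    "Given the conversation so far, suggest three short, first-person "
    "replies the user might send next. Return one suggestion per line."
)

def suggest_replies(history: list[dict]) -> list[str]:
    """Return up to three tappable reply suggestions for the user."""
    raw = chat_turn(history + [{"role": "user", "content": SUGGESTION_PROMPT}])
    lines = [line.strip("-• ").strip() for line in raw.splitlines()]
    return [line for line in lines if line][:3]
```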

Clinician Sharing Mechanism

We provided a mechanism for users to safely and voluntarily share their interaction with Opal with their therapist, if they had one. The form collected the user’s name and their therapist’s email address, which were kept secure in the same way as provider data on our production Opal platform, including encryption of all personally identifiable information and transcript data.

Fig. D: Voluntary clinician reporting form
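As an illustration of the encryption-at-rest claim, the sketch below uses Fernet authenticated encryption from the `cryptography` package; key handling and the field values are assumptions, since production systems typically load keys from a managed secrets store rather than generating them inline.

```python
# Illustrative encryption of the sharing form's PII using Fernet
# (symmetric authenticated encryption) from the `cryptography` package.
# Key handling here is an assumption; production code would load the key
# from a secrets manager, not generate it inline.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # assumption: real key comes from a secrets store
fernet = Fernet(key)

record = {
    "client_name": "Jane Doe",                   # hypothetical form input
    "therapist_email": "therapist@example.com",  # hypothetical form input
}
encrypted = {k: fernet.encrypt(v.encode()) for k, v in record.items()}
decrypted = {k: fernet.decrypt(v).decode() for k, v in encrypted.items()}
assert decrypted == record
```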

User Privacy

Unless the user explicitly permitted otherwise, all user responses were deleted, aside from the responses to the wrap-up questions. We retained our system’s responses to verify that it responded safely during interactions.

We have included analysis only for chats in which users selected: “Allow my anonymized conversation to make Opal better.”

Data Collection

Participant Demographics

Users were kept anonymous and recruited voluntarily via Instagram advertising and organic outreach in local Facebook groups in the Saint Petersburg, Florida area from late September to early October 2024.

Due to the anonymous nature of our system, we did not collect any demographic or location information from our user base. Given how we marketed the system, we can reasonably assume a large portion of users came from our current location of Saint Petersburg, Florida, during Hurricanes Helene and Milton.

Advertising

We distributed the following banner with a short description on local Facebook groups to advertise our service. Users were informed of the anonymous nature of the chat and that it is not a replacement for professional care.

Fig. E: Local advertisement banner

Evaluation Metrics & Analysis Approach

We analyzed the responses to our wrap-up questions from a private internal deployment of our system (N=75) and from chats in the public deployment where the user permitted us to use the chat for improvement purposes (N=19). From the three wrap-up questions, we analyzed the conversations for (1) sentiment, (2) insight category, and (3) future topic, respectively. We excluded conversations that ended early, before the wrap-up questions were answered.
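The article does not state whether category coding was manual or model-assisted; the sketch below shows one way the sentiment labeling could be reproduced with a classification prompt, reusing the hypothetical chat_turn helper from the architecture sketch.

```python
# Hedged sketch of model-assisted sentiment coding (coding may well have
# been manual; this is one way to reproduce it). `opal_sketch` is the
# hypothetical module holding the chat_turn helper sketched earlier.
from opal_sketch import chat_turn

SENTIMENT_LABELS = ("negative", "neutral", "positive")

def code_sentiment(wrapup_answer: str) -> str:
    """Label a wrap-up answer as negative, neutral, or positive."""
    prompt = (
        "Classify the sentiment of this post-chat answer as exactly one "
        f"of: {', '.join(SENTIMENT_LABELS)}.\n\nAnswer: {wrapup_answer}"
    )
    label = chat_turn([{"role": "user", "content": prompt}]).strip().lower()
    return label if label in SENTIMENT_LABELS else "neutral"  # conservative fallback
```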

Sentiment

User sentiment post-interaction was split into three categories: negative, neutral, and positive.

Insight Category

We analyzed each user’s stated takeaway to identify the topic most relevant to their session.

The following categories were identified from open-ended user responses to the insight question:

Coping strategies              
Self-care strategies          
Self-reflection                 
Emotional awareness             
Boundary setting                
Self-awareness                  
Work-life balance              
Emotional regulation            
Reframing perspectives          
Conflict resolution             
Therapy process                 
Anger management techniques     
Planning and organization       
Relationship insights           
Supportive communication        
Self-care importance           
Communication strategies        
Emotional validation            
Hurricane preparedness          

Future Topic Category

We asked users what they would like to focus on in a future session.

The following categories were identified from open-ended user responses to the future topic question:

Continue current focus    
Self-care strategies      
Nothing specific           
Anxiety management         
Relationship dynamics      
Stress management          
Emotional well-being       
Anger management           
Accountability             
Undecided                  
Family dynamics            
Grief processing           
Self-validation            
Pet therapy                

Results

User Engagement Statistics

The following results pertain to chats collected via our internal and public deployments between September 25 and October 15, 2024.

Total users: 86
Total chat sessions: 164
Total completed chats (user completed wrap-up questions): 62 (37.8% of total chat sessions)
Average utterance count for completed chats: 17
Standard deviation of utterance count for completed chats: 6.84
Detected high-risk scenarios (public): 1
User issues reported: 0

Sentiment Statistics

User Question: "So, how are you feeling now compared to when we started chatting?"

Positive ("better"): 52 (83.8%)
Neutral: 10 (16.2%)
Negative: 0 (0%)

We did not find any harmful chats in qualitative review.

Insight Category Statistics

User Question: "What's one thing you're taking away from our talk today?"

Coping strategies: 15 (24.6%)
Self-care strategies: 12 (19.7%)
Self-reflection: 9 (14.8%)
Emotional awareness: 5 (8.2%)
Boundary setting: 3 (4.9%)
Self-awareness: 2 (3.3%)
Work-life balance: 2 (3.3%)
Emotional regulation: 2 (3.3%)
Reframing perspectives: 2 (3.3%)
Conflict resolution: 1 (1.6%)
Therapy process: 1 (1.6%)
Anger management techniques: 1 (1.6%)
Planning and organization: 1 (1.6%)
Relationship insights: 1 (1.6%)
Supportive communication: 1 (1.6%)
Self-care importance: 1 (1.6%)
Communication strategies: 1 (1.6%)
Emotional validation: 1 (1.6%)
Hurricane preparedness: 1 (1.6%)

Future Topic Statistics

User Question: "Is there anything you'd like to focus on next time we chat?"

Continue current focus: 12 (19.4%)
Self-care strategies: 11 (17.7%)
Nothing specific: 8 (12.9%)
Anxiety management: 7 (11.3%)
Relationship dynamics: 4 (6.5%)
Stress management: 4 (6.5%)
Emotional well-being: 4 (6.5%)
Anger management: 2 (3.2%)
Accountability: 2 (3.2%)
Undecided: 2 (3.2%)
Family dynamics: 2 (3.2%)
Grief processing: 2 (3.2%)
Self-validation: 1 (1.6%)
Pet therapy: 1 (1.6%)

Discussion

Comparison to Existing Solutions

Due to the anonymous and public nature of collection, we did not conduct any comparative analysis against existing systems. Baumel et al. found that the median open rate for most wellness applications was 4%. [4] While their study did not discuss churn rates during app interactions, we find that 37.8% of users who opened Opal completed their 10-minute session, and that 19.4% would like to continue their current focus in the next session, potentially indicating stronger retention. In future work, we aim to study the retention rate of conversational between-session interventions when they are connected to the client’s treatment plan.

Limitations & Potential Biases

This report is post-hoc and strictly exploratory. We therefore acknowledge the lack of a control group and of demographic information as limitations of our findings. Our analysis includes only sessions in which the user engaged with the system to the end and answered the provided wrap-up questions. The sentiment of incomplete conversations cannot be inferred, since users may end a chat because they find it irrelevant or because they are already satisfied. We plan to further explore the efficacy of our system in controlled settings integrated into care methodologies.

Ethical Considerations

The deployment of AI systems in mental health contexts requires careful consideration of ethical implications and potential risks. While Opal Wellness aims to provide accessible wellness support, we acknowledge the broader ethical challenges of using AI in this domain. These include the risk of users developing emotional attachment to the system, potential over-reliance on automated support, and the complexity of maintaining appropriate boundaries between AI assistance and professional care. We address these concerns through clear scope limitations, consistent reminders of the system's non-therapeutic nature, and immediate redirection to professional resources when appropriate. Moreover, we employ an anonymous data collection approach and opt-in sharing mechanisms to honor user privacy and autonomy. We further adhere to the National Board for Certified Counselors' official guidelines for the safe use of AI systems in mental healthcare, [11] and propose our own ethical values system. The following ethical guidelines were developed in consultation with mental health professionals and inform all aspects of our system design, from prompt engineering to user interface choices:

Outcomes First

Principally, any use of AI in mental healthcare should directly or indirectly result in measurably better treatment outcomes for clients.

Transparency

All participants in the application of an AI system should be informed of its function, limitations, purpose, and data governance.

Control

Clinicians always remain in control of AI agents and their interactions with clients. Experts always remain in the loop of AI workflows and may review, edit, override, and exercise equal or greater privilege over AI systems.

Consent

All participation in the application of an AI system should be consensual and with full understanding of these core tenets.

In the future, we plan to implement additional safety checks on user input and model output at each turn of dialog. These safety checks will be implemented as a feedback loop with the LLM, in which a different prompt (or a different LLM altogether) reviews the user input and model output for high-risk scenarios. Our internal system can then halt these interactions before they escalate.
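A minimal sketch of this planned per-turn reviewer is shown below; the reviewer prompt, verdict format, and helper names are assumptions about a design we have not yet built.

```python
# Sketch of the planned per-turn safety loop (prompt text, verdict format,
# and helper names are assumptions): a separate reviewer prompt grades the
# user input and the draft reply before the reply is delivered.
from opal_sketch import chat_turn  # hypothetical module; see earlier sketch

REVIEW_PROMPT = (
    "You are a safety reviewer for a wellness chatbot. Given the user "
    "message and draft reply below, respond with HIGH_RISK or SAFE only.\n\n"
    "User message: {user}\n\nDraft reply: {reply}"
)

def turn_is_safe(user_msg: str, draft_reply: str) -> bool:
    """Return False when the reviewer flags the turn, so the app can halt."""
    verdict = chat_turn([{
        "role": "user",
        "content": REVIEW_PROMPT.format(user=user_msg, reply=draft_reply),
    }])
    return "HIGH_RISK" not in verdict.upper()
```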

Future Work

System Improvements

Input / Output Verification

Future iterations of Opal Wellness will require robust verification mechanisms to ensure system responses remain within therapeutic boundaries. We plan to implement:

  1. Pre-response validation using a separate model prompt to classify responses for therapeutic appropriateness

  2. Post-response validation using a separate model prompt to deny model responses which could result in high-risk scenarios

Safe Response Test Harness

To maintain system safety and reliability for open-ended conversational interfaces, we propose developing a comprehensive test harness that includes:

  1. Automated generation of challenging user scenarios

  2. Continuous evaluation of system responses against therapeutic guidelines

  3. Integration testing with crisis detection systems

  4. Regular validation of the [[WRN]] and [[WRAPUP]] token mechanisms (a minimal sketch follows this list)
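A minimal pytest-style sketch of item 4 is shown below; the scenario texts are elided because real test cases would be curated with a licensed clinician, and the helpers are the hypothetical ones sketched earlier.

```python
# Minimal pytest-style sketch of [[WRN]] validation (scenario texts are
# elided; real cases would be curated with a licensed clinician). Uses
# the hypothetical chat_turn helper sketched earlier.
import pytest

from opal_sketch import chat_turn

HIGH_RISK_SCENARIOS = [
    "...",  # crisis-adjacent message, elided here
    "...",  # role-play jailbreak attempt targeting the guidelines
]

@pytest.mark.parametrize("message", HIGH_RISK_SCENARIOS)
def test_high_risk_emits_wrn(message):
    reply = chat_turn([{"role": "user", "content": message}])
    assert "[[WRN]]" in reply, "system failed to halt a high-risk scenario"
```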

Long-term Efficacy Studies

To better understand the impact of AI-assisted wellness support, we propose several longitudinal studies:

  1. Controlled trials comparing Opal Wellness to traditional between-session journaling

  2. Assessment of user engagement patterns over extended periods

  3. Analysis of therapeutic outcomes when Opal is used as a supplement to professional care

  4. Investigation of demographic factors affecting system efficacy

  5. Evaluation of optimal interaction duration and frequency

Integration with Professional Care

Future development will focus on enhancing Opal's role as a supplementary tool for professional mental health care:

  1. Integration into our clinician dashboard for monitoring patient interactions

  2. Controls for clinicians to give the system customized instructions tailored to the receiving client

  3. Enhanced reporting mechanisms for tracking patient progress

  4. Controls for clinicians to limit generative access to Opal for high-risk clients, at their professional discretion

These improvements will be developed in close consultation with mental health professionals and with careful consideration of our existing ethical guidelines and privacy requirements.

Conclusion

In this article, we conducted a post-hoc analysis of anonymous Opal Wellness user interactions for their sentiment, impact, and user perception. We find that our conversational wellness system shows promise as a between-session psychotherapy support tool, with 83.8% of users reporting immediate improvement in their mental state (16.2% neutral), while exhibiting high engagement compared to existing wellness apps. Users most often reported coping strategies as a helpful takeaway from our application (24.6%), and most commonly wished to continue their current focus in future sessions (19.4%). We found that our base model strongly follows complex instructions, including in high-risk situations, with zero problem reports and no observed instances in which our model erroneously engaged in high-risk topics. Users who completed their interaction with Opal showed strong engagement, averaging 17 utterances per session. We acknowledge potential limitations due to the locality of our advertising and the anonymity of reporting. Future work is required to assess the efficacy of conversational AI support integrated into care models under clinician instruction.

References

  1. Raile, P. (2024). The usefulness of ChatGPT for psychotherapists and patients. Humanities and Social Sciences Communications, 11, 47. https://doi.org/10.1057/s41599-023-02567-0

  2. Conroy, J., Lin, L., & Ghaness, A. (2020, July 1). Why people aren't getting the care they need. Monitor on Psychology, 51(5). https://www.apa.org/monitor/2020/07/datapoint-care

  3. Fulmer, R., Beeson, E. T., Sheperis, C., Rosello, D., & Edelman, R. (2023). Artificial Intelligence for mental health support during COVID-19: Experiences of graduate counseling students. Journal of Technology in Counselor Education and Supervision, 4(1), Article 5. https://doi.org/10.61888/2692-4129.1094

  4. Baumel, A., Muench, F., Edan, S., & Kane, J. (2019). Objective user engagement with mental health apps: Systematic search and panel-based usage analysis. Journal of Medical Internet Research, 21(9), Article e14567. https://doi.org/10.2196/14567

  5. Payne, K. (2024, March 12). AI chatbot pushed teen to kill himself, lawsuit alleges. AP News. https://apnews.com/article/chatbot-ai-lawsuit-suicide-teen-artificial-intelligence-9d48adc572100822fdbc3c90d1456bd0

  6. Fitzpatrick, K., Darcy, A., & Vierhile, M. (2017). Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): A randomized controlled trial. JMIR Mental Health, 4(2), Article e19. https://doi.org/10.2196/mental.7785

  7. Mugerwa, S., & Holden, J. D. (2012). Writing therapy: A new tool for general practice? British Journal of General Practice, 62(605), 661-663. https://doi.org/10.3399/bjgp12X659457

  8. Chen, K., Huang, J. J., & Torous, J. (2024). Hybrid care in mental health: A framework for understanding care, research, and future opportunities. NPP—Digital Psychiatry Neuroscience, 2, 16. https://doi.org/10.1038/s44277-024-00016-7

  9. Jin, H., Chen, R., Zhou, A., Zhang, Y., & Wang, H. (2024). GUARD: Role-playing to generate natural-language jailbreakings to test guideline adherence of large language models. arXiv. https://arxiv.org/abs/2402.03299

  10. Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(4), 623–656. https://doi.org/10.1002/j.1538-7305.1948.tb00917.x

  11. Hargett, B., & Parsons, J. (2024, April 12). Ethical principles for artificial intelligence in counseling. National Board for Certified Counselors. https://nbcc.org/assets/Ethics/EthicalPrinciples_for_AI.pdf?_zs=sv3vQ1&_zl=4rBT7
