The Safety panel lets you define two layers of content filtering—Moderation and Safety prompts—to keep mentorAI conversations compliant and appropriate. By screening both incoming learner questions and outgoing AI responses, you protect students, meet institutional policies, and reduce the risk of harmful or off‑topic exchanges.

Instructor
- Moderation Prompt – scans learner messages before they reach the AI (fast, proactive)
- Safety Prompt – scans the AI’s draft response before it’s delivered (second‑layer protection)
Define what counts as disallowed content and what warning text the learner sees.
Blocking or redirection happens instantly, preventing inappropriate exchanges from ever appearing in chat.
Tailor warning messages to match campus language, policies, or brand voice.
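Conceptually, the two prompts wrap every exchange in a small two-layer pipeline: the learner's message is screened first, and only if it passes does the AI's draft answer get screened before delivery. The Python sketch below is illustrative only, not mentorAI's implementation; the helper names (violates_criteria, generate_draft_reply) and the simple keyword matching are stand-in assumptions for the natural-language criteria checks you configure in the steps that follow.

```python
# Illustrative sketch only, not mentorAI's implementation. The helper names and
# the crude keyword matching stand in for the natural-language criteria checks
# you configure in the Safety panel.

MODERATION_CRITERIA = "cheat, hate speech"        # what you enter as disallowed learner content
MODERATION_WARNING = "Please keep the conversation within the bounds of the platform rules."
SAFETY_CRITERIA = "medical advice, legal advice"  # what you enter as disallowed answer content
SAFETY_FALLBACK = "Sorry, the AI model generated an inappropriate response. Kindly try a different prompt."


def violates_criteria(text: str, criteria: str) -> bool:
    """Stand-in for the real check; here it is just keyword matching."""
    terms = [term.strip().lower() for term in criteria.split(",")]
    return any(term and term in text.lower() for term in terms)


def generate_draft_reply(message: str) -> str:
    """Stand-in for the mentor's underlying AI model."""
    return f"Here is a response to: {message}"


def handle_learner_message(message: str) -> str:
    # Layer 1 (Moderation Prompt): screen the learner's message before it reaches the AI.
    if violates_criteria(message, MODERATION_CRITERIA):
        return MODERATION_WARNING
    # Layer 2 (Safety Prompt): screen the AI's draft response before it is delivered.
    draft = generate_draft_reply(message)
    if violates_criteria(draft, SAFETY_CRITERIA):
        return SAFETY_FALLBACK
    return draft


print(handle_learner_message("How can I cheat on my exam without my professor knowing?"))
# Prints the moderation warning instead of an AI answer.
```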
- Click the mentor’s name in the header
- Select Safety
- Moderation Prompt: acts on learner messages
- Enter criteria (e.g., requests for cheating, hate speech) in the text box
- Write the warning learners will see if blocked
Example message:
Please keep the conversation within the bounds of the platform rules.
- Safety Prompt: acts on the AI’s response
- Enter criteria for disallowed content in answers
- Write the fallback message shown if the response is blocked
Example message:
Sorry, the AI model generated an inappropriate response. Kindly try a different prompt.
- Click Save (top‑right) to apply both prompts immediately
In a learner chat, enter a prohibited question like:
How can I cheat on my exam without my professor knowing?
The Moderation Prompt should block the message and display your custom warning.
- Periodically review chat History for false positives or missed content
- Refine criteria or messages to tighten or relax the filter as needed
Block requests for cheating strategies and direct students toward legitimate study resources.
Prevent the AI from discussing restricted topics (e.g., medical or legal advice) beyond approved guidelines.
Filter out hate speech, harassment, or explicit content to protect student well‑being.
Adjust prompts for K‑12 deployments, ensuring conversations stay developmentally suitable.
Use customized warning text that reflects school tone—formal, friendly, or supportive—so messages feel on brand.
With Moderation and Safety prompts properly configured, mentorAI blocks harmful questions before they reach the AI and prevents unsuitable responses from ever reaching learners—maintaining a safe, compliant, and trustworthy learning environment.
Flagged Prompts gives instructors/admins a clear view of potentially harmful, sensitive, or out-of-scope learner inputs that were stopped by a mentor’s Moderation Prompt. When a learner asks something outside the mentor’s allowed scope (or against policy), mentorAI blocks the reply, shows the learner a warning, and records the input in the Safety → Flagged Prompts view for follow-up and auditing.
Instructor · Administrator
Inputs blocked by the Moderation Prompt (e.g., off-topic, policy-restricted) are saved as flagged items.
mentorAI withholds an answer and displays a warning to keep the conversation safe and on task.
Instructors/admins can review flagged inputs across their cohort for safety, policy, or scope enforcement.
Tighten a mentor’s focus (e.g., “Only craft follow-up emails”) to flag off-topic questions automatically.
Use the list to identify patterns, contact specific users, and refine moderation text.
- Click the mentor’s name → Safety.
- Ensure Moderation Prompt is On.
- In the Moderation Prompt text box, spell out what is appropriate versus what is inappropriate.
Example (Email Writer mentor):
Any prompt not related to crafting follow-up emails is inappropriate. All other prompts are appropriate.
- A learner sends an off-scope message (e.g., “What’s the weather in Boston today?”).
- mentorAI does not respond and shows a warning (e.g., “Please keep the conversation within the bounds of what the agent is tasked to do…”).
- The input is stored as a Flagged Prompt.
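Conceptually, each blocked input becomes a small audit record capturing what was asked, who asked it, and when, which is exactly what the Flagged Prompts view surfaces later. The sketch below is illustrative only; the FlaggedPrompt structure and flag_input helper are hypothetical, not mentorAI's actual data model.

```python
# Illustrative only: the FlaggedPrompt structure and flag_input helper are
# hypothetical stand-ins for however mentorAI stores blocked inputs.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FlaggedPrompt:
    learner: str      # who asked
    message: str      # what was asked
    flagged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))  # when


flagged_prompts: list[FlaggedPrompt] = []


def flag_input(learner: str, message: str) -> None:
    """Record a learner input that the Moderation Prompt blocked."""
    flagged_prompts.append(FlaggedPrompt(learner=learner, message=message))


flag_input("learner-042", "What's the weather in Boston today?")
print(flagged_prompts[0])
```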
- Go to Safety → Flagged Prompts.
- Inspect entries to see what was asked, who asked, and when.
- Follow up with learners if the content raises concerns.
- Refine the moderation copy to clarify boundaries.
- Adjust mentor scope, datasets, or provide alternate resources if many learners seek off-scope help.
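To make the pattern-spotting step concrete, here is a minimal sketch of the kind of review you might do outside the product, assuming you have copied flagged items out as (learner, message) pairs; the analysis itself is an illustration, not a mentorAI feature.

```python
# Minimal sketch, assuming flagged items have been copied out as (learner, message)
# pairs; this analysis is an illustration, not a mentorAI feature.
from collections import Counter

flagged = [
    ("learner-042", "What's the weather in Boston today?"),
    ("learner-017", "Can you summarize this research paper?"),
    ("learner-042", "Can you recommend a research database?"),
]

# Who hits the filter most often? Repeated flags may warrant a supportive check-in.
by_learner = Counter(learner for learner, _ in flagged)
print(by_learner.most_common(3))

# Which longer words recur across flagged messages? Recurring themes (e.g. "research")
# can signal unmet needs better served by a different or broader-scoped mentor.
words = Counter(
    word.strip("?.,").lower()
    for _, message in flagged
    for word in message.split()
    if len(word) > 4
)
print(words.most_common(5))
```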
Catch and address inputs that may be harmful or violate institutional rules.
Keep single-purpose mentors (e.g., “Email Writer”) focused by flagging unrelated queries.
If many flagged prompts show unmet needs (e.g., general research questions), spin up or link to the right mentor.
Use flagged items to initiate supportive check-ins (e.g., academic integrity reminders, resource referrals).
Iterate on Moderation and Safety prompts based on patterns you observe in the flagged list.
Tip: Pair Flagged Prompts with clear Proactive/Advisory disclaimers and a well-scoped System Prompt so learners know what the mentor can and can’t do—reducing off-topic or risky inputs before they happen.