Safety as Our Utmost Priority: Engineering a Child-Safe AI Story Platform
Integrating Generative AI into a digital product is easier than ever. But making it genuinely safe for children? That is an entirely different—and profoundly important—engineering challenge.
At MintMyStory, we’re building a platform that brings imagination to life by generating custom stories for kids. In this domain, an AI "hallucination" isn't merely an amusing bug; it’s a critical failure.
To solve this, we decided to build a safety pipeline that doesn’t just "hope" for the best, but actively filters content through multiple independent layers. We call it our Triple-Layer Safety Framework—a defense-in-depth approach engineered to guarantee a child-safe environment.

The Core Problem: Separation of Concerns
Large Language Models (LLMs) are fundamentally designed to follow instructions and be helpful. If a user asks for a story about a "battle," the model's instinct is to comply and be creative.
Even with safety fine-tuning, the model’s primary objective—helpfulness—often conflicts with strict safety enforcement. We realized we couldn’t rely on the story-generating model to be its own moderator. We needed a separate system: a dedicated "Judge" with absolutely no incentive to be creative, only to be highly critical.
Our Architecture: The Defense-in-Depth Approach
We implemented a pipeline where every single user prompt must pass through three distinct security gates before a story is ever generated.
Layer 1: The Doorman (Heuristics)
Tools: Local profanity filters (e.g., leo-profanity) and Google DLP.
Goal: Speed, cost-efficiency, and immediate blocking of obvious threats.
Before making any expensive API calls, we run a lightning-fast local dictionary check. This instantly blocks obvious spam, explicit slurs, and universally inappropriate language.
Additionally, we leverage Data Loss Prevention (DLP) systems to scrub Personally Identifiable Information (PII). Children often unknowingly type highly sensitive personal information into chatboxes (e.g., their real names or addresses). By anonymizing this data before it reaches the generative layers, we keep it out of prompts, logs, and model outputs.
```typescript
import leoProfanity from 'leo-profanity';

interface SafetyResult {
  isSafe: boolean;
  reason?: string;
}

export class KeywordFilter {
  check(text: string): SafetyResult {
    // Fast, strictly local dictionary check. No network round-trip.
    if (leoProfanity.check(text)) {
      return { isSafe: false, reason: 'Profanity detected' };
    }
    return { isSafe: true };
  }
}
```
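The PII-scrubbing step can be sketched in the same spirit. A minimal, purely local stand-in for a DLP call might look like this; the patterns and placeholders below are illustrative, not the actual DLP configuration:

```typescript
// Illustrative stand-in for a DLP de-identification call. In production this
// step would go through a real DLP service; these regexes only sketch the idea.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g, '[PHONE]'],   // US-style phone numbers
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[EMAIL]'],         // email addresses
  [/\b\d{1,5}\s+\w+\s+(street|st|avenue|ave|road|rd)\b/gi, '[ADDRESS]'], // crude street addresses
];

export function scrubPII(text: string): string {
  // Replace each match with a neutral placeholder before the text
  // ever reaches a generative model.
  return PII_PATTERNS.reduce(
    (acc, [pattern, placeholder]) => acc.replace(pattern, placeholder),
    text,
  );
}
```

The key property is that scrubbing happens before any external API call, so sensitive fragments never leave the backend in raw form.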
Layer 2: The Bouncer (Adversarial Judge)
Tools: Gemini 1.5 Flash (prompted exclusively as a Safety Auditor).
Goal: Deep context and intent moderation.
This is where the real heavy lifting happens. We use a separate AI model specifically prompted to act as an "Auditor." We explicitly command this model with a ZERO TOLERANCE policy for harmful content, sexual innuendos, hate speech, or gross violence.
By decoupling this Auditor from the story generator, we completely remove the "helpfulness" bias. Its sole job is to analyze the input for nuance that simple keyword filters miss.
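A minimal sketch of how such an Auditor might be wired: the actual call to the model provider is elided, but the system prompt and the strict, fail-secure parsing of the verdict capture the core idea. Prompt wording and the `parseVerdict` helper are illustrative, not our exact production code:

```typescript
interface AuditVerdict {
  isSafe: boolean;
  reason?: string;
}

// System prompt for the dedicated Auditor model. It is never asked to be
// creative, only to return a strict JSON verdict on the user's input.
const AUDITOR_SYSTEM_PROMPT = `
You are a child-safety auditor with a ZERO TOLERANCE policy.
Flag harmful content, sexual innuendo, hate speech, or gross violence.
Respond ONLY with JSON: {"isSafe": boolean, "reason": string}.
`;

// Parse the Auditor's raw reply. Anything unparseable is treated as unsafe:
// a fail-secure default, since a malformed verdict tells us nothing.
export function parseVerdict(raw: string): AuditVerdict {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.isSafe !== 'boolean') throw new Error('missing verdict');
    return { isSafe: parsed.isSafe, reason: parsed.reason };
  } catch {
    return { isSafe: false, reason: 'Unparseable auditor response' };
  }
}
```

Note the default in the catch branch: if the Auditor's reply cannot be parsed, the prompt is blocked rather than waved through.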
Layer 3: The Guardrails (Provider Settings)
Tools: Gemini 2.0 Flash (strict provider safety settings).
Goal: The final safety net.
If an exceptionally ambiguous prompt slips past the first two layers, the final story generation model itself operates under Google’s strictest native settings (HarmBlockThreshold.BLOCK_LOW_AND_ABOVE). This catches output-side violations, ensuring the final generated text is pristine.
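The shape of that provider-level configuration looks roughly like this. The string values mirror the Gemini SDK's `HarmCategory` and `HarmBlockThreshold` enums; in real code they would come from the SDK itself rather than string literals:

```typescript
// Provider-level safety settings passed alongside the generation request.
// Every harm category is pinned to the strictest blocking threshold.
const STRICT_SAFETY_SETTINGS = [
  { category: 'HARM_CATEGORY_HARASSMENT', threshold: 'BLOCK_LOW_AND_ABOVE' },
  { category: 'HARM_CATEGORY_HATE_SPEECH', threshold: 'BLOCK_LOW_AND_ABOVE' },
  { category: 'HARM_CATEGORY_SEXUALLY_EXPLICIT', threshold: 'BLOCK_LOW_AND_ABOVE' },
  { category: 'HARM_CATEGORY_DANGEROUS_CONTENT', threshold: 'BLOCK_LOW_AND_ABOVE' },
] as const;
```

Because this layer runs inside the provider, it also inspects the model's output, not just our input, which is what makes it an effective last line of defense.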
The "False Positive" Trade-Off: Engineering for Trust
When we stress-tested this framework against a dataset of adversarial jailbreak prompts, the results were telling:
- Harmful Catch Rate: 100%.
- False Positive Rate: ~60% initially.
Because our filters were so aggressive, simple innocent stories sometimes got flagged. However, a high False Positive rate is far better than the alternative. In the child-tech domain, False Negatives are categorically unacceptable.
We deliberately chose a "Fail-Secure" architecture. As a platform trusted by parents, we would much rather block five innocent stories than let a single piece of harmful content slip through.
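The fail-secure behavior falls out of how the three gates are composed: they run in order from cheapest to most expensive, and the first unsafe verdict short-circuits the pipeline. A sketch of that orchestration, with illustrative type and function names:

```typescript
interface GateResult {
  isSafe: boolean;
  reason?: string;
}

type SafetyGate = (text: string) => Promise<GateResult>;

// Fail-secure pipeline: gates run sequentially and the first unsafe verdict
// blocks the request outright. A story is only generated if every gate passes.
export async function runSafetyPipeline(
  text: string,
  gates: SafetyGate[],
): Promise<GateResult> {
  for (const gate of gates) {
    const result = await gate(text);
    if (!result.isSafe) return result; // block immediately, never "best effort"
  }
  return { isSafe: true };
}
```

Ordering the gates by cost also means most bad prompts are rejected by the free local filter before any paid API call is made.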
Insight: Context-Aware Safety
While we accepted the fail-secure trade-off, we still needed to improve the user experience. Safety isn’t one-size-fits-all. To solve this, we upgraded Layer 2 to be adaptive, passing the specific target_audience age group directly into the Auditor's system prompt.
```typescript
const nuanceGuide = targetAudience === 'Toddler'
  ? "STRICT MODE. Block scary content. BUT whitelist innocent animal facts."
  : "NUANCED MODE. Allow biological terms and light historical conflict.";

const prompt = `Target Audience: ${targetAudience}. Guidelines: ${nuanceGuide}...`;
```
Backed by Research
Our entire pipeline is informed by developmental psychology and leading institutional research regarding AI and minors:
- UNICEF's Policy Guidance on AI for Children: UNICEF emphasizes that AI systems must be designed explicitly to support children's rights. General-purpose AI safety is fundamentally inadequate because children are uniquely susceptible to manipulation and parasocial relationships (treating the AI like a conscious friend rather than a tool). Our strict "Auditor" layer directly mitigates this.
- The "Eliza Effect" in Developmental Psychology: Studies of children interacting with AI show they attribute human emotions to text and voices very quickly. Our safety work therefore isn't just about blocking profanity or violence; it's also about preventing the AI from being emotionally manipulative or overly authoritative toward the child. Decoupling the creative generator from the rigid ruleset keeps the tone empowering and safe.
Summary
Integrating our Triple-Layer Safety approach adds execution latency and undeniable engineering complexity. But it shifts "Child Safety" from being a vague corporate hope to a measurable, engineered, and deterministic component of our platform.
When you're building products for children in the age of Generative AI, safety cannot be an afterthought. It must be the very foundation of your architecture.