Synthesizing Rest: Safety, Privacy, and Programmatic Audio in a Sleep Wellness System

June 2025 · 9 min read · Wellness Technology · Audio Synthesis · AI Safety · Privacy

A sleep and dream wellness application that generates its entire audio environment programmatically, screens for crisis signals before any AI model sees the text, and stores nothing on any server. The architectural constraint that made all three possible: treating privacy, safety, and synthesis as structural requirements rather than features to be added later.

The Question

Sleep wellness apps occupy a peculiar position: they handle some of the most intimate data a person generates (sleep patterns, dream content, emotional states at the edge of consciousness) while operating under the same surveillance-capitalism incentives as every other consumer app. The Mozilla Foundation’s privacy audit of Calm, Headspace, and similar apps documented extensive data collection and third-party sharing [1]. A dream journal entry about a recurring nightmare becomes training data, then an advertising signal. The user’s vulnerability is the product.

Adding AI dream analysis compounds the problem. Large language models can interpret dream symbolism with reasonable coherence. But dream journals also contain crisis signals: descriptions of self-harm, expressions of hopelessness, references to suicidal ideation that surface in the liminal state between sleep and waking. The standard approach to crisis detection in AI systems is a two-stage pipeline: keyword matching followed by ML classification. This architecture has a dependency that most teams treat as acceptable: the safety layer depends on the same model infrastructure it is supposed to protect against. If the model hallucinates, drifts, or is prompt-injected, the safety layer fails with it.

A separate problem sits underneath: most sleep apps rely on pre-recorded audio files for soundscapes and sleep aids. This means bandwidth dependencies, storage overhead, and zero adaptability. A brown noise track is a file; it cannot respond to the user’s breathing rate or adapt its spectral profile. The question Somnia.ai set out to answer: can a wellness system handle crisis-sensitive content, AI-powered analysis, and therapeutic audio while storing nothing on a server, depending on no external files, and keeping its safety layer immune to the AI it deploys?

The Approach

The architecture is organized around three constraints that were treated as non-negotiable from the first commit. Each constraint produced a specific technical approach.

Hardcoded crisis detection. The safety layer operates before any AI model processes the text. Twenty-eight crisis phrases are matched against the input using exact string matching with context negation: a phrase like “I want to kill myself” triggers, but “I want to kill myself laughing” does not, because “laughing” appears in the negation list. This approach is intentionally primitive by ML standards. It cannot detect indirect ideation, metaphorical distress, or novel expressions of crisis. What it can do is guarantee a floor: explicit crisis language will always be caught, regardless of model availability, prompt injection attacks, API failures, or inference drift. The layer is immune to every failure mode that affects the AI it precedes, because it shares no infrastructure with that AI. This is the central architectural insight: safety layers for AI systems must be architecturally independent of the AI they protect.
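
The matching logic can be sketched in a few lines. This is an illustrative reconstruction, not Somnia.ai’s actual implementation: the phrase and negation lists below are hypothetical stand-ins for the real twenty-eight phrases, and the context window size is an assumption.

```typescript
// Sketch of the pre-AI safety layer: exact phrase matching with context
// negation. Phrase and negation lists here are illustrative placeholders.
const CRISIS_PHRASES: string[] = [
  "i want to kill myself",
  "i don't want to wake up",
];

// Contexts that negate a match when they appear just after the phrase.
const NEGATION_CONTEXTS: string[] = ["laughing"];

function containsCrisisSignal(entry: string): boolean {
  const text = entry.toLowerCase();
  for (const phrase of CRISIS_PHRASES) {
    const idx = text.indexOf(phrase);
    if (idx === -1) continue;
    // Inspect a short window after the match for a negating context.
    const window = text.slice(idx, idx + phrase.length + 40);
    const negated = NEGATION_CONTEXTS.some((neg) => window.includes(neg));
    if (!negated) return true; // explicit signal, no negation: trigger
  }
  return false;
}
```

Note what is absent: no model call, no network request, no shared state with the inference pipeline. The function is pure string manipulation, which is exactly why it cannot fail when the AI does.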

Programmatic audio synthesis. All eleven soundscapes in Somnia.ai are generated in real time through the Web Audio API. No audio files are downloaded, streamed, or stored. Brown noise is synthesized using a leaky integrator applied to white noise generated via the Box-Muller transform, which yields Gaussian-distributed samples. Pink noise uses the Paul Kellett algorithm [2], a computationally efficient approximation that maintains the correct 1/f spectral slope. The 40Hz gamma entrainment signal exploits the missing fundamental psychoacoustic effect: on phone speakers that cannot physically reproduce 40Hz, the system generates harmonics at 200Hz and 240Hz whose difference frequency the auditory cortex reconstructs as a perceived 40Hz tone. Coherence breathing entrainment modulates filter cutoff and gain at 0.1Hz (six breaths per minute), the frequency associated with heart rate variability optimization. An FM-synthesis progressive alarm generates a bird chorus that evolves from a single voice to a dense dawn chorus over sixty seconds, each voice synthesized from frequency modulation rather than samples. The result: 948 lines of DSP code that replace what would otherwise be hundreds of megabytes of audio assets.
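
The brown-noise core can be sketched outside the Web Audio API so it runs anywhere. This is a minimal reconstruction, not the app’s actual DSP code: the leak and gain coefficients are illustrative, and the deterministic PRNG (mulberry32) is an assumption added so the sketch is reproducible; in the app this kind of loop would run inside an AudioWorklet.

```typescript
// Deterministic PRNG (mulberry32) so the sketch is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Gaussian white noise via Box-Muller, shaped by a leaky integrator.
function brownNoise(n: number, seed = 1): Float32Array {
  const rand = mulberry32(seed);
  const out = new Float32Array(n);
  let state = 0;
  const leak = 0.995; // leaky integrator coefficient (illustrative)
  const gain = 0.05;  // input gain keeps output roughly within [-1, 1]
  for (let i = 0; i < n; i++) {
    // Box-Muller: two uniforms -> one standard Gaussian sample
    const u1 = Math.max(rand(), 1e-12); // guard against log(0)
    const u2 = rand();
    const white = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    state = leak * state + gain * white; // leaky integration -> ~1/f^2 slope
    out[i] = state;
  }
  return out;
}
```

The leak coefficient is what separates this from a pure random walk: without it, the integrated signal would drift unboundedly; with it, the output stays centered and bounded while keeping the low-frequency emphasis that characterizes brown noise.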

The 40Hz gamma signal cannot exist as a file on a phone speaker. The speaker physically cannot reproduce 40Hz. So the system synthesizes harmonics at 200Hz and 240Hz, and the auditory cortex reconstructs the fundamental that was never actually present. The psychoacoustic effect works precisely because the sound is generated programmatically.
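
The missing-fundamental trick can be checked numerically: two partials at 200Hz and 240Hz produce a composite waveform whose period is 1/gcd(200, 240) = 1/40 of a second, even though no 40Hz component is physically present in the signal. The sample rate and mix weights below are assumptions for illustration.

```typescript
// Two partials at 200 Hz and 240 Hz: the composite repeats every 1/40 s
// (1200 samples at 48 kHz), which the auditory cortex hears as 40 Hz.
const SAMPLE_RATE = 48000;

function gammaCarrier(n: number): Float32Array {
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const t = i / SAMPLE_RATE;
    out[i] =
      0.5 * Math.sin(2 * Math.PI * 200 * t) +
      0.5 * Math.sin(2 * Math.PI * 240 * t);
  }
  return out;
}
```

In the browser, the same signal would come from two `OscillatorNode`s mixed into a `GainNode`; the point of the sketch is only that the 40Hz periodicity is a property of the composite, not of either partial.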

Privacy-first with RAG. Zero server-side storage means all dream journal entries, sleep data, mood logs, and analysis results live in the browser’s localStorage. But AI dream analysis requires context: a recurring symbol means nothing in isolation. The RAG pipeline generates 768-dimensional embeddings for each journal entry on-device, stores them locally, and retrieves relevant past entries via cosine similarity before constructing the analysis prompt. Individual dream analysis uses Gemini 2.5 Flash for speed; synthesis across multiple entries uses Gemini 2.5 Pro for depth. Failed API calls queue locally and retry on reconnect, maintaining offline functionality. The user’s dream journal never leaves the device except as an ephemeral API call; the API provider receives only the current entry plus retrieved context, never the full journal.
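
The local retrieval step reduces to cosine similarity over stored per-entry vectors. A minimal sketch, assuming embedding generation (the 768-dimensional vectors) happens elsewhere; the entry shape, function names, and toy 3-dimensional vectors below are hypothetical.

```typescript
// Sketch of on-device retrieval: rank stored journal entries by cosine
// similarity to the query embedding and return the top k.
interface JournalEntry {
  id: string;
  embedding: number[]; // 768-dimensional in the real system
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard zero vectors
}

function retrieve(query: number[], entries: JournalEntry[], k: number): JournalEntry[] {
  return [...entries]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

Only the retrieved entries (plus the current one) ever reach the API; the ranking itself never leaves the device, which is what keeps the full journal local.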

What Emerged

The crisis detection insight generalized. The hardcoded safety layer was designed for a narrow purpose: catch explicit crisis language before it reaches Gemini. But the architectural principle it embodies—safety mechanisms must be independent of the systems they monitor—has broader implications. In the current field of AI safety, most guardrails are implemented as additional prompting, fine-tuning constraints, or ML classifiers that share the same inference pipeline as the primary model. When the model fails, the guardrails fail with it. Somnia’s approach is crude but structurally sound: the safety floor has zero shared failure modes with the AI ceiling.

The 40Hz gamma evidence strengthened. The initial implementation of gamma entrainment was partly speculative. The MIT Tsai Lab had published promising results, but the evidence base was thin. Between the initial design and this writing, the literature has grown. Martorell et al. demonstrated that multi-sensory 40Hz stimulation reduces Alzheimer’s-associated pathology and improves cognition in mouse models [3]. Chan et al. reported that sustained 40Hz sensory stimulation was safe and showed preliminary cognitive and functional benefits in patients with mild probable Alzheimer’s dementia [4]. The intervention is transitioning from speculative to evidence-based, and the missing fundamental approach allows it to work on commodity hardware without specialized transducers.

Dream text breaks standard RAG. Dreams are not documents. They are fragmentary, non-linear, emotionally loaded, and often internally contradictory. Standard RAG retrieval optimized for factual documents performs poorly on dream text because semantic similarity in dreams operates differently: a dream about flying and a dream about drowning may be more thematically related (both encode anxiety about control) than a dream about flying and a dream about airports (which share surface vocabulary but not psychological content). The current cosine similarity approach retrieves on surface semantics, and the 103 insight cards across seven analytical tabs partially compensate by offering structured analytical frames. But the deeper question remains open: how to build retrieval systems for emotionally structured rather than factually structured text.

What I Learned

The privacy cost is real. Zero server-side storage is a genuine architectural commitment, not a marketing claim. It means no cross-device sync by default, no backup recovery if the user clears browser storage, no aggregate analysis across users for research purposes. Every wellness app that collects server-side data can offer features that Somnia structurally cannot. The question is whether the privacy guarantee is worth the capability loss. For dream journals specifically, the answer may be yes: contextual integrity theory argues that privacy violations are determined not by the information itself but by whether its flow matches the norms of the context in which it was generated [5]. Dream content generated in the intimate context of sleep has strong contextual norms against corporate surveillance. An ephemeral AI interaction that processes the content transiently may be the only appropriate information flow.

A safety floor, not a ceiling. The hardcoded crisis detection layer catches explicit language and misses everything else. Indirect ideation, metaphorical distress, and culturally specific expressions of crisis are invisible to twenty-eight pattern-matched phrases. The layer is not a crisis detection system in any clinical sense. It is a safety floor: a guarantee that the most explicit signals will never be missed, regardless of what happens to the AI above it. The gap between this floor and clinically adequate crisis detection is the space where ML classifiers, clinical consultation, and human oversight operate. The argument is not that hardcoded detection is sufficient, but that it should be the foundation that other layers build upon.

The hardcoded layer is not a crisis detection system. It is a guarantee. Twenty-eight phrases, context negation, zero dependencies. Everything above it can fail, and this floor still holds. That structural independence is the point.

Multiple research angles. Somnia.ai is designed as a research instrument as much as a product. The RAG-enhanced dream analysis pipeline and the privacy-preserving wellness architecture each examine a different tension in the system: between AI capability and retrieval quality, and between privacy constraints and feature capability. Whether these tensions can be resolved or only managed is the open question that connects the work.

References

  1. Mozilla Foundation. (2024). Calm vs. Headspace: Which is best for your privacy & security? Privacy Not Included.
  2. Kellett, P. (1999). A filter to generate pink (1/f) noise. Sum of first-order IIR filters approximating a 1/f spectrum.
  3. Martorell, A. J., Paulson, A. L., Suk, H.-J., Abdurrob, F., Drummond, G. T., Guan, W., … & Tsai, L.-H. (2019). Multi-sensory gamma stimulation ameliorates Alzheimer’s-associated pathology and improves cognition. Cell, 177(2), 256–271.
  4. Chan, D., Suk, H.-J., Jackson, B. L., Milman, N. P., Stark, D., Klerman, E. B., … & Bhatt, S. (2022). Gamma frequency sensory stimulation in mild probable Alzheimer’s dementia patients. PLoS ONE, 17(12), e0278412.
  5. Nissenbaum, H. (2004). Privacy as contextual integrity. Washington Law Review, 79(1), 119–157.
  6. Lewis, P. A., & Durrant, S. J. (2011). Overlapping memory replay during sleep builds cognitive schemata. Trends in Cognitive Sciences, 15(8), 343–351.
  7. Smith, J. O. (2007). Physical Audio Signal Processing. W3K Publishing.