04/2026

Chat (RAG + LLM)

Kylänviisain | Healthcare

AI-assisted interface to Finnish public health information sources.

What is it about?

A conversational interface to Finnish public health information. Ask a health question — get an answer synthesised from three official sources with inline citations.

Try it live at: kylanviisain.saviranta.com

Kylänviisain

Sources:

  1. Terveyskylä.fi (HUS) — specialised healthcare information across 30+ “care houses”
  2. Terveyskirjasto.fi (Duodecim) — medical reference encyclopedia
  3. Käypä hoito (Duodecim) — clinical best-practices guidelines

The system operates as a “source narrator” — it quotes and synthesises what the sources say, never interprets for individual situations. This keeps the service outside EU MDR Software-as-Medical-Device classification.

Why I’m building it?

I had three learning criteria that were met selecting this project:

  • Something that doesn’t exist and is useful: Finnish public health info is scattered across multiple sites with different structures. I wanted a single conversational interface that makes it easy to find what the sources actually say.
  • You can’t call yourself an AI Builder, unless you’ve built a chatbot. It had to push me to implement a RAG memory system paired with LLM.
  • I wanted to explore the regulatory space of AI: Healthcare is the perfect domain.

What I would do differently:

  • Design the chat query pipeline in more detail and to be more agentic (now it is a single-thread workflow)

How am I building it?

  • RAG data pipeline: scrape three web-sources sources → chunk → embed → store as vector db.
  • Chat query: Embed the context → vector search → build response → stream the response.
  • Safety & Chat boundary checks: three rings run before any retrieval (regex pre-filters, PII and crisis blocks, LLM backboard check). Tests check disclaimer, source attribution etc presence.
  • Traces, evals, prompt edits: built in admin tool for annotating traces, evals for groundedness and relevance, prompt edits for testing and improving.

I still have to keep on improving how chat input context is interpreted so that RAG results are more accurate and feel less like a chatbot 2-3 years ago.

Kylänviisain admin dashboard

What’s my goal?

  1. Implement something in the Healthcare space
  2. Learn to build a production RAG system with regulatory awareness, layered safety, and observability.

Tech Corner

// STATUS ──────────────────────────────────────────────────────
chat live | admin dashboard built | eval pipeline running

lines_of_code: 15000   |   mvp_estimate: done

// PIPELINE ────────────────────────────────────────────────────
approach        scrape → chunk → embed → pgvector; query-time
                RAG with streaming response

deterministic   scraping, chunking, content hashing, dedup,
                rate limiting (Lua atomics), Tier 1 post-tests

llm             Gemini Flash for generation; Gemini Embedding 1 
                for embeddings; GPT-4o-mini for safety 
                classification, structured modes, evals

reasoning       Gemini Flash for cost and speed on generation;
                GPT-4o-mini where structured output and safety
                judgment matter more than fluency

// HUMAN IN THE LOOP ───────────────────────────────────────────
where           source narrator pattern — never prescribes or
                diagnoses; admin annotates traces for ground
                truth; prompt overrides via admin UI

why             health information has real consequences —
                system quotes sources, human decides what to do

// EVALUATION ──────────────────────────────────────────────────
how             Tier 1 tests post-stream (disclaimer, PII leak,
                URL validity, source attribution); LLM-as-Judge
                nightly batch (groundedness, relevance, citation
                accuracy on 0–100 scale); 7-day rolling averages

traces          full trace per request — boundary checks, Tier 1
                results, retrieved chunks, eval scores; admin
                dashboard for review and annotation

// CONTEXT ─────────────────────────────────────────────────────
approach        session-scoped; conversation history stays with
                the query within a session; embed → top-5 cosine
                search → lost-in-middle reordering → system
                prompt + context injection

why             follow-up questions need prior context to make
                sense; session boundary ensures no leakage
                between users; nothing persisted after session

// INTEGRATIONS ─────────────────────────────────────────────────
supabase        pgvector for RAG corpus + PostgreSQL for traces,
                eval, config; RLS policies per role

upstash_redis   three-layer rate limiting — per-IP sessions,
                concurrent sessions, daily token budget (Lua)

posthog         session replay + event tracking

gemini          text-embedding-001 (768d) for embeddings;
                Gemini Flash for generation

openai          GPT-4o-mini for safety classifiers + evals

// OTHER TOOLS ──────────────────────────────────────────────────
next_intl       vs manual i18n — request-based language detection
                for Finnish, English, Swedish

react_markdown  renders LLM output with inline source links;
                rehype-raw for HTML in markdown

zod             runtime validation on all API inputs and LLM
                response schemas