Recreating Rebecca Yu's Dining Micro-App: A Developer Walkthrough Using Claude and ChatGPT

2026-02-13

A developer walkthrough to reproduce Rebecca Yu's dining micro-app using Claude and ChatGPT—architecture, prompts, code, and 2026 deployment tips.

Stop wasting cycles debating dinner: build a reproducible dining micro-app

For teams and individual developers, decision fatigue and repetitive chat debates are a real drain on productivity. If you’ve wanted to automate group decisions, prototype an internal tool, or ship a demo that blends opinionated UI with LLM power, this walkthrough reproduces Rebecca Yu’s Where2Eat concept using Claude and ChatGPT. You’ll get architecture, exact prompts, code patterns, and deployment tips targeted at 2026 production habits: multi-model orchestration, RAG, vector memory, and cost-aware function calling.

What you’ll get (quick)

  • A compact architecture diagram and components list suitable for a micro-app
  • Proven prompts for both Claude and ChatGPT including response schemas
  • Code snippets for Node.js server integration (function calling & RAG)
  • Performance, cost, and deployment best practices for 2026

The evolution: why this matters in 2026

Since late 2024, the toolchain for micro-apps has matured: model specialization, standardized function calling, and robust vector stores have made small, opinionated apps easier to build and cheaper to run. By late 2025 and into 2026, three trends changed the recipe:

  • Multi-model orchestration: developers routinely route tasks to the best model—for example, Claude for nuanced group summarization, ChatGPT for deterministic, function-driven operations.
  • RAG + lightweight memory: vector DBs and local embeddings let micro-apps provide personalized recommendations without exposing PII to models.
  • Function calling as first-class behavior: models return structured actions (JSON), enabling secure API interactions (maps, reservations) without brittle prompt parsing.

High-level architecture

Components

  • Frontend: React (Vite) hosted on Vercel, minimal state, streaming UI for responses.
  • API / Orchestrator: Node.js (Fastify) or serverless functions to handle requests, model selection, and function invocation.
  • Vector DB: Pinecone / Milvus / Weaviate for preference memory and restaurant embeddings.
  • Knowledge sources: static restaurant metadata, Google Places API (or other provider) for search and details.
  • LLM providers: OpenAI (ChatGPT with function calling) and Anthropic (Claude for summarization and creative phrasing), or vendor equivalents.
  • Monitoring: basic logging, prompt telemetry, cost meter.

Request flow

  1. User creates a dining session or shares a link in a group chat.
  2. Users vote or add preferences; the client stores lightweight session state and writes preference vectors to the DB.
  3. On request, the API composes a RAG context from user preferences and local restaurant metadata, then sends a prompt to the models.
  4. Model returns a structured recommendation set. If action is required (reserve, open map), function calling is used to hit external APIs.
  5. Results are rendered; accepted suggestions are logged for metrics and memory updates.

Design decisions and why: Claude + ChatGPT

Use both models to exploit strengths: Claude excels at conversational summarization and safe creative phrasing; ChatGPT’s function calling and deterministic outputs are ideal for API-driving responses. In production micro-apps in 2026, this multi-model pattern reduces hallucinations while enabling structured interactions.
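
A minimal routing sketch makes this concrete. The task labels and model ids below are illustrative assumptions, not fixed choices; swap in whatever models your accounts actually expose.

// Minimal model-routing sketch. Task labels and model ids are illustrative
// assumptions; substitute the models available to you.
function chooseRoute(taskType) {
  switch (taskType) {
    case 'summarize_chat':      // nuanced free-text group summarization
    case 'describe_restaurant': // playful, creative copy
      return { provider: 'anthropic', model: 'claude-sonnet-4-20250514', temperature: 0.7 };
    case 'rank_restaurants':    // deterministic, schema-bound ranking
    case 'drive_actions':       // function calling for maps/reservations
      return { provider: 'openai', model: 'gpt-4o-mini', temperature: 0.2 };
    default:
      return { provider: 'openai', model: 'gpt-4o-mini', temperature: 0.3 };
  }
}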

Prompts and schemas — concrete examples

Below are prompts tailored to produce machine-readable JSON outputs for a dining recommendation step. Keep intent explicit, define schema, and set instructions for fallback behavior.

Prompt: preference-aware recommendation (ChatGPT function style)

Goal: Return up to 5 ranked restaurants in JSON with reasons and a numeric score. Use function calling to return structured data.

// System message
You are a concise restaurant recommendation engine. Given user preferences and restaurant metadata, return a JSON array named "recommendations".

// User message
Context: [INSERT RAG CONTEXT: user prefs, recent choices, local restaurants metadata]
Question: Recommend up to 5 restaurants, ranked by suitability. For each, include: id, name, cuisines[], short_reason (1 sentence), score (0-100), and actions: {openMap: url, reserve: boolean}.
If insufficient data, return an empty array.

Example function response schema (for OpenAI function-calling):

{
  "recommendations": [
    {
      "id": "rest_123",
      "name": "Taco Factory",
      "cuisines": ["Mexican", "Casual"],
      "short_reason": "Matches two members who want casual tacos and is within 10 minutes.",
      "score": 87,
      "actions": {"openMap": "https://maps.example/?q=Taco+Factory", "reserve": false}
    }
  ]
}
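
To get that shape back reliably, pass a matching schema in the functions array of the chat call. A minimal sketch follows; the function name return_recommendations is a placeholder, not part of any SDK.

// Function schema matching the JSON above. The name "return_recommendations"
// is a placeholder; pass this object in the `functions` array of the chat call.
const recommendationFunction = {
  name: 'return_recommendations',
  description: 'Return up to 5 ranked restaurant recommendations.',
  parameters: {
    type: 'object',
    properties: {
      recommendations: {
        type: 'array',
        maxItems: 5,
        items: {
          type: 'object',
          properties: {
            id: { type: 'string' },
            name: { type: 'string' },
            cuisines: { type: 'array', items: { type: 'string' } },
            short_reason: { type: 'string' },
            score: { type: 'number', minimum: 0, maximum: 100 },
            actions: {
              type: 'object',
              properties: {
                openMap: { type: 'string' },
                reserve: { type: 'boolean' }
              },
              required: ['openMap', 'reserve']
            }
          },
          required: ['id', 'name', 'cuisines', 'short_reason', 'score', 'actions']
        }
      }
    },
    required: ['recommendations']
  }
};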

Claude prompt for group summarization

Use Claude to summarize open-text opinions in chat, producing condensed preferences for the RAG step.

System: You are a summarization assistant. Extract up to 6 preference tags from the chat (e.g., "spicy", "cheap", "near subway") and a short weighted list indicating intensity (1-5).
User: [INSERT MESSAGES]
Return JSON: {"preferences": [{"tag":"cheap","weight":4}, ...], "notes": "one-sentence summary"}
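
Wiring that prompt to the Anthropic SDK can look like the sketch below. The model id is a placeholder; use whichever Claude model your account has access to, and note that the SDK returns content blocks rather than a bare string.

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_KEY });

// Summarize raw group chat into weighted preference tags.
// The model id is a placeholder; pick the Claude model available to you.
async function summarizePreferences(chatMessages) {
  const msg = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 300,
    system: 'You are a summarization assistant. Extract up to 6 preference tags with weights (1-5). Respond with JSON only.',
    messages: [{ role: 'user', content: chatMessages.join('\n') }]
  });
  // The response is a list of content blocks; the first block carries the text.
  return JSON.parse(msg.content[0].text);
}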

Embedding and vector memory

Store user/session embeddings to maintain a taste profile without leaking raw user text to production LLMs. Workflow:

  1. From the summarization JSON, create a compact preference vector via an embedding model.
  2. Upsert to your vector DB keyed by session or group id.
  3. At recommendation time, query nearest restaurant embeddings and return top-K for RAG context.

// Pseudocode: upsert a preference embedding for the session
const prefText = 'cheap, spicy, outdoor seating';
const embedding = await openai.embeddings.create({ model: 'text-embedding-3-small', input: prefText });
await vectorDB.upsert({ id: sessionId, vector: embedding.data[0].embedding, metadata: { /* session tags */ } });
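
The query side of the same workflow is a nearest-neighbor lookup at recommendation time. The sketch below keeps the same hypothetical vectorDB client; the exact call shape varies by provider (Pinecone, Weaviate, Milvus).

// Pseudocode: fetch top-K restaurant candidates for the RAG context.
// `vectorDB` is the same hypothetical client as above.
async function topKRestaurants(sessionId, k = 20) {
  const pref = await vectorDB.fetch({ id: sessionId });   // stored preference vector
  const matches = await vectorDB.query({
    vector: pref.vector,
    topK: k,
    filter: { type: 'restaurant' },                       // restaurant embeddings only
    includeMetadata: true
  });
  // Only distilled metadata enters the prompt, never raw user text.
  return matches.map(m => m.metadata);
}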

Node.js orchestrator pattern (simplified)

Key points: keep model prompts small; orchestrate steps server-side; use function calling for map/reservation actions; implement retries and backoff.

import Fastify from 'fastify';
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';

const app = Fastify();
const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_KEY });

app.post('/recommend', async (req, reply) => {
  const { sessionId } = req.body;
  // vectorDB, composePrompt, restaurantMeta, and parseResponse are app-level
  // helpers (see the vector memory and validation sections).
  const prefDoc = await vectorDB.query(sessionId); // get RAG context
  const prompt = composePrompt(prefDoc, restaurantMeta);

  // Call ChatGPT (function calling enabled)
  const res = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: 'You are a recommendation engine...' },
      { role: 'user', content: prompt }
    ],
    functions: [ /* define function schema for actions */ ],
    temperature: 0.2,
    max_tokens: 600
  });

  const recommendations = parseResponse(res);
  reply.send({ recommendations });
});

app.listen({ port: 3000 });
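
The intro above mentions retries and backoff; a minimal wrapper sketch follows. The status-code check is an assumption about how your SDK surfaces errors, so adjust it to match.

// Minimal retry-with-backoff sketch for model and Places calls.
// Retries transient failures (429 / 5xx); tune attempts and base delay.
async function withRetry(fn, { attempts = 3, baseMs = 500 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      const retriable = err.status === 429 || (err.status >= 500 && err.status < 600);
      if (!retriable || i === attempts - 1) throw err;
      await new Promise(r => setTimeout(r, baseMs * 2 ** i)); // exponential backoff
    }
  }
}

// Usage: const res = await withRetry(() => openai.chat.completions.create({ ... }));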

Function calling: mapping recommendations to actions

When the assistant suggests a restaurant and indicates reserve: true, call an external reservations API (OpenTable, or a mock). Rely on function schemas to ensure safe, typed outputs. Avoid sending raw user tokens; the orchestrator should be the only component with API keys.
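
A sketch of that dispatch step, assuming the legacy functions API (the message carries a function_call with JSON-encoded arguments). The function names and the reserveTable helper are hypothetical; only the orchestrator ever holds the reservations API key.

// Route a model-returned function call to an external API.
// `reserveTable` is a hypothetical orchestrator-side wrapper; the model
// never sees API keys or user tokens.
async function handleFunctionCall(message) {
  const call = message.function_call;
  if (!call) return null;

  const args = JSON.parse(call.arguments); // parse and validate before acting
  if (call.name === 'reserve_table') {
    return reserveTable({ restaurantId: args.id, partySize: args.partySize });
  }
  if (call.name === 'open_map') {
    return { url: args.openMap }; // client just opens the link; no server-side action
  }
  throw new Error(`Unknown function: ${call.name}`);
}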

Testing prompts and preventing hallucinations

  • Always include exact fields and validation schemas in prompts—ask models to output only JSON.
  • Implement strict JSON parsing and schema validation (e.g., Zod, Joi) after model responses.
  • Use deterministic settings (low temperature) for rank/order tasks; use higher temperature for playful descriptions.
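
For the validation point above, a minimal Zod sketch might look like this; it mirrors the prompt schema and falls back to an empty result when parsing fails.

import { z } from 'zod';

// Mirrors the prompt contract; anything that fails validation is treated
// as a hallucination and dropped.
const Recommendation = z.object({
  id: z.string(),
  name: z.string(),
  cuisines: z.array(z.string()),
  short_reason: z.string(),
  score: z.number().min(0).max(100),
  actions: z.object({ openMap: z.string().url(), reserve: z.boolean() })
});

const RecommendationResponse = z.object({
  recommendations: z.array(Recommendation).max(5)
});

function parseRecommendations(raw) {
  try {
    const result = RecommendationResponse.safeParse(JSON.parse(raw));
    return result.success ? result.data : { recommendations: [] }; // safe fallback
  } catch {
    return { recommendations: [] }; // model did not return valid JSON
  }
}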

Privacy, data and safety in 2026

Micro-apps commonly handle personal data. Follow these rules:

  • Mask PII before sending to models where possible; keep identity tokens localized.
  • Prefer embedding-only memory for preference profiles; flush ephemeral session context regularly.
  • Log prompts and responses for debugging, but redact sensitive fields and store telemetry separately from raw user data.
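
To make the masking rule concrete, here is a deliberately simple sketch that scrubs obvious PII before any text reaches a model; the regexes are illustrative assumptions and should be extended for your own data.

// Minimal PII-masking sketch applied before text is sent to a model.
// The patterns are intentionally simple; extend them for your data.
function redact(text) {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[email]')   // email addresses
    .replace(/\+?\d[\d\s().-]{7,}\d/g, '[phone]')      // phone-like numbers
    .replace(/@[A-Za-z0-9_]+/g, '[handle]');           // chat handles
}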

Deployment tips: fast, affordable, and compliant

For a micro-app aimed at friends or a small user base, prefer serverless and edge-first deployment for low ops:

  • Host frontend on Vercel or Netlify (Edge Functions) for instant global delivery.
  • Use serverless functions (Vercel Edge, AWS Lambda) for orchestrator endpoints to minimize idle costs.
  • Vector DBs: managed Pinecone or Weaviate to avoid infra-heavy maintenance. For on-prem or strict compliance, Milvus in a small k8s cluster works well.
  • Cache frequently used restaurant metadata in Redis to reduce API bill for Places lookups.
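
For the caching point above, a sketch using node-redis might look like this; fetchPlaceDetails is a hypothetical wrapper around your Places provider, and the one-hour TTL is an assumption.

import { createClient } from 'redis';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Cache Places lookups to cut the external API bill.
// `fetchPlaceDetails` is a hypothetical wrapper around your Places provider.
async function getPlaceDetails(placeId) {
  const key = `place:${placeId}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const details = await fetchPlaceDetails(placeId);
  await redis.set(key, JSON.stringify(details), { EX: 3600 }); // 1-hour TTL
  return details;
}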

Latency and cost control

  • Keep context trimmed—don’t send full chat history, only distilled preferences and top-K restaurants.
  • Preferred flow: summarize via Claude (cheap for short text), embed + RAG, call ChatGPT only on the distilled payload.
  • Batch embeddings when updating preferences to reduce calls.
  • Design for low-latency user interactions where streaming UI matters.
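
The batching point above maps to a single embeddings request with an array input, which the OpenAI embeddings endpoint accepts; a minimal sketch:

// Embed several preference texts in one request instead of one call per user.
async function embedBatch(texts) {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts
  });
  // res.data[i] corresponds to texts[i]
  return res.data.map(d => d.embedding);
}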

Metrics to track (prove ROI)

Even a personal micro-app benefits from observability. Track:

  • Decision time: average time from session creation to accepted suggestion — the primary UX ROI.
  • Acceptance rate: percent of recommendations accepted or acted on.
  • LLM cost per session: model tokens + embedding + external API costs.
  • Error / hallucination rate: JSON validation failures or bad links returned.
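
A small telemetry sketch for the cost metric, reading token counts from the usage field that OpenAI chat responses include. The per-token rates and the metrics client are placeholders; plug in your providers' current pricing and your own telemetry sink.

// Per-session LLM cost telemetry sketch. Rates are placeholders (USD per token);
// `metrics` is a hypothetical telemetry client.
const RATES = { prompt: 0.15 / 1e6, completion: 0.60 / 1e6 };

function logLlmCost(sessionId, res) {
  const { prompt_tokens = 0, completion_tokens = 0 } = res.usage ?? {};
  const cost = prompt_tokens * RATES.prompt + completion_tokens * RATES.completion;
  metrics.increment('llm_cost_usd', cost, { sessionId });
  metrics.increment('llm_tokens', prompt_tokens + completion_tokens, { sessionId });
  return cost;
}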

Edge-case handling & QA

  • Missing data: have fallback rules (e.g., use popularity score from Places API)
  • Conflicting preferences: surface trade-offs and ask a clarifying question rather than guessing
  • Rate limits: implement circuit breakers and degrade gracefully with cached results
  • Accessibility: provide plain-text and ARIA-friendly UI states for recommendations

UX patterns that made Where2Eat work (and you should copy)

  • Shareable session links with short TTL—no signup friction increases adoption.
  • Live voting UI that immediately updates the vector memory—users see influence in real-time.
  • Explainable reasons—each recommendation includes a one-line rationale to build trust.
  • Lightweight actions—open map or copy address instead of forcing complex reservations unless explicitly requested.

Sample end-to-end sequence

  1. Group creates session. Client collects individual preferences (checkboxes + quick text) and sends to /session/create.
  2. Server runs Claude summarization on free-texts; builds preference document and stores embedding in vector DB.
  3. When host clicks "Recommend", the server queries vector DB for top-K candidates, composes the ChatGPT prompt, and requests function-calling output.
  4. Server validates JSON, enriches with map URLs, and returns recommendations to the client; client renders and allows accept/decline.
  5. Accepted choice triggers reservation if enabled, and logs metrics.

Developer checklist

  • Set up API keys and environments for both OpenAI and Anthropic (or equivalents).
  • Pick a vector DB and model for embeddings; seed with your local restaurants metadata.
  • Create robust prompt templates with clear JSON schemas; write unit tests for parser logic.
  • Implement telemetry: prompt ID, tokens used, response validation rate, and user metrics.
  • Plan for privacy: embed-only memory, redaction, and TTL-based memory eviction.

Advanced extensions (2026-forward)

  • On-device LLMs: For privacy-first versions, run a smaller LLM locally for personalization while keeping heavier models in the cloud for global reasoning.
  • Hybrid search: combine vector similarity with business rules (dietary restrictions, price limits) enforced by the orchestrator.
  • Adaptive model routing: automatically route creative tasks to Claude-family and deterministic tasks to ChatGPT based on request taxonomy.
  • Prompt templates marketplace: share tuned prompt-and-schema pairs across teams to accelerate new micro-app builds.

Real-world example: sample prompt sequence (concise)

Below is a simplified flow used in productionized micro-apps:

  1. Collect free-text opinions & checkbox prefs on client.
  2. Send to /summarize — Claude returns {preferences, notes}.
  3. Embed preferences & upsert to vector DB.
  4. When recommending: query top 20 restaurants by vector similarity, send to ChatGPT with schema asking for top 5.
  5. Parse and validate; enrich with map links; return to client.

Common pitfalls and how to avoid them

  • Sending full chat logs to the LLM — distill first to reduce cost and improve accuracy.
  • Not validating model outputs — always validate and fallback to safe defaults.
  • Ignoring token costs — monitor and cap model selection per session.
  • Over-automating reservations — always confirm with users before committing external actions.

Conclusion: Recreating Where2Eat — fast and practical

Rebecca Yu’s Where2Eat demonstrates the value of tiny, pragmatic apps that solve a real pain. Recreating it with modern 2026 practices—multi-model orchestration, RAG, embeddings, and function calling—lets you ship a reliable, explainable dining assistant in days, not months. The architecture above balances cost, accuracy, and privacy with developer ergonomics.

“Once vibe-coding apps emerged, people with no tech backgrounds were building apps — this approach borrows that speed while adding production-grade guardrails.”

Actionable next steps (do this in the next 48 hours)

  1. Clone a minimal template: React + serverless Fastify endpoint.
  2. Wire one LLM (ChatGPT) with a function-call mock and produce a JSON recommendation test.
  3. Add a summarization step (Claude) and wire an embedding + vector DB for a single session.
  4. Deploy to Vercel/Netlify and test with 2–4 friends. Measure decision time and acceptance rate.

Call to action

Ready to reproduce Where2Eat for your team or clients? Start with the prompts above and a single function-calling endpoint. If you want a ready-to-deploy starter kit with prompt templates, test harnesses, and a cost model calculator tailored to your organization size, request our micro-app starter repo and deployment checklist — built for engineers and IT admins who need predictable results.
