JustJosh.dev
Featured case study · Completed

PromptIQ

From scattered prompts to a repeatable AI workflow

A full-stack prompt engineering workspace for organizing, testing, and refining AI prompts with structured evaluation.

Role
Solo full-stack developer
Timeline
12 weeks
Stack
Next.js · TypeScript · React

Prompts don't scale when they live in chat history

Large language models are powerful, but their output quality depends heavily on prompt design. For developers and power users building AI-assisted workflows, prompts often end up scattered across chat threads, notes apps, and one-off documents — with no versioning, no structured testing, and no way to compare what changed between iterations.

The result is wasted time: re-prompting from scratch, inconsistent outputs, and no repeatable process for improving prompt quality. PromptIQ was built to solve that operational gap — treating prompts as engineered artifacts, not disposable messages.

Why I built this

PromptIQ started as a capstone-style project during my software engineering coursework at Arizona State University, building on the full-stack foundation I developed through the Microsoft Software and Systems Academy (MSSA).

I wanted a project that demonstrated modern AI integration patterns while still showing classic software engineering discipline: typed data models, server-side security, clear UX, and maintainable architecture. Prompt engineering was the domain — but the engineering showcase is in how the system is structured.

Goals

  • Give users a structured library to store, version, and categorize prompts
  • Enable side-by-side prompt testing against LLM responses
  • Keep all API keys and provider calls strictly server-side
  • Design a UI that makes iteration fast — edit, test, compare, save
  • Ship a deployable MVP that demonstrates production-minded patterns
  • Document architecture decisions clearly enough for team handoff

Technical challenges

Non-deterministic LLM outputs

LLM responses vary between runs, making traditional unit testing insufficient. I needed an evaluation flow focused on structure, latency, and token usage — not exact string matching.
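
A minimal sketch of what a structure-first check could look like. It is not the project's actual code: the `RunSample` fields, the thresholds, and the assumption that a good response is a JSON object with known keys are all illustrative.

```typescript
// Hypothetical shape of one test run; field names are illustrative.
interface RunSample {
  output: string;
  latencyMs: number;
  totalTokens: number;
}

// Expectations describe the shape of a good response, never its exact text.
interface RunExpectations {
  maxLatencyMs: number;
  maxTokens: number;
  requiredJsonKeys: string[];
}

interface EvaluationResult {
  passed: boolean;
  failures: string[];
}

function evaluateRun(run: RunSample, expected: RunExpectations): EvaluationResult {
  const failures: string[] = [];

  if (run.latencyMs > expected.maxLatencyMs) {
    failures.push(`latency ${run.latencyMs}ms exceeds ${expected.maxLatencyMs}ms`);
  }
  if (run.totalTokens > expected.maxTokens) {
    failures.push(`token count ${run.totalTokens} exceeds ${expected.maxTokens}`);
  }

  // Structural check: the output must parse as a JSON object with the required keys.
  let parsed: unknown;
  try {
    parsed = JSON.parse(run.output);
  } catch {
    failures.push("output is not valid JSON");
    return { passed: false, failures };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    failures.push("output is not a JSON object");
  } else {
    for (const key of expected.requiredJsonKeys) {
      if (!(key in parsed)) failures.push(`missing key "${key}"`);
    }
  }

  return { passed: failures.length === 0, failures };
}
```

Two runs with different wording but the same keys both pass, which is the property exact string matching cannot give.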

Secure API key handling

Provider credentials must never reach the client bundle. All LLM calls route through server-side handlers with environment-based configuration and input validation.
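
As a rough illustration of environment-based configuration, the sketch below loads provider settings from an environment record and fails fast when the key is missing. The variable names `OPENAI_API_KEY` and `LLM_MODEL`, the placeholder default model, and both helper functions are assumptions for this example, not names taken from the project.

```typescript
// A plain record stands in for process.env so the sketch has no runtime dependencies.
type Env = Record<string, string | undefined>;

interface ProviderConfig {
  apiKey: string;
  model: string;
}

// OPENAI_API_KEY and LLM_MODEL are assumed variable names; "example-model" is a placeholder.
function loadProviderConfig(env: Env): ProviderConfig {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey || apiKey.trim() === "") {
    throw new Error("OPENAI_API_KEY is not set in the server environment");
  }
  return { apiKey, model: env["LLM_MODEL"] ?? "example-model" };
}

// Redacted description, safe to log or include in an error report.
function describeConfig(config: ProviderConfig): string {
  return `model=${config.model} key=********`;
}
```

On the server this would be called as `loadProviderConfig(process.env)`. In Next.js only variables prefixed with `NEXT_PUBLIC_` are inlined into the browser bundle, so an unprefixed key stays server-only.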

Prompt versioning without complexity

Users need to iterate on prompts without losing prior versions. I designed a version chain model that keeps history accessible without a full document-management system.
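
One way to picture a version chain is a list of records where each version points at its parent. The sketch below is an assumption about the shape, not the project's schema: `PromptVersion`, its fields, and `historyOf` are hypothetical names.

```typescript
// Hypothetical version record; parentId links each edit to the version it came from.
interface PromptVersion {
  id: string;
  promptId: string;
  parentId: string | null; // null marks the first version in the chain
  body: string;
  createdAt: string; // ISO timestamp
}

// Walk parent links from a given version back to the root, newest first.
function historyOf(versions: PromptVersion[], headId: string): PromptVersion[] {
  const byId = new Map<string, PromptVersion>();
  for (const v of versions) byId.set(v.id, v);

  const chain: PromptVersion[] = [];
  const seen = new Set<string>(); // guards against accidental cycles in bad data
  let current = byId.get(headId);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.push(current);
    current = current.parentId === null ? undefined : byId.get(current.parentId);
  }
  return chain;
}
```

Because history is just a walk over parent links, no separate document-management layer is needed to list or compare earlier versions.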

Responsive UX for iteration loops

The core workflow — edit, run, compare, save — must feel fast on desktop and usable on tablet. Layout and loading states were designed around this loop.

A prompt workspace built for iteration

PromptIQ is a full-stack web application that centralizes prompt management and testing. Users create prompts in a structured library, run them against configured LLM providers, and compare outputs across versions — turning ad-hoc experimentation into a repeatable workflow.

The application separates concerns cleanly: the client handles editing and display, the server handles validation and provider communication, and the database persists prompts, versions, and run history.

Key features

  • Structured prompt library with categories and search
  • Version history with diff-friendly iteration tracking
  • One-click test runs against configured LLM providers
  • Side-by-side output comparison across prompt versions
  • Server-side API integration with typed request/response contracts
  • Responsive UI optimized for the edit → test → refine loop
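
The typed request/response contracts mentioned in the list could look roughly like the sketch below. The project validates with Zod; a hand-written parser stands in here so the example runs without dependencies. All type, field, and function names are hypothetical.

```typescript
// Hypothetical request contract for a test run.
interface RunPromptRequest {
  promptVersionId: string;
  variables: Record<string, string>;
}

// Discriminated union: the client must check `ok` before reading either branch.
type RunPromptResponse =
  | { ok: true; output: string; latencyMs: number; totalTokens: number }
  | { ok: false; error: string };

// Stand-in for a schema parse: returns a typed request, or null if the input is malformed.
function parseRunPromptRequest(input: unknown): RunPromptRequest | null {
  if (typeof input !== "object" || input === null) return null;
  const { promptVersionId, variables } = input as Record<string, unknown>;
  if (typeof promptVersionId !== "string" || promptVersionId.length === 0) return null;
  if (typeof variables !== "object" || variables === null || Array.isArray(variables)) return null;

  const checked: Record<string, string> = {};
  for (const key of Object.keys(variables)) {
    const value = (variables as Record<string, unknown>)[key];
    if (typeof value !== "string") return null;
    checked[key] = value;
  }
  return { promptVersionId, variables: checked };
}

// Narrowing on `ok` gives each branch access to only its own fields.
function renderResult(res: RunPromptResponse): string {
  return res.ok
    ? `${res.totalTokens} tokens in ${res.latencyMs}ms`
    : `Run failed: ${res.error}`;
}
```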

System architecture

A Next.js full-stack application with server-side LLM orchestration, a PostgreSQL persistence layer, and a component-driven React frontend.

Data flow

  1. User edits prompt in the client editor
  2. Server Action validates input with Zod
  3. Application layer calls LLM provider with server-side API key
  4. Response metadata (latency, tokens) stored with run record
  5. Updated prompt version persisted to PostgreSQL
  6. Client renders comparison view with version history
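
The server-side part of this flow (steps 2 through 4) can be sketched as a single function with injected dependencies. This is an illustration under assumptions rather than the project's code: `runPrompt`, `RunDeps`, and the record shapes are hypothetical, the manual check stands in for Zod, and the provider and database are passed in as plain functions so the flow can run without either.

```typescript
interface ProviderResult {
  text: string;
  totalTokens: number;
}

interface RunRecord {
  promptBody: string;
  output: string;
  latencyMs: number;
  totalTokens: number;
}

// Injected dependencies: a provider call, a persistence call, and a clock.
interface RunDeps {
  callProvider: (prompt: string) => Promise<ProviderResult>;
  saveRun: (run: RunRecord) => Promise<void>;
  now: () => number;
}

type RunOutcome = { ok: true; run: RunRecord } | { ok: false; error: string };

async function runPrompt(input: unknown, deps: RunDeps): Promise<RunOutcome> {
  // Step 2: validate before any external call (a manual check stands in for Zod).
  if (typeof input !== "string" || input.trim().length === 0) {
    return { ok: false, error: "Prompt body must be a non-empty string" };
  }

  // Step 3: call the provider on the server and time the round trip.
  const startedAt = deps.now();
  let result: ProviderResult;
  try {
    result = await deps.callProvider(input);
  } catch {
    return { ok: false, error: "Provider request failed" };
  }
  const latencyMs = deps.now() - startedAt;

  // Step 4: store response metadata with the run record.
  // Persisting the new prompt version (step 5) would follow the same pattern.
  const run: RunRecord = {
    promptBody: input,
    output: result.text,
    latencyMs,
    totalTokens: result.totalTokens,
  };
  await deps.saveRun(run);
  return { ok: true, run };
}
```

Injecting `callProvider` and `saveRun` keeps the orchestration testable with stubs, which matters when the real provider is slow, paid, and non-deterministic.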

How it was built

  • Defined user workflow: library → edit → test → compare → save
  • Designed database schema for prompts, versions, and run history
  • Built server-side LLM integration with validation and error handling
  • Implemented core UI: library view, editor, test panel, comparison view
  • Added versioning, search, and iteration tracking
  • Hardened security: env-only secrets, input sanitization, rate-limit awareness
  • Deployed to Vercel with environment configuration and smoke testing

Development timeline

  1. Discovery & Schema

    Weeks 1–2

    Mapped user workflow, designed PostgreSQL schema, defined API contracts

    • Entity-relationship model
    • Zod validation schemas
    • Workflow wireframes
  2. Core Backend

    Weeks 3–5

    Built server-side LLM integration, prompt CRUD, and version chain logic

    • Server Actions for prompt operations
    • Secure provider integration
    • Run history storage
  3. Frontend MVP

    Weeks 6–8

    Implemented library, editor, test panel, and comparison views

    • Responsive UI
    • Edit-test-save loop
    • Version comparison view
  4. Hardening & Deploy

    Weeks 9–12

    Error handling, input validation, deployment, documentation, and polish

    • Vercel deployment
    • Environment configuration
    • README and architecture docs

Key technical decisions

Server Actions over client-side API calls

Decision
Route all LLM requests through Next.js Server Actions
Rationale
Keeps provider API keys off the client entirely and leverages built-in CSRF protections. Validation runs once on the server before any external call.
Tradeoff
Slightly more complex debugging than a standalone API, but significantly better security posture for a portfolio MVP.

PostgreSQL for prompt persistence

Decision
Use a relational database instead of file-based or browser storage
Rationale
Prompts, versions, and run history have relational structure. PostgreSQL supports querying, indexing, and future multi-user expansion without a rewrite.
Tradeoff
Adds deployment complexity vs. local storage, but demonstrates real full-stack data modeling.

Version chains over inline overwrite

Decision
Store each prompt edit as a new version linked to its parent
Rationale
Users experimenting with prompts need rollback and comparison. Immutable versions make the comparison view straightforward and auditable.
Tradeoff
More storage per prompt, but essential for the core value proposition.
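
With immutable versions, one plausible way to handle rollback is to append a new version that copies an older body instead of deleting anything. The source does not say how rollback is implemented, so treat this as a sketch of the idea; `VersionNode`, `appendVersion`, and `rollbackTo` are hypothetical names.

```typescript
// Hypothetical minimal version record for this example.
interface VersionNode {
  id: string;
  parentId: string | null;
  body: string;
}

// Editing never mutates an existing version; it appends a child of the current head.
function appendVersion(
  versions: readonly VersionNode[],
  headId: string | null,
  body: string,
  newId: string,
): VersionNode[] {
  return [...versions, { id: newId, parentId: headId, body }];
}

// Rollback is modeled as a new version that copies an older body,
// so the versions in between stay in the history.
function rollbackTo(
  versions: readonly VersionNode[],
  headId: string,
  targetId: string,
  newId: string,
): VersionNode[] {
  const target = versions.find((v) => v.id === targetId);
  if (!target) throw new Error(`Unknown version: ${targetId}`);
  return appendVersion(versions, headId, target.body, newId);
}
```

The storage cost named in the tradeoff is visible here: every edit and every rollback adds a row, and nothing is ever overwritten.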

Structured evaluation over exact-match testing

Decision
Evaluate runs on response metadata and structural checks, not exact output matching
Rationale
LLM outputs are non-deterministic. Measuring latency, token count, and response format is more reliable than asserting identical text.
Tradeoff
Less suitable for regression testing of exact copy, but honest to how LLMs behave in production.
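
To compare two prompt versions on metadata instead of text, one option is to summarize several runs per version and look at the difference. Using medians is my assumption for this sketch (they are less sensitive to a single slow run than means); `RunMetrics`, `summarize`, and `compareVersions` are hypothetical names.

```typescript
interface RunMetrics {
  latencyMs: number;
  totalTokens: number;
}

// Median of a list of numbers; NaN for an empty list.
function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface VersionSummary {
  runs: number;
  medianLatencyMs: number;
  medianTokens: number;
}

function summarize(runs: RunMetrics[]): VersionSummary {
  return {
    runs: runs.length,
    medianLatencyMs: median(runs.map((r) => r.latencyMs)),
    medianTokens: median(runs.map((r) => r.totalTokens)),
  };
}

// Positive deltas mean the candidate version is slower or uses more tokens.
function compareVersions(baseline: RunMetrics[], candidate: RunMetrics[]) {
  const a = summarize(baseline);
  const b = summarize(candidate);
  return {
    latencyDeltaMs: b.medianLatencyMs - a.medianLatencyMs,
    tokenDelta: b.medianTokens - a.medianTokens,
  };
}
```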

Technologies used

  • Next.js
  • TypeScript
  • React
  • Tailwind CSS
  • PostgreSQL
  • OpenAI API
  • Zod
  • Vercel

Lessons learned

Design for the iteration loop first

The most-used path is edit → test → compare. Optimizing navigation and loading states for that loop mattered more than adding secondary features.

Treat prompts as data, not strings

Versioning, metadata, and categorization requirements pushed me to model prompts as structured entities early — saving significant refactoring later.

Security is a feature, not a phase

Server-side API handling from day one prevented a common portfolio mistake: client-exposed keys that would block any real deployment.

Scope AI features tightly

A focused prompt library and testing tool shipped faster and reads more credibly than a generic 'AI platform' with half-built features.

Outcomes

PromptIQ demonstrates end-to-end ownership of a modern AI-enabled web application — from database schema and server-side provider integration to responsive UI and deployment.

Metrics below reflect technical outcomes and system capabilities. As a portfolio and capstone project, the focus is on engineering quality and architectural clarity rather than commercial user counts.

100%

Server-side API keys

All LLM provider credentials remain in server environment — zero client exposure

<2s

Typical test run feedback

Optimized loading states and async server actions for responsive iteration

6

Core workflow features

Library, versioning, testing, comparison, search, and run history

12 wks

Concept to deployment

Structured milestone delivery from schema design through Vercel deployment

Future roadmap

  • Multi-provider support (Anthropic, Azure OpenAI) with unified interface
  • Team workspaces with shared prompt libraries
  • Automated evaluation rubrics with structured JSON scoring
  • Export/import for prompt templates and version history
  • Usage analytics dashboard for token and cost tracking
  • Role-based access control for multi-user deployments