Featured case study · Completed

PromptIQ

From scattered prompts to a repeatable AI workflow

A full-stack prompt engineering workspace for organizing, testing, and refining AI prompts with structured evaluation.

Role: Solo full-stack developer
Timeline: 12 weeks
Stack: Next.js · TypeScript · React

Prompts don't scale when they live in chat history

Large language models are powerful, but their output quality depends heavily on prompt design. For developers and power users building AI-assisted workflows, prompts often end up scattered across chat threads, notes apps, and one-off documents — with no versioning, no structured testing, and no way to compare what changed between iterations.

The result is wasted time: re-prompting from scratch, inconsistent outputs, and no repeatable process for improving prompt quality. PromptIQ was built to solve that operational gap — treating prompts as engineered artifacts, not disposable messages.

Why I built this

PromptIQ started as a capstone-style project during my software engineering coursework at Arizona State University, building on the full-stack foundation I developed through the Microsoft Software and Systems Academy (MSSA).

I wanted a project that demonstrated modern AI integration patterns while still showing classic software engineering discipline: typed data models, server-side security, clear UX, and maintainable architecture. Prompt engineering is the domain, but the engineering showcase is how the system is structured.

Goals

Give users a structured library to store, version, and categorize prompts
Enable side-by-side prompt testing against LLM responses
Keep all API keys and provider calls strictly server-side
Design a UI that makes iteration fast — edit, test, compare, save
Ship a deployable MVP that demonstrates production-minded patterns
Document architecture decisions clearly enough for team handoff

Technical challenges

Non-deterministic LLM outputs

LLM responses vary between runs, making traditional unit testing insufficient. I needed an evaluation flow focused on structure, latency, and token usage — not exact string matching.
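
One way to make non-deterministic runs comparable is to run the same prompt version several times and summarize the metadata instead of comparing text. The sketch below illustrates that idea only; `RunSample`, `RunSummary`, and `summarizeRuns` are hypothetical names, not PromptIQ's actual code.

```typescript
// Hypothetical record of one test run; field names are assumptions.
interface RunSample {
  output: string;
  latencyMs: number;
  totalTokens: number;
}

interface RunSummary {
  runs: number;
  medianLatencyMs: number;
  medianTokens: number;
  distinctOutputs: number; // how many different texts the same prompt produced
}

// Median of a non-empty list, computed on a sorted copy.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Summarize repeated runs of one prompt version by metadata, not by text.
function summarizeRuns(samples: RunSample[]): RunSummary {
  if (samples.length === 0) {
    throw new Error("summarizeRuns needs at least one sample");
  }
  return {
    runs: samples.length,
    medianLatencyMs: median(samples.map((s) => s.latencyMs)),
    medianTokens: median(samples.map((s) => s.totalTokens)),
    distinctOutputs: new Set(samples.map((s) => s.output)).size,
  };
}
```

A `distinctOutputs` value above 1 is expected for most prompts; the medians are what stay comparable between prompt versions.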

Secure API key handling

Provider credentials must never reach the client bundle. All LLM calls route through server-side handlers with environment-based configuration and input validation.
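
A minimal sketch of environment-based configuration follows, assuming hypothetical variable names (`LLM_API_KEY`, `LLM_MODEL`, `LLM_BASE_URL`) that the original project may not use. The environment is passed in as a plain object so the function can be exercised without a running server; in a Next.js handler it would typically receive `process.env`.

```typescript
// Hypothetical configuration shape; names are assumptions for illustration.
interface ProviderConfig {
  apiKey: string;
  model: string;
  baseUrl: string;
}

type Env = Record<string, string | undefined>;

// Read provider settings on the server and fail fast if anything is missing.
function loadProviderConfig(env: Env): ProviderConfig {
  const apiKey = env["LLM_API_KEY"]?.trim();
  const model = env["LLM_MODEL"]?.trim();
  if (!apiKey) throw new Error("LLM_API_KEY is not set on the server");
  if (!model) throw new Error("LLM_MODEL is not set on the server");

  // Next.js inlines NEXT_PUBLIC_* variables into the browser bundle,
  // so the secret must never be duplicated under that prefix.
  for (const name of Object.keys(env)) {
    if (name.startsWith("NEXT_PUBLIC_") && env[name] === apiKey) {
      throw new Error(`${name} exposes the provider key to the client`);
    }
  }

  return {
    apiKey,
    model,
    baseUrl: env["LLM_BASE_URL"] ?? "https://api.openai.com/v1",
  };
}
```

Failing at startup rather than at the first provider call keeps a misconfigured deployment from reaching users.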

Prompt versioning without complexity

Users need to iterate on prompts without losing prior versions. I designed a version chain model that keeps history accessible without a full document-management system.
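
The version chain can be pictured as a linked list in which each version points at its parent. The sketch below is an assumption about the shape, not the project's schema: `PromptVersion`, `createChild`, and `historyOf` are hypothetical names.

```typescript
// Hypothetical version record; parentId is null for the first version.
interface PromptVersion {
  id: string;
  parentId: string | null;
  body: string;
}

// Editing never mutates an existing version; it produces a new child.
function createChild(
  parent: PromptVersion | null,
  id: string,
  body: string,
): PromptVersion {
  return { id, parentId: parent ? parent.id : null, body };
}

// Walk from a given version back to the root, newest first.
function historyOf(versions: PromptVersion[], headId: string): PromptVersion[] {
  const byId = new Map<string, PromptVersion>();
  for (const v of versions) byId.set(v.id, v);

  const chain: PromptVersion[] = [];
  const seen = new Set<string>();
  let cursor: string | null = headId;
  while (cursor !== null) {
    if (seen.has(cursor)) throw new Error(`Cycle detected at ${cursor}`);
    seen.add(cursor);
    const version: PromptVersion | undefined = byId.get(cursor);
    if (!version) throw new Error(`Unknown version: ${cursor}`);
    chain.push(version);
    cursor = version.parentId;
  }
  return chain;
}
```

Rolling back under this model means creating a new child whose body copies an older version, so history is never rewritten.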

Responsive UX for iteration loops

The core workflow — edit, run, compare, save — must feel fast on desktop and usable on tablet. Layout and loading states were designed around this loop.

A prompt workspace built for iteration

PromptIQ is a full-stack web application that centralizes prompt management and testing. Users create prompts in a structured library, run them against configured LLM providers, and compare outputs across versions — turning ad-hoc experimentation into a repeatable workflow.

The application separates concerns cleanly: the client handles editing and display, the server handles validation and provider communication, and the database persists prompts, versions, and run history.

Key features

Structured prompt library with categories and search
Version history with diff-friendly iteration tracking
One-click test runs against configured LLM providers
Side-by-side output comparison across prompt versions
Server-side API integration with typed request/response contracts
Responsive UI optimized for the edit → test → refine loop

System architecture

A Next.js full-stack application with server-side LLM orchestration, a PostgreSQL persistence layer, and a component-driven React frontend.

Presentation

Next.js App Router
React Server Components
Tailwind CSS
Client forms & test UI

Application

Server Actions
Route handlers
Zod validation
Prompt run orchestration

Data

PostgreSQL
Prompt & version schema
Run history records
Typed query layer

External

OpenAI-compatible API
Environment-based secrets
Vercel deployment

Data flow

1. User edits prompt in the client editor
2. Server Action validates input with Zod
3. Application layer calls LLM provider with server-side API key
4. Response metadata (latency, tokens) stored with run record
5. Updated prompt version persisted to PostgreSQL
6. Client renders comparison view with version history
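
The server-side part of this flow (validate, call the provider, store metadata) could be sketched as below. This is an illustration under stated assumptions: validation is hand-written here to keep the example dependency-free, whereas the project uses Zod, and `runPrompt`, `StoredRun`, the injected `callProvider` and `saveRun` functions, and the 8,000-character limit are all hypothetical.

```typescript
interface RunRequest { promptId: string; body: string; }
interface ProviderReply { text: string; totalTokens: number; }
interface StoredRun {
  promptId: string;
  output: string;
  latencyMs: number;
  totalTokens: number;
}

type Validation =
  | { kind: "valid"; value: RunRequest }
  | { kind: "invalid"; error: string };

const MAX_PROMPT_CHARS = 8000; // assumed limit, for illustration only

// Stand-in for the Zod schema: reject anything that is not a usable request.
function validateRunRequest(input: unknown): Validation {
  if (typeof input !== "object" || input === null) {
    return { kind: "invalid", error: "Request must be an object" };
  }
  const { promptId, body } = input as Record<string, unknown>;
  if (typeof promptId !== "string" || promptId.length === 0) {
    return { kind: "invalid", error: "promptId is required" };
  }
  if (typeof body !== "string" || body.trim().length === 0) {
    return { kind: "invalid", error: "Prompt body is empty" };
  }
  if (body.length > MAX_PROMPT_CHARS) {
    return { kind: "invalid", error: "Prompt body is too long" };
  }
  return { kind: "valid", value: { promptId, body } };
}

interface RunDeps {
  callProvider: (body: string) => Promise<ProviderReply>; // holds the API key
  saveRun: (run: StoredRun) => Promise<void>;             // writes run history
  now: () => number;                                      // clock in milliseconds
}

type RunOutcome =
  | { kind: "saved"; run: StoredRun }
  | { kind: "invalid"; error: string };

// Validate first, then time the provider call and persist its metadata.
async function runPrompt(input: unknown, deps: RunDeps): Promise<RunOutcome> {
  const checked = validateRunRequest(input);
  if (checked.kind === "invalid") return checked;

  const startedAt = deps.now();
  const reply = await deps.callProvider(checked.value.body);
  const run: StoredRun = {
    promptId: checked.value.promptId,
    output: reply.text,
    latencyMs: deps.now() - startedAt,
    totalTokens: reply.totalTokens,
  };
  await deps.saveRun(run);
  return { kind: "saved", run };
}
```

Injecting the provider call and the clock keeps the orchestration testable without network access; provider error handling is omitted for brevity.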

How it was built

Defined user workflow: library → edit → test → compare → save
Designed database schema for prompts, versions, and run history
Built server-side LLM integration with validation and error handling
Implemented core UI: library view, editor, test panel, comparison view
Added versioning, search, and iteration tracking
Hardened security: env-only secrets, input sanitization, rate-limit awareness
Deployed to Vercel with environment configuration and smoke testing
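
The rate-related hardening in the list above is not described in detail. One common approach, shown here purely as an assumption, is a per-user sliding-window limiter in front of the provider call. `SlidingWindowLimiter` is a hypothetical name, and an in-memory map like this would not be shared across serverless instances.

```typescript
// In-memory sliding-window limiter; a sketch, not the project's implementation.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly maxRuns: number;
  private readonly windowMs: number;

  constructor(maxRuns: number, windowMs: number) {
    this.maxRuns = maxRuns;
    this.windowMs = windowMs;
  }

  // Returns true and records the run if the caller is under the limit.
  tryAcquire(key: string, nowMs: number): boolean {
    const recent = (this.hits.get(key) ?? []).filter(
      (t) => nowMs - t < this.windowMs,
    );
    if (recent.length >= this.maxRuns) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(nowMs);
    this.hits.set(key, recent);
    return true;
  }
}
```

Passing the timestamp in explicitly keeps the limiter deterministic in tests; a handler would pass `Date.now()`.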

Development timeline

1. Discovery & Schema (Weeks 1–2)
Mapped user workflow, designed PostgreSQL schema, defined API contracts
- Entity-relationship model
- Zod validation schemas
- Workflow wireframes

2. Core Backend (Weeks 3–5)
Built server-side LLM integration, prompt CRUD, and version chain logic
- Server Actions for prompt operations
- Secure provider integration
- Run history storage

3. Frontend MVP (Weeks 6–8)
Implemented library, editor, test panel, and comparison views
- Responsive UI
- Edit-test-save loop
- Version comparison view

4. Hardening & Deploy (Weeks 9–12)
Error handling, input validation, deployment, documentation, and polish
- Vercel deployment
- Environment configuration
- README and architecture docs

Key technical decisions

Server Actions over client-side API calls

Decision: Route all LLM requests through Next.js Server Actions
Rationale: Keeps provider API keys off the client entirely and benefits from the CSRF mitigations built into Server Actions (POST-only invocation, with the Origin header checked against the Host). Validation runs once on the server before any external call.
Tradeoff: Slightly more complex debugging than a standalone API, but significantly better security posture for a portfolio MVP.

PostgreSQL for prompt persistence

Decision: Use a relational database instead of file-based or browser storage
Rationale: Prompts, versions, and run history have relational structure. PostgreSQL supports querying, indexing, and future multi-user expansion without a rewrite.
Tradeoff: Adds deployment complexity vs. local storage, but demonstrates real full-stack data modeling.

Version chains over inline overwrite

Decision: Store each prompt edit as a new version linked to its parent
Rationale: Users experimenting with prompts need rollback and comparison. Immutable versions make the comparison view straightforward and auditable.
Tradeoff: More storage per prompt, but essential for the core value proposition.
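
Because versions are immutable, comparing two of them reduces to comparing two strings. The sketch below is a deliberately naive line comparison that ignores ordering and repeated lines; it is not a full diff algorithm and not necessarily what PromptIQ uses. `diffLines` and `LineDiff` are hypothetical names.

```typescript
interface LineDiff {
  added: string[];   // lines present only in the newer version
  removed: string[]; // lines present only in the older version
}

// Naive set-based comparison of two prompt bodies, line by line.
function diffLines(before: string, after: string): LineDiff {
  const oldLines = before.split("\n");
  const newLines = after.split("\n");
  const oldSet = new Set(oldLines);
  const newSet = new Set(newLines);
  return {
    added: newLines.filter((line) => !oldSet.has(line)),
    removed: oldLines.filter((line) => !newSet.has(line)),
  };
}
```

For prompts of a few dozen lines this is often enough to highlight what changed between a version and its parent.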

Structured evaluation over exact-match testing

Decision: Evaluate runs on response metadata and structural checks, not exact output matching
Rationale: LLM outputs are non-deterministic. Measuring latency, token count, and response format is more reliable than asserting identical text.
Tradeoff: Less suitable for regression testing of exact copy, but faithful to how LLMs behave in production.
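
A structural check of this kind might look like the following sketch. It assumes the prompt asks for a JSON object, and the budget values and the names `RunResult`, `Budget`, and `evaluateRun` are hypothetical; the source does not specify which checks PromptIQ runs.

```typescript
interface RunResult { output: string; latencyMs: number; totalTokens: number; }
interface Budget { maxLatencyMs: number; maxTokens: number; requiredKeys: string[]; }
interface Check { name: string; passed: boolean; }

// Parse the output as a JSON object, or return null if it is anything else.
function parseJsonObject(text: string): Record<string, unknown> | null {
  try {
    const value: unknown = JSON.parse(text);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return value as Record<string, unknown>;
    }
    return null;
  } catch {
    return null;
  }
}

// Judge a run by metadata and shape; the exact wording is never compared.
function evaluateRun(run: RunResult, budget: Budget): Check[] {
  const parsed = parseJsonObject(run.output);
  const keys = new Set(parsed === null ? [] : Object.keys(parsed));
  return [
    { name: "latency within budget", passed: run.latencyMs <= budget.maxLatencyMs },
    { name: "tokens within budget", passed: run.totalTokens <= budget.maxTokens },
    { name: "output is a JSON object", passed: parsed !== null },
    {
      name: "required keys present",
      passed: parsed !== null && budget.requiredKeys.every((k) => keys.has(k)),
    },
  ];
}
```

Two runs with different wording but the same keys both pass, which is the point: the checks stay stable while the text varies.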

Technologies used

Next.js
TypeScript
React
Tailwind CSS
PostgreSQL
OpenAI API
Zod
Vercel

Lessons learned

Design for the iteration loop first

The most-used path is edit → test → compare. Optimizing navigation and loading states for that loop mattered more than adding secondary features.

Treat prompts as data, not strings

Versioning, metadata, and categorization requirements pushed me to model prompts as structured entities early — saving significant refactoring later.

Security is a feature, not a phase

Server-side API handling from day one prevented a common portfolio mistake: client-exposed keys that would block any real deployment.

Scope AI features tightly

A focused prompt library and testing tool shipped faster and reads more credibly than a generic 'AI platform' with half-built features.

Outcomes

PromptIQ demonstrates end-to-end ownership of a modern AI-enabled web application — from database schema and server-side provider integration to responsive UI and deployment.

Metrics below reflect technical outcomes and system capabilities. As a portfolio and capstone project, the focus is on engineering quality and architectural clarity rather than commercial user counts.

100%

Server-side API keys

All LLM provider credentials remain in server environment — zero client exposure

<2s

Typical test run feedback

Optimized loading states and async server actions for responsive iteration

Core workflow features

Library, versioning, testing, comparison, search, and run history

12 wks

Concept to deployment

Structured milestone delivery from schema design through Vercel deployment

Future roadmap

Multi-provider support (Anthropic, Azure OpenAI) with unified interface
Team workspaces with shared prompt libraries
Automated evaluation rubrics with structured JSON scoring
Export/import for prompt templates and version history
Usage analytics dashboard for token and cost tracking
Role-based access control for multi-user deployments