PromptIQ
A full-stack prompt engineering workspace for organizing, testing, and refining AI prompts with structured evaluation.
Large language models are powerful, but their output quality depends heavily on prompt design. For developers and power users building AI-assisted workflows, prompts often end up scattered across chat threads, notes apps, and one-off documents — with no versioning, no structured testing, and no way to compare what changed between iterations.
The result is wasted time: re-prompting from scratch, inconsistent outputs, and no repeatable process for improving prompt quality. PromptIQ was built to solve that operational gap — treating prompts as engineered artifacts, not disposable messages.
PromptIQ started as a capstone-style project during my software engineering coursework at Arizona State University, building on the full-stack foundation I developed through the Microsoft Software and Systems Academy (MSSA).
I wanted a project that demonstrated modern AI integration patterns while still showing classic software engineering discipline: typed data models, server-side security, clear UX, and maintainable architecture. Prompt engineering is the domain; the engineering showcase is how the system is structured.
LLM responses vary between runs, making traditional unit testing insufficient. I needed an evaluation flow focused on structure, latency, and token usage — not exact string matching.
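A minimal sketch of what such an evaluation check could look like, assuming the model is asked to return JSON and that each run records its latency and token count. The names `RunResult`, `EvalCriteria`, and `evaluateRun` are hypothetical and not taken from the PromptIQ codebase.

```typescript
// Hypothetical record of a single prompt run; field names are assumptions.
interface RunResult {
  output: string;      // raw text returned by the model
  latencyMs: number;   // wall-clock time for the provider call
  totalTokens: number; // prompt + completion tokens reported by the provider
}

// Thresholds and structural expectations for a run.
interface EvalCriteria {
  requiredKeys: string[]; // keys the JSON output must contain
  maxLatencyMs: number;
  maxTokens: number;
}

interface EvalReport {
  validJson: boolean;
  missingKeys: string[];
  withinLatency: boolean;
  withinTokenBudget: boolean;
  passed: boolean;
}

// Scores a run on structure, latency, and token usage.
// The output text is never compared against an expected string.
function evaluateRun(run: RunResult, criteria: EvalCriteria): EvalReport {
  let parsed: unknown = null;
  let validJson = true;
  try {
    parsed = JSON.parse(run.output);
  } catch {
    validJson = false;
  }

  // Only plain objects can satisfy the required-key check.
  const isObject =
    typeof parsed === "object" && parsed !== null && !Array.isArray(parsed);
  const obj = isObject ? (parsed as Record<string, unknown>) : {};
  const missingKeys = criteria.requiredKeys.filter((key) => !(key in obj));

  const withinLatency = run.latencyMs <= criteria.maxLatencyMs;
  const withinTokenBudget = run.totalTokens <= criteria.maxTokens;

  return {
    validJson,
    missingKeys,
    withinLatency,
    withinTokenBudget,
    passed:
      validJson && missingKeys.length === 0 && withinLatency && withinTokenBudget,
  };
}
```

Because the check inspects shape and budgets rather than wording, two runs with different phrasing can both pass, which is the property exact string matching cannot provide.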
Provider credentials must never reach the client bundle. All LLM calls route through server-side handlers with environment-based configuration and input validation.
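The two server-side concerns here, environment-based configuration and input validation, could be sketched as below. This is an illustration under assumptions: the variable names `LLM_API_KEY` and `LLM_MODEL`, the 8,000-character limit, and the function names are all hypothetical. The environment is passed in as a plain object so the sketch stays self-contained.

```typescript
// Hypothetical provider settings; the variable names are assumptions.
interface ProviderConfig {
  apiKey: string;
  model: string;
}

type Env = Record<string, string | undefined>;

// Reads credentials from the server environment and fails fast if missing.
// This module should only ever be imported by server-side code.
function loadProviderConfig(env: Env): ProviderConfig {
  const apiKey = env["LLM_API_KEY"];
  if (!apiKey) {
    throw new Error("LLM_API_KEY is not set in the server environment");
  }
  return { apiKey, model: env["LLM_MODEL"] ?? "default-model" };
}

interface RunRequest {
  prompt: string;
}

interface ValidationResult {
  ok: boolean;
  value?: RunRequest;
  error?: string;
}

// Assumed upper bound on prompt size; tune to the provider's limits.
const MAX_PROMPT_CHARS = 8000;

// Validates an untrusted request body before any provider call is made.
function validateRunRequest(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Request body must be a JSON object" };
  }
  const prompt = (body as Record<string, unknown>)["prompt"];
  if (typeof prompt !== "string" || prompt.trim().length === 0) {
    return { ok: false, error: "prompt must be a non-empty string" };
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    return { ok: false, error: "prompt exceeds the maximum length" };
  }
  return { ok: true, value: { prompt } };
}
```

In a server handler, `loadProviderConfig(process.env)` would run only on the server, and the response returned to the browser would contain the model output but never the `apiKey` field.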
Users need to iterate on prompts without losing prior versions. I designed a version chain model that keeps history accessible without a full document-management system.
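One way to picture a version chain is as records that each point at their parent, so history is recovered by walking the links. The sketch below uses an in-memory `Map` in place of the database. The field names and the `addVersion` and `getHistory` helpers are hypothetical, not the project's actual schema.

```typescript
// Hypothetical version record: each version links to the one it was derived from.
interface PromptVersion {
  id: string;
  parentId: string | null; // null marks the first version in a chain
  body: string;
  createdAt: string;       // ISO 8601 timestamp
}

type VersionStore = Map<string, PromptVersion>;

// Adds a new version on top of an existing one (or starts a new chain).
function addVersion(
  store: VersionStore,
  id: string,
  body: string,
  parentId: string | null
): PromptVersion {
  if (parentId !== null && !store.has(parentId)) {
    throw new Error(`Unknown parent version: ${parentId}`);
  }
  const version: PromptVersion = {
    id,
    parentId,
    body,
    createdAt: new Date().toISOString(),
  };
  store.set(id, version);
  return version;
}

// Walks the chain from a given version back to the root, newest first.
// The `seen` set guards against accidental cycles in stored data.
function getHistory(store: VersionStore, headId: string): PromptVersion[] {
  const history: PromptVersion[] = [];
  const seen = new Set<string>();
  let current: PromptVersion | undefined = store.get(headId);
  while (current !== undefined && !seen.has(current.id)) {
    history.push(current);
    seen.add(current.id);
    current =
      current.parentId !== null ? store.get(current.parentId) : undefined;
  }
  return history;
}
```

Saving never overwrites an earlier record, so prior versions stay reachable without the branching, merging, or locking features of a document-management system.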
The core workflow — edit, run, compare, save — must feel fast on desktop and usable on tablet. Layout and loading states were designed around this loop.
PromptIQ is a full-stack web application that centralizes prompt management and testing. Users create prompts in a structured library, run them against configured LLM providers, and compare outputs across versions — turning ad-hoc experimentation into a repeatable workflow.
The application separates concerns cleanly: the client handles editing and display, the server handles validation and provider communication, and the database persists prompts, versions, and run history.
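To make the comparison step concrete, the sketch below computes the differences between two stored runs of different prompt versions. It assumes run history records latency and token counts alongside the output. `RunRecord` and `compareRuns` are hypothetical names used only for illustration.

```typescript
// Hypothetical run-history row; field names are assumptions.
interface RunRecord {
  versionId: string;
  output: string;
  latencyMs: number;
  totalTokens: number;
}

interface RunComparison {
  baselineVersionId: string;
  candidateVersionId: string;
  latencyDeltaMs: number; // negative means the candidate was faster
  tokenDelta: number;     // negative means the candidate used fewer tokens
  outputChanged: boolean;
}

// Compares a candidate run against a baseline run of an earlier version.
function compareRuns(baseline: RunRecord, candidate: RunRecord): RunComparison {
  return {
    baselineVersionId: baseline.versionId,
    candidateVersionId: candidate.versionId,
    latencyDeltaMs: candidate.latencyMs - baseline.latencyMs,
    tokenDelta: candidate.totalTokens - baseline.totalTokens,
    // Whitespace-only differences are ignored.
    outputChanged: baseline.output.trim() !== candidate.output.trim(),
  };
}
```

A comparison view could render these deltas next to the two outputs, so the effect of a prompt edit is visible without rereading both responses in full.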
Under the hood, PromptIQ is a Next.js full-stack application with server-side LLM orchestration, a PostgreSQL persistence layer, and a component-driven React frontend.
Mapped the user workflow, designed the PostgreSQL schema, and defined API contracts
Built server-side LLM integration, prompt CRUD, and version chain logic
Implemented the library, editor, test panel, and comparison views
Added error handling and input validation, then deployed, documented, and polished the application
The most-used path is edit → test → compare. Optimizing navigation and loading states for that loop mattered more than adding secondary features.
Versioning, metadata, and categorization requirements pushed me to model prompts as structured entities early — saving significant refactoring later.
Server-side API handling from day one prevented a common portfolio mistake: client-exposed keys that would block any real deployment.
A focused prompt library and testing tool shipped faster and reads more credibly than a generic 'AI platform' with half-built features.
PromptIQ demonstrates end-to-end ownership of a modern AI-enabled web application — from database schema and server-side provider integration to responsive UI and deployment.
Metrics below reflect technical outcomes and system capabilities. As a portfolio and capstone project, the focus is on engineering quality and architectural clarity rather than commercial user counts.
100%: All LLM provider credentials remain in the server environment, with zero client exposure
<2s: Optimized loading states and async server actions for responsive iteration
6: Library, versioning, testing, comparison, search, and run history
12 wks: Structured milestone delivery from schema design through Vercel deployment