Version Control and Testing

Master version control in PromptOwl - test drafts, compare versions with evaluation sets, publish safely, and roll back when needed.

Learn how to manage agent versions in PromptOwl. This guide covers the complete workflow from drafting changes to publishing safely, with testing and rollback strategies.


Why Version Control Matters

When you're iterating on AI agents, you need:

  • Safety: Don't break production while experimenting

  • Testing: Verify changes before they go live

  • History: Track what changed and when

  • Rollback: Quickly revert if something goes wrong

PromptOwl's version system gives you all of this.


Understanding Versions

Version States

State      | Icon  | Meaning
Draft      | Gray  | Work-in-progress, not live
Published  | Green | Active version users see
Historical | None  | Previous versions in history

What Gets Versioned

Every version captures:

  • System prompt content

  • Block configurations

  • Model settings (provider, temperature, tokens)

  • Connected datasets

  • Tool selections

  • Variable definitions
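As a rough mental model, a version can be pictured as a snapshot of the settings listed above. The sketch below is illustrative only: `VersionSnapshot` and its field names are hypothetical and do not describe PromptOwl's actual data model or API. Freezing the dataclass is a modelling choice of this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VersionSnapshot:
    """Hypothetical record of the settings one agent version captures."""
    number: int
    system_prompt: str
    blocks: tuple = ()          # block configurations
    provider: str = ""          # model settings
    temperature: float = 0.0
    max_tokens: int = 0
    datasets: tuple = ()        # connected datasets
    tools: tuple = ()           # tool selections
    variables: tuple = ()       # variable definitions


v1 = VersionSnapshot(number=1, system_prompt="You are a support agent.",
                     provider="example-provider", temperature=0.2,
                     max_tokens=1024, tools=("search",))

# Changing a setting yields a new snapshot; v1 itself is left untouched.
v2 = replace(v1, number=2, temperature=0.7)
print(v1.temperature, v2.temperature)  # 0.2 0.7
```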

What Doesn't Get Versioned

These are separate from versions:

  • Conversations (tied to prompt, not version)

  • Evaluation sets (independent)

  • Annotations (on conversations)

  • API keys (account-level)


The Version Workflow


Step 1: Create a Draft

Making Changes

  1. Open your agent in the editor

  2. Make your changes (prompt, settings, etc.)

  3. Click Save to create a draft

Your changes are now saved but NOT live. Users still see the published version.

Draft Indicators

Look for these signs you're working on a draft:

  • "Draft" badge in the editor

  • "Unsaved changes" warning if you navigate away

  • Version number shows "(draft)" suffix

Multiple Drafts

You can only have one active draft at a time. Each save overwrites the previous draft until you publish.


Step 2: Test Your Changes

Using the Chat Interface

Before publishing, test your draft:

  1. Open the Chat tab while in draft mode

  2. The chat uses your draft version (not published)

  3. Test with real questions

  4. Verify responses are correct

Test Checklist

For each change, verify:

Testing Tips

For prompt changes:

For model changes:

For RAG changes:


Step 3: Evaluate with Test Sets

Why Evaluate?

Manual testing catches obvious issues. Evaluation sets catch systematic problems across many scenarios.

Creating an Evaluation Set

  1. Go to the Evaluate tab

  2. Click Create Evaluation Set

  3. Add test cases:

Input                              | Expected Behavior
"What's your return policy?"       | References return policy document
"How much does it cost?"           | Mentions pricing tiers
"I'm frustrated with your service" | Responds empathetically

Running Evaluations

  1. With your draft active, go to Evaluate

  2. Select your evaluation set

  3. Click Run Evaluation

  4. Review pass/fail results
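To make the pass/fail idea concrete, here is a minimal sketch of an evaluation run over the three example test cases. It does not call PromptOwl: `stub_agent` stands in for the draft agent, and the keyword-based `check` functions are an assumed, deliberately crude way to turn an "expected behavior" into an automated check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    input: str
    expected_behavior: str
    check: Callable[[str], bool]   # hypothetical automated check


def stub_agent(question: str) -> str:
    """Stand-in for the draft agent; replace with a real call."""
    canned = {
        "What's your return policy?": "Our return policy document explains the process.",
        "How much does it cost?": "It depends on your needs.",
    }
    return canned.get(question, "I'm sorry to hear that. Let me help.")


cases = [
    EvalCase("What's your return policy?", "References return policy document",
             lambda r: "return policy" in r.lower()),
    EvalCase("How much does it cost?", "Mentions pricing tiers",
             lambda r: "tier" in r.lower()),
    EvalCase("I'm frustrated with your service", "Responds empathetically",
             lambda r: "sorry" in r.lower()),
]


def run_evaluation(agent, cases):
    """Run every case through the agent and return (results, pass rate)."""
    results = [(case, case.check(agent(case.input))) for case in cases]
    passed = sum(1 for _, ok in results if ok)
    return results, passed / len(results)


results, pass_rate = run_evaluation(stub_agent, cases)
for case, ok in results:
    print("PASS" if ok else "FAIL", "-", case.input)
print(f"pass rate: {pass_rate:.0%}")
```

With the stub above, the pricing case fails because the canned answer never mentions tiers, which is the kind of systematic gap an evaluation set is meant to surface.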

Comparing Version Performance

Run the same evaluation set on:

  1. Your current published version

  2. Your draft version

Compare the results:

Version        | Pass Rate | Notes
v3 (Published) | 85%       | Current baseline
v4 (Draft)     | 92%       | Improvement on pricing questions

Only publish if the draft performs at least as well as the current published version.
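The comparison can be sketched as a small gate that takes per-case results from both runs. The function names and the dictionary shape are assumptions made for illustration. Listing individual regressions goes slightly beyond the pass-rate rule above: a draft can raise the overall rate while still breaking a case that used to pass.

```python
def pass_rate(results: dict) -> float:
    """Share of test cases that passed; results maps case name -> passed?"""
    return sum(results.values()) / len(results)


def compare_runs(baseline: dict, draft: dict) -> dict:
    """Compare two runs of the same evaluation set."""
    # Cases that passed on the published version but fail on the draft.
    regressions = sorted(name for name, ok in baseline.items()
                         if ok and not draft.get(name, False))
    return {
        "baseline": pass_rate(baseline),
        "draft": pass_rate(draft),
        "regressions": regressions,
        "ok_to_publish": pass_rate(draft) >= pass_rate(baseline),
    }


published_run = {"return policy": True, "pricing": False, "frustrated": True}
draft_run = {"return policy": True, "pricing": True, "frustrated": True}

report = compare_runs(published_run, draft_run)
print(report["ok_to_publish"], report["regressions"])  # True []
```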

Using AI Judge

For subjective quality, configure AI Judge:

  1. In evaluation settings, enable AI Judge

  2. Set scoring criteria:

    • Accuracy (1-5)

    • Helpfulness (1-5)

    • Tone (1-5)

  3. Run evaluation with AI scoring

  4. Review aggregate scores
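As an illustration of what "aggregate scores" might mean, the sketch below averages 1-5 scores per criterion across test cases. The scores are made-up sample values, and the structure is an assumption; PromptOwl's own AI Judge output may look different.

```python
from statistics import mean

CRITERIA = ("accuracy", "helpfulness", "tone")


def aggregate(scores):
    """Average each criterion across all judged responses."""
    for row in scores:
        for criterion in CRITERIA:
            if not 1 <= row[criterion] <= 5:
                raise ValueError(f"{criterion} score out of range: {row[criterion]}")
    return {c: mean(row[c] for row in scores) for c in CRITERIA}


# Illustrative scores for three judged responses (not real results).
judge_scores = [
    {"accuracy": 5, "helpfulness": 4, "tone": 5},
    {"accuracy": 3, "helpfulness": 4, "tone": 4},
    {"accuracy": 4, "helpfulness": 4, "tone": 3},
]

summary = aggregate(judge_scores)
print(summary)
```

Averages hide outliers, so it is still worth reading the lowest-scoring responses individually.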


Step 4: Publish Safely

Pre-Publish Checklist

Before clicking Publish:

Publishing

  1. Click Publish in the editor

  2. Confirm the action

  3. Your draft becomes the new published version

What Happens on Publish

  • Draft becomes the active version

  • Previous published version moves to history

  • All new conversations use the new version

  • Existing conversations continue with their original version

  • API consumers immediately get the new version
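The publish behavior described above can be modelled in a few lines. This is a toy model under stated assumptions, not PromptOwl's implementation: `Agent` and `Conversation` are hypothetical classes, and a version is reduced to a plain string.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    published: Optional[str] = None               # content of the live version
    draft: Optional[str] = None                   # at most one draft
    history: list = field(default_factory=list)   # previously published content

    def save(self, content: str) -> None:
        self.draft = content                      # each save overwrites the draft

    def publish(self) -> None:
        if self.draft is None:
            raise ValueError("nothing to publish")
        if self.published is not None:
            self.history.append(self.published)   # old live version moves to history
        self.published, self.draft = self.draft, None


@dataclass
class Conversation:
    agent: Agent
    pinned: Optional[str] = field(init=False)     # version fixed at conversation start

    def __post_init__(self):
        self.pinned = self.agent.published


agent = Agent()
agent.save("v3 prompt")
agent.publish()
existing = Conversation(agent)    # started while v3 was live

agent.save("v4 prompt")
agent.publish()
fresh = Conversation(agent)       # started after v4 went live

print(existing.pinned, "|", fresh.pinned, "|", agent.history)
```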

Gradual Rollout (Advanced)

For high-traffic agents, consider:

  1. Publish at low-traffic times - Fewer users affected if issues arise

  2. Monitor closely after publish - Watch for problems in the first hour

  3. Have rollback ready - Know which version to revert to


Step 5: Monitor After Publishing

What to Watch

After publishing, monitor:

Metric           | Where to Find        | Warning Sign
Error rate       | Conversations        | Sudden increase
User feedback    | Annotations          | Negative sentiment spike
Response quality | Sample conversations | Unexpected responses
Token usage      | Analytics            | Unusual increase/decrease

Setting Up Monitoring

  1. Go to Monitor tab

  2. Filter to recent conversations

  3. Review a sample of responses

  4. Check for annotation patterns
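One way to turn "sudden increase" in error rate into a check is to compare conversations from before and after the publish. The sketch below assumes you have exported conversations as dictionaries with an `error` flag; that format and the thresholds (`ratio`, `min_delta`, `min_samples`) are illustrative assumptions, not PromptOwl defaults.

```python
def error_rate(conversations):
    """Fraction of conversations flagged as errors (0.0 for an empty list)."""
    if not conversations:
        return 0.0
    return sum(1 for c in conversations if c["error"]) / len(conversations)


def sudden_increase(before, after, ratio=2.0, min_delta=0.02, min_samples=20):
    """Flag a spike: the rate at least doubled AND rose by 2 points or more."""
    if len(after) < min_samples:
        return False                      # too little data to judge yet
    base, current = error_rate(before), error_rate(after)
    return current >= base * ratio and current - base >= min_delta


before = [{"error": i < 2} for i in range(100)]   # 2 errors in 100 conversations
after = [{"error": i < 5} for i in range(50)]     # 5 errors in 50 conversations
print(sudden_increase(before, after))             # True
```

Requiring both a relative and an absolute rise keeps a jump from 0.1% to 0.2% from triggering an alert.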

How Long to Monitor

Agent Type               | Monitoring Period
Low traffic (<100/day)   | 24-48 hours
Medium traffic           | 4-8 hours
High traffic (>1000/day) | 1-2 hours
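The guidance above can be written as a small lookup. The guide gives no explicit bounds for medium traffic, so treating everything from 100 to 1000 conversations per day as "medium" is an assumption of this sketch.

```python
def monitoring_window_hours(conversations_per_day: int) -> tuple:
    """Return the suggested (min, max) monitoring period in hours."""
    if conversations_per_day < 100:
        return (24, 48)     # low traffic: issues take longer to surface
    if conversations_per_day > 1000:
        return (1, 2)       # high traffic: enough data arrives quickly
    return (4, 8)           # medium traffic (assumed 100-1000/day)


for traffic in (40, 500, 5000):
    low, high = monitoring_window_hours(traffic)
    print(f"{traffic}/day -> monitor for {low}-{high} hours")
```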


Step 6: Roll Back When Needed

When to Roll Back

Roll back immediately if you see:

  • Systematic errors in responses

  • Spike in negative annotations

  • Critical functionality broken

  • Compliance or safety issues

How to Roll Back

  1. Go to Versions panel (right sidebar)

  2. Find the last known good version

  3. Click on it to preview

  4. Click Publish on that version

  5. Confirm the rollback

Rollback is Safe

  • Creates a new version (doesn't delete history)

  • Instant effect on new conversations

  • Existing conversations unaffected

  • You can always roll forward again
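The reason rollback is non-destructive can be sketched as follows: republishing an old version appends a copy to the history instead of rewinding it. `Version` and `roll_back` are hypothetical names, and the numbering of the new version is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    number: int
    content: str
    note: str = ""


def roll_back(history, target_number):
    """Return a new history with a copy of the target version appended."""
    target = next((v for v in history if v.number == target_number), None)
    if target is None:
        raise ValueError(f"no version {target_number} in history")
    restored = Version(number=history[-1].number + 1,
                       content=target.content,
                       note=f"Rollback to v{target.number}")
    return history + [restored]   # nothing is deleted, so v4 stays available


history = [Version(3, "Refund prompt"), Version(4, "Refund prompt, new model")]
history = roll_back(history, target_number=3)
print([(v.number, v.content) for v in history])
```

Because the failed version is still in the list, rolling forward again is just another call with a different `target_number`.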

Post-Rollback Actions

After rolling back:

  1. Document the issue - What went wrong?

  2. Analyze the failed version - Why did it fail?

  3. Fix in a new draft - Address the root cause

  4. Re-test thoroughly - Don't repeat the mistake

  5. Try again - When ready, publish the fixed version


Version History Best Practices

Meaningful Changes

Make each version meaningful:

Good:

  • v1: Initial release

  • v2: Added product documentation RAG

  • v3: Improved handling of refund requests

  • v4: Updated to Claude 3.5 Sonnet

Bad:

  • v1: Initial

  • v2: Fixed typo

  • v3: Testing

  • v4: Testing again

  • v5: Final

  • v6: Actually final

Version Notes

When saving/publishing, include notes about:

  • What changed

  • Why it changed

  • Expected impact

Regular Review

Periodically review your version history:

  • Identify which versions were successful

  • Note patterns in what worked/didn't work

  • Use insights for future changes


Team Workflows

Review Before Publish

For team environments:

  1. Developer creates a draft and tests it in the Chat tab

  2. Reviewer checks changes and runs evaluations

  3. Approver gives go-ahead to publish

  4. Publisher makes version live

Avoiding Conflicts

When multiple people edit:

  • Only one draft exists at a time

  • Last save wins

  • Communicate about who's editing

  • Use the Versions panel to see recent changes
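The "last save wins" rule is easy to underestimate, so here is a toy illustration of how one teammate's draft can silently replace another's. `DraftSlot` is a hypothetical class; the warning it enables is something this sketch adds, not a documented PromptOwl feature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftSlot:
    """Hypothetical single draft slot shared by everyone editing an agent."""
    content: Optional[str] = None
    saved_by: Optional[str] = None

    def save(self, content: str, author: str) -> Optional[str]:
        """Store the draft and return the other author whose draft was replaced."""
        replaced = self.saved_by if self.saved_by != author else None
        self.content, self.saved_by = content, author
        return replaced


slot = DraftSlot()
slot.save("Shorter greeting", author="alice")
overwritten = slot.save("New refund wording", author="bob")
if overwritten:
    print(f"bob's save replaced a draft last saved by {overwritten}")
print(slot.content)  # New refund wording
```

Alice's edit is gone after Bob saves, which is why agreeing on who is editing matters.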

Shared Evaluation Sets

Create evaluation sets that the whole team uses:

  • Standard test cases everyone runs

  • Ensures consistent quality bar

  • Makes comparisons meaningful


API Consumer Considerations

How API Users Experience Versions

When you publish a new version:

  • All API calls immediately use the new version

  • No API changes needed on consumer side

  • Conversation history continues normally

Versioning for API Stability

If API consumers need stability:

  1. Communicate changes - Notify before major updates

  2. Test with staging - Use a separate staging agent

  3. Gradual rollout - Publish during low-traffic periods

  4. Rollback plan - Have previous version ready

Breaking Changes

These changes may affect API consumers:

  • Different response format

  • Changed variable requirements

  • New required inputs

  • Significantly different behavior

Communicate these before publishing.
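Two of these breaking changes, changed variable requirements and new required inputs, can be spotted mechanically by comparing variable definitions between versions. The sketch below assumes each version's variables are available as a mapping from name to a "required" flag; that representation is hypothetical.

```python
def new_required_inputs(old_vars: dict, new_vars: dict) -> list:
    """Variables consumers must now send that were optional or absent before."""
    return sorted(name for name, required in new_vars.items()
                  if required and not old_vars.get(name, False))


def removed_variables(old_vars: dict, new_vars: dict) -> list:
    """Variables consumers may still be sending that no longer exist."""
    return sorted(set(old_vars) - set(new_vars))


published = {"customer_name": True, "locale": False}
draft = {"customer_name": True, "locale": True, "order_id": True}

print(new_required_inputs(published, draft))  # ['locale', 'order_id']
print(removed_variables(published, draft))    # []
```

A non-empty result is a signal to notify API consumers before publishing. Changes in response format or behavior still need manual review.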


Troubleshooting

Draft Not Saving

  1. Check internet connection

  2. Verify you have edit permissions

  3. Try refreshing the page

  4. Check for validation errors

Can't Publish

  1. Ensure you have the owner or editor role

  2. Check for required fields

  3. Verify no validation errors

  4. Try saving draft first, then publish

Wrong Version Active

  1. Go to Versions panel

  2. Verify which version shows "Published"

  3. If wrong, publish the correct version

  4. Check for recent changes by others

Rollback Failed

  1. Refresh the page

  2. Try again

  3. If persistent, contact support

  4. Document the issue


Quick Reference

Version Workflow

Key Actions

Action   | Effect
Save     | Creates/updates draft
Publish  | Draft → Live, old → History
Rollback | Historical → Live (via Publish)
Discard  | Removes current draft

Safety Rules

  1. Always test before publishing

  2. Run evaluation sets for major changes

  3. Monitor after publishing

  4. Keep rollback version identified

  5. Don't publish during peak hours


Learn More


Ready to manage versions like a pro? Get started with PromptOwl.
