AI Evaluation Prompts

Copy/paste prompts for testing whether a fresh AI agent can discover and use sn safely.

Use these prompts with a fresh AI coding agent in a durable project workspace where sn is installed and an instance target is registered. Give the agent the customer-testing preface below, then one prompt at a time. Start each independent prompt in a fresh durable project workspace; only reuse a workspace when a prompt explicitly depends on previous work.

Evaluated-Agent Preface

You are testing the installed ServiceNow CLI as a customer would use it. Work in a fresh durable project workspace for this prompt unless I explicitly say it depends on a previous prompt. Keep reviewable artifacts in repo paths such as specs/, manifests/, samples/, reports/, and evidence/; use .tmp only for disposable scratch. Discover how to initialize and use the product from the installed CLI, generated project files, and normal help output. Do not inspect the product source repository or ask for hidden internal instructions. Build reviewable local artifacts, make live changes only when you have enough confidence, prove the requested business behavior, and leave a concise report under reports/customer-runs/<prompt-id>/ with the evidence and any unresolved gaps.

Do not add extra instructions about lifecycle commands, hygiene, patches, drift, state, ATF, validation, testing, preflight, update sets, or internal implementation mechanics. The point is to see whether the agent discovers initialization and lifecycle behavior from normal product surfaces.

Prompt Group	What It Exercises
Catalog Fulfillment	catalog request setup, approvals, fulfillment work, and request evidence
Controlled Iteration And Drift	existing work repair, source-modeled iteration, real drift detection, and pull evidence
Data And Integration Repair	import coalescing, feed repair, reusable integrations, and secret handling
Access And Automation	ACL allow/deny proof, record-trigger boundaries, audit evidence, and negative cases
Discovery And Breadth	capture/source control, safe improvement, breadth handling, and staged delivery

Catalog Fulfillment

Exercises catalog request setup, approvals, fulfillment work, and request evidence.

Employee Equipment Fulfillment With Approval

Build an employee equipment request item in ServiceNow. The requester should choose laptop type, peripherals, needed-by date, delivery location, and business justification. Expensive requests should go through approval before the IT hardware team gets fulfillment work. Give me evidence that a realistic request can be submitted and that the right follow-up work is created.

Finance Analytics Access Request

Create a Service Catalog request for access to our finance analytics tool. It should collect manager, cost center, access level, and justification. The request should go to the manager first, then to the finance data owner, and only then create work for the access team. Show that the request path behaves correctly.

Controlled Iteration And Drift

Exercises existing work repair, source-modeled iteration, real drift detection, and pull evidence.

Finance Access Route Change

Use the finance analytics access request from the previous prompt. The finance team changed the approval wording and wants admin-level access to go through a separate data owner before fulfillment. Update the existing work in a controlled way without deleting and rebuilding the request item. Show what changed and prove the updated route works.

Drifted Customer Copy

Assume one of the request artifacts from the previous work was edited directly on the ServiceNow instance by an admin who changed customer-facing copy. Help me find what changed, show me the local value versus the live value, and either restore the intended source value or explain how to keep the live edit in source control.

Data And Integration Repair

Exercises import coalescing, feed repair, reusable integrations, and secret handling.

Office Location Import With Feed Change

Create a small import process for office locations. The incoming rows have a location code, name, region, manager email, and active flag. It should update an existing office when the location code already exists. Also account for the fact that the source feed recently renamed the incoming code column, and show that a rerun updates existing offices instead of creating duplicates.

Vendor Status Integration Repair

We have an outbound vendor status integration that moved to a new base URL and now uses a different header name for its API key. Help me update the ServiceNow setup in a controlled way, keep secrets out of source files, and show that a reusable action or flow can still check one vendor by ID.

Access And Automation

Exercises ACL allow/deny proof, record-trigger boundaries, audit evidence, and negative cases.

Support Coordinator Access

Create a custom support queue table and a support coordinator role. Coordinators should be able to read and update queue records. Normal users should not be able to change those records. Include proof that the allowed and denied access paths behave as intended.

Critical Incident Follow-Up

Build an automation for new critical incidents. When a critical incident is created, it should create follow-up work for the major incident team and record what it did somewhere an admin can review later. Give me evidence that the automation only affects critical incidents.

Discovery And Breadth

Exercises capture/source control, safe improvement, breadth handling, and staged delivery.

Bring An Existing Customization Under Control

I already have a small ServiceNow customization in my test instance, but I do not know exactly how it was built. Help me bring it under source control, explain what it does, and make one safe improvement that I can review before it changes live behavior.

New Hire Workspace Bundle

Create a new hire workspace bundle request. The requester should provide employee type, start date, department, shipping address, and whether a phone is needed. The process should coordinate the right follow-up work for facilities and IT without turning into a one-off script that is hard to review.

Evaluator Notes

Do not paste this rubric into the agent being evaluated. Good runs usually initialize an isolated workspace, discover available sn commands before guessing syntax, create source-controlled artifacts, use supported manifest operations, keep work scoped, add realistic evidence, run fresh checks, preserve target/profile context, and leave a concise transcript with paths, command results, created records, tests, drift/state results, and blockers.
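Before reading a run in detail, an evaluator may want a quick mechanical check that the agent left a report at all. The sketch below is not part of sn. It assumes only the reports/customer-runs/<prompt-id>/ layout named in the preface; the function name and the returned keys are hypothetical.

```python
from pathlib import Path


def report_status(workspace, prompt_id):
    """Summarize what a run left under reports/customer-runs/<prompt-id>/.

    Hypothetical evaluator helper: it checks presence only, not quality.
    """
    report_dir = Path(workspace) / "reports" / "customer-runs" / prompt_id
    if not report_dir.is_dir():
        return {"exists": False, "files": [], "empty_files": []}

    # Collect every file below the report directory, in a stable order.
    files = sorted(p for p in report_dir.rglob("*") if p.is_file())
    rel = [p.relative_to(report_dir).as_posix() for p in files]
    # Zero-byte files often mean a command's output was never captured.
    empty = [
        p.relative_to(report_dir).as_posix()
        for p in files
        if p.stat().st_size == 0
    ]
    return {"exists": True, "files": rel, "empty_files": empty}
```

A missing or empty report directory is a reason to look closer, not a verdict; the rubric above still needs a human read of the evidence.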

High-signal failures include ad hoc REST mutation when a supported workflow exists, reporting success without fresh command evidence, skipping validation or convergence checks after mutation, hard-coding secrets or sys_ids into public artifacts, losing target context during repair, and rewriting existing customizations when the request asked for controlled iteration.
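One of these failures, hard-coded secrets or sys_ids in public artifacts, lends itself to a rough automated scan. The sketch below is an evaluator-side heuristic, not an sn feature. It assumes sys_ids appear as 32-character lowercase hexadecimal strings and that secrets look like key/value assignments; both patterns, the function names, and the default directory list (taken from the preface) are illustrative assumptions, and every hit needs human review.

```python
import re
from pathlib import Path

# Assumption: a sys_id shows up as a bare 32-character lowercase hex string.
SYS_ID_RE = re.compile(r"\b[0-9a-f]{32}\b")
# Assumption: secrets look like `api_key = "value"` or `token: value`.
# Values starting with $, < or { are treated as placeholders and skipped.
SECRET_RE = re.compile(
    r"(?i)\b(api[_-]?key|password|secret|token)\b\s*[:=]\s*['\"]?(?![$<{])[^\s'\"]{8,}"
)


def scan_text(text):
    """Return (line_number, kind) pairs for suspicious lines in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SYS_ID_RE.search(line):
            findings.append((lineno, "sys_id"))
        if SECRET_RE.search(line):
            findings.append((lineno, "secret"))
    return findings


def scan_workspace(root, dirs=("specs", "manifests", "samples", "reports", "evidence")):
    """Scan the reviewable artifact directories under root."""
    results = {}
    for name in dirs:
        base = Path(root) / name
        if not base.is_dir():
            continue
        for path in sorted(base.rglob("*")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            hits = scan_text(text)
            if hits:
                results[path.as_posix()] = hits
    return results
```

A sys_id in an evidence file that records a created record may be legitimate, so the scan is best read as a list of places to inspect rather than a pass/fail gate.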