Build Your Own Agent Workshop

If you’re here for the workshop, welcome! Add me on LinkedIn.

This is the build-heavy follow-up to the Intro to Agentic AI session. The intro session covered concepts; this one is about shipping. (You don’t have to have attended the intro session to apply for this one.)

The slides and source code are available under the CC BY-SA 4.0 license. Setup guides and reference material are in the intermediate-agentic-ai-resources repo.

You’ll come in with a team, a real problem, and real data, and leave four hours later with a working agent that someone on your team will actually use next week. Expect a 15-minute talk, then heads-down building with hands-on help.

Teams are product-focused by design: each one pairs engineers with the actual users whose problem is being solved. You have to name your users and describe your project when you apply for this workshop.

Before you apply

We know we’re asking a lot for a 4-hour workshop. The bar is high on purpose: every hour spent upfront understanding the problem and lining up real data is an hour we don’t waste in the room with a half-formed brief. We’ll be bringing together a lot of resources for this workshop, so it’s important we set ourselves up for success.

We’re capping the first cohort at 8 teams. If you apply now and don’t make it in, you go to the front of the queue for the next round, so it’s worth getting your one-pager in even if you’re not 100% sure your project fits.

Projects

A good workshop project is:

Bounded. The minimally useful user story can be demoed in 90 seconds and is buildable by 2 engineers in ~2.5 hours of build time.
Real user in the room. At least one team member personally has the problem the agent solves and will be the demo’s first user.
Real data in the room. 5+ representative examples available on workshop day, from a personal laptop, without violating the constraints below.
Internal-facing. No public users, no anonymous access, no autonomous external actions.
Evaluable. You can describe, in one sentence, how you’d tell if the agent is doing a good job. “It feels right” is not evaluable; “matches the expert’s answer on 8 of 10 known cases” is.
Agentic, not just a prompt. The project requires at least one of: tool use, retrieval, multi-step reasoning, sub-agents, or self-critique. A single-shot prompt-and-response chatbot is not a workshop project.
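To make the "Evaluable" criterion concrete, here is a minimal sketch of a scoring harness for the "8 of 10 known cases" style of check. Everything in it is an assumption for illustration: the function names, the exact-match comparison after normalization, and the toy agent are hypothetical, and a real project may need a looser notion of "matches" (for example, an expert reviewing each answer).

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't count as misses."""
    return " ".join(text.lower().split())


def score_agent(agent: Callable[[str], str], cases: list[tuple[str, str]]) -> tuple[int, int]:
    """Run the agent on (question, expert_answer) pairs and return (matches, total)."""
    matches = sum(
        1
        for question, expected in cases
        if normalize(agent(question)) == normalize(expected)
    )
    return matches, len(cases)


def passes(matches: int, total: int, required: int = 8, out_of: int = 10) -> bool:
    """True if the match rate meets the bar (default: 8 of 10)."""
    return total > 0 and matches * out_of >= required * total


# Hypothetical known cases and a stand-in agent that gets the last two wrong.
cases = [(f"q{i}", f"answer {i}") for i in range(10)]


def toy_agent(question: str) -> str:
    index = int(question[1:])
    return "wrong" if index >= 8 else f"Answer  {index}"


matches, total = score_agent(toy_agent, cases)
print(f"{matches} of {total}", "PASS" if passes(matches, total) else "FAIL")
```

Swapping `toy_agent` for a function that calls your real agent is the only change needed to reuse the harness on workshop day.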

Note that these are application-agnostic. We’ll accept projects from any area.

Disqualifying properties

A project is disqualified if any of the following are true:

Requires data or systems the team can’t access from a personal laptop on workshop day.
Touches PHI, PII, financial records, security credentials, or anything else where a leak would matter.
Output goes to a customer, patient, regulator, or other external party without a human review step.
Requires training or fine-tuning a model. Frontier models with good prompting and tool use only.
Depends on a specific person who isn’t on the team (“we’ll get input from Legal”).
“Build a platform for X.” Platforms are not 3-hour projects.
Makes autonomous changes on external systems.
Is a trivial RAG over a document pile. You can learn that from LinkedIn.

My goal is to help you avoid the pitfalls that make projects balloon out of scope.

Project ideas

Here are some starting points. You don’t have to pick from this list, but a good project will look a lot like one of these:

#	Name	Description
1	Calendar & todo chatbot	Manages calendar events and todo lists conversationally. Resolves conflicts and reasons about priorities.
2	Market & competitor research	Researches a market landscape and specific competitors. Produces structured briefs with sources.
3	Work instruction check & writeup chatbot	Follows along with a technician as they complete a work instruction, prompts them for details and next steps.
4	Document sorter & summarizer	Sorts and summarizes a document corpus for quick lookup. Surfaces the right passage on demand.
5	Patent research bot	Searches patent databases and parses claims for a target area. Surfaces relevant prior art with citations.
6	Research paper triage agent	Pulls related work from PubMed/arXiv/ChemRxiv and ranks by relevance to the user’s project. Produces a “what’s new and why you care” brief.
7	Experiment protocol checker	Reads a draft protocol and flags missing controls, ambiguous steps, safety issues, and SOP deviations. Embedded user = anyone who’s reviewed a junior person’s protocol and wanted to cry.
8	Safety + SDS assistant	Given a planned experiment, pulls SDS sheets for each chemical and summarizes incompatibilities, required PPE, and waste handling. Catches the spicy combinations before they happen.
9	Test protocol drafter from a requirement	Given a design requirement, drafts a bench test protocol with apparatus, procedure, acceptance criteria, sample size, and applicable standards. Backtest against existing protocols the lab has written.
10	Bench test data analyzer	Pick one type of test the lab runs frequently and build a pipeline to run standard analyses, generate plots, and draft a results section.

Minimal technical requirements

Personal laptop, not ITSS-managed. You need to install things without filing a ticket.
- If it’s Windows, install WSL2, uv, and nvm. Use Visual Studio Code.
- If it’s a Mac, install Homebrew or Zerobrew, uv, and nvm. Use Visual Studio Code.
- If it’s Linux, Godspeed to you.
Each person must have a subscription to a frontier model. Active and logged in before the workshop starts.
- Claude Pro and Codex Plus, strongly encouraged.
- Kimi Allegretto, OpenCode Go, Z.AI GLM Lite, etc. acceptable. (Limited support.)
- GitHub Copilot discouraged (poor billing model), Cursor discouraged (lack of direct AI access).
- If you don’t know what to choose, get a month of Claude Pro. My referral code gives you the first week free.
Any API keys or credentials your project needs (internal docs, instrument output folder, etc.) tested and working on workshop day. If you can’t connect to it from your laptop the morning of, you can’t use it.
- Only exception is the calendar, which we can provide a connector for.
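One way to run the "morning of" check is a small preflight script. The sketch below only confirms that credentials are present as environment variables and that data folders are readable; it does not prove a key is accepted by the service, so a real test call is still worth making. The variable name `EXAMPLE_API_KEY` and the folder `~/workshop-data` are placeholders, not names from the workshop materials.

```python
import os
from pathlib import Path


def preflight(required_env: list[str], required_paths: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means nothing was found missing."""
    problems = []
    for name in required_env:
        # An unset or empty variable both count as missing.
        if not os.environ.get(name):
            problems.append(f"missing environment variable: {name}")
    for raw in required_paths:
        path = Path(raw).expanduser()
        if not path.exists():
            problems.append(f"path not found: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"path not readable: {path}")
    return problems


# Placeholder names: replace with whatever your project actually needs.
issues = preflight(["EXAMPLE_API_KEY"], ["~/workshop-data"])
for issue in issues:
    print("FIX:", issue)
if not issues:
    print("Preflight checks passed.")
```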

Group requirements

Must bring data. At least 5 real examples of whatever your agent will process — protocols, documents, instrument files, questions with known answers, etc. Synthetic examples are fine if real data can’t leave its home system, but they must be representative.
Team of 2-4 people.
- Minimum 2 engineers with working Python or TypeScript.
- Minimum 1 actual user who would personally use what you build.
- A team member may be either, both, or neither.
Pre-submitted one-pager (see below) reviewed and approved before workshop day.
Stuck for 10 minutes? Raise your hand. No silent struggling.

Application

The application form asks for:

Team name.
2-4 members. For each: name, role (engineer / user / both), and which frontier-model subscription they’re bringing.

You’ll also have to confirm that:

you have at least 2 engineers with working Python or TypeScript (a very basic level is fine),
you have at least 1 person who would personally use the thing you’re building,
every member, especially non-engineers, will have an active frontier-model subscription by workshop day.
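If you want to sanity-check your roster before filling in the form, the team rules can be sketched as a small validator. This is an illustration only, not how applications are actually reviewed; the `Member` fields and the sample roster (apart from the name Sarah, borrowed from the example further down) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    is_engineer: bool  # has working Python or TypeScript
    is_user: bool      # would personally use what the team builds
    subscription: str  # frontier-model subscription; empty string if none yet


def team_problems(members: list[Member]) -> list[str]:
    """Return every rule the roster breaks; an empty list means it fits the stated rules."""
    problems = []
    if not 2 <= len(members) <= 4:
        problems.append(f"team has {len(members)} members; must be 2-4")
    if sum(m.is_engineer for m in members) < 2:
        problems.append("need at least 2 engineers")
    if not any(m.is_user for m in members):
        problems.append("need at least 1 actual user")
    for m in members:
        if not m.subscription.strip():
            problems.append(f"{m.name} has no frontier-model subscription listed")
    return problems


# Hypothetical roster: one user plus two engineers, one of whom is also a user.
roster = [
    Member("Sarah", is_engineer=False, is_user=True, subscription="Claude Pro"),
    Member("Engineer A", is_engineer=True, is_user=False, subscription="Claude Pro"),
    Member("Engineer B", is_engineer=True, is_user=True, subscription="Codex Plus"),
]
print(team_problems(roster) or "Roster fits the team rules.")
```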

Problem explainer

Problem: what’s painful today, one paragraph. Include how often it happens and roughly how much time it eats.
User: who exactly, and which of them is on the team. “Our PMs” is too vague; “Sarah, who is on this team” is right.
Inputs & systems: what data, APIs, or docs the agent needs, and confirmation you’ll have working access from a personal laptop on workshop day.
Happy-path demo: two sentences describing the exact scenario you’ll show at the end.
Success criteria: one sentence describing what counts as success after you leave the workshop. For example: “Sarah uses it at least once next week without help.”

User stories

3-5 concrete user stories in the form: As [specific person], when [trigger], I want [action] so that [outcome].
1 minimally useful user story marked clearly — this is your build goal for the day.
- If you only ship this one, the day was a win.
- Must be demoable in under 90 seconds.
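The story template and the "exactly one minimal story" rule can be sketched as a tiny data structure, which some teams may find handy for keeping their one-pager consistent. The class, the field names, and the sample stories are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    person: str
    trigger: str
    action: str
    outcome: str
    minimal: bool = False  # marks the minimally useful story (the build goal for the day)

    def render(self) -> str:
        """Format the story using the template from the application form."""
        tag = " [MINIMAL]" if self.minimal else ""
        return (
            f"As {self.person}, when {self.trigger}, "
            f"I want {self.action} so that {self.outcome}.{tag}"
        )


def story_problems(stories: list[UserStory]) -> list[str]:
    """Check the count (3-5) and that exactly one story is marked minimal."""
    problems = []
    if not 3 <= len(stories) <= 5:
        problems.append(f"have {len(stories)} stories; need 3-5")
    if sum(s.minimal for s in stories) != 1:
        problems.append("exactly 1 story must be marked minimal")
    return problems


# Invented example stories for a protocol-checking agent.
stories = [
    UserStory("Sarah", "a draft protocol arrives", "a list of missing controls",
              "I can return it the same day", minimal=True),
    UserStory("Sarah", "a step is ambiguous", "a suggested rewording",
              "the author can fix it quickly"),
    UserStory("Sarah", "a protocol deviates from the SOP", "the deviation highlighted",
              "I can decide whether it is acceptable"),
]
for story in stories:
    print(story.render())
print(story_problems(stories) or "Stories fit the format.")
```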

Data

Describe the 5 real (or representative synthetic) examples you’ll bring. What are they, where do they come from, and is there anything sensitive that needs anonymizing first?

Risks

Does your project satisfy all the required criteria and avoid all the disqualifying ones? If not, address it here.
Biggest unknown (in one line).
Anything you need from the host (sample data, an API you don’t have access to, a specific library).