
Measuring eNPS on your engineering team (and using AI to actually make sense of it)


By the time someone hands in their resignation, you already missed the moment. The frustration was there three sprints ago. Maybe six. You could feel something was off but you couldn't name it, couldn't quantify it, and without numbers you couldn't justify doing anything about it. That's the trap most engineering managers fall into: waiting for a signal clear enough to act on, and by then it's too late.

eNPS — Employee Net Promoter Score — doesn't solve this problem completely, but it does give you a repeatable, low-overhead signal that keeps you honest. And when you pair it with an LLM doing qualitative analysis on the open-ended responses, it goes from "we have a vague sense of morale" to "here's exactly what's bothering the team and here's how it compared to last quarter."

I've run this across a few team configurations now. Here's what I actually learned, not just what the framework says.

What eNPS is (and isn't)

The concept borrows from customer NPS. One core question:

"On a scale of 0–10, how likely are you to recommend this team as a great place to work?"

Responses fall into three groups: Promoters (9–10), Passives (7–8), and Detractors (0–6). The score is:

eNPS = % Promoters − % Detractors

It ranges from −100 to +100. Above 0 is acceptable. Above +20 is genuinely healthy. Above +50 is rare in engineering.
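The arithmetic is simple enough to sketch in a few lines of Python. The `enps` helper here is illustrative, not part of any library:

```python
def enps(scores):
    """Compute eNPS from a list of 0-10 survey scores."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)    # 9-10
    detractors = sum(1 for s in scores if s <= 6)   # 0-6
    # Passives (7-8) count toward n but cancel out of the score.
    return round((promoters - detractors) / n * 100)

# 5 promoters, 3 passives, 2 detractors out of 10 responses
print(enps([9, 10, 9, 10, 9, 7, 8, 7, 4, 6]))  # → 30
```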

Here's the part most guides skip: the number itself is almost meaningless in isolation. I've seen teams with a +35 that were quietly miserable in ways the score didn't capture, and teams with a +10 that were actually in a good place given the chaos they were navigating. What matters is the trajectory across quarters, and what the open-ended responses say about why.

eNPS is not a diagnostic tool. It's an early warning system. It tells you something shifted. The actual diagnosis comes from everything else.

Why it works for engineering specifically

Generic engagement surveys miss what engineers actually care about. A 60-question annual survey asking about cafeteria quality and office amenities tells you nothing about whether your team trusts the technical direction, whether they have enough uninterrupted time to think, or whether the deployment process is so painful that shipping feels like punishment.

Engineers are also unusually resistant to performative feedback tools. If they don't trust that responses are truly anonymous, they'll all score 7 and write nothing in the open-ended fields. Your data becomes noise. The survey design has to earn that trust before it can generate any useful signal.

The other thing eNPS has going for it: it takes two minutes. High response rates from engineers are not a given. Short surveys get answered; long ones get abandoned or half-heartedly completed.

How to run it

Cadence: quarterly. Monthly creates fatigue and the scores become too noisy to trend. Annually is too slow to catch anything while you can still act on it.

Questions: three, maximum.

  1. "On a scale of 0–10, how likely are you to recommend this team as a great place to work?"
  2. "What's the primary reason for your score?"
  3. "What's one thing we could change that would make the biggest difference for you?"

Anonymity: non-negotiable, and engineers will know if you're not serious about it. Use Google Forms with email collection turned off. Don't restrict responses by Google account unless you have a strong reason to. Don't use a tool that ties response timestamps to Slack activity or calendar data. Engineers will notice and they will talk.

Send the survey at the start of a sprint, keep it open for five days, send one reminder on day three with the current response count (not individual results), then close it. Calculate the score. Export the open-ended responses. That's your raw material.

Interpreting the score

A rough guide:

  Score range    What it probably means
  +50 to +100    Something is working well. Figure out what and protect it.
  +20 to +49     Healthy baseline. Keep iterating.
  0 to +19       Worth digging into. Not a crisis but not comfortable either.
  Below 0        Structural problems. Act fast.

A 10-point drop after a reorg or a brutal quarter is normal. I've lived through that. A sustained decline over two or three consecutive quarters is the signal you can't ignore.

When the score drops, the instinct is to figure out what you did wrong. That's sometimes the right question. But sometimes the team is going through something difficult and the score reflects reality accurately — a painful migration, a product direction that isn't landing, organizational uncertainty above your level. Understanding the context matters as much as the number itself.

Where AI actually earns its place here

The quantitative score is table stakes. The real value is in the open-ended responses, and that's also where most managers quietly give up. Reading 40 free-text responses and trying to synthesize them into something coherent, while keeping your own biases out of the interpretation, is harder than it sounds.

This is the problem LLMs are genuinely good at.

Theme clustering

Paste your open-ended responses into an LLM with this prompt:

"Below are anonymous survey responses from an engineering team. Group them into 3–5 themes. For each theme, provide a label, the number of responses related to it, and one representative quote that captures the theme. Do not identify or speculate about individual people."

The output is faster and more consistent than manual synthesis, and it surfaces themes you might unconsciously deprioritize because they're uncomfortable. I've had the model surface a career growth concern that I had mentally filed under "individual ambition" — and when I looked at how many responses touched on it, it was clearly a team-wide pattern, not a personal thing.

Sentiment tracking over time

Run the same prompt every quarter and log the results in a simple table:

  Quarter    Theme             Sentiment          Mentions
  Q1         Technical debt    Negative           14
  Q2         Technical debt    Mixed              9
  Q3         Technical debt    Mostly positive    4

If you committed sprint capacity to reducing technical debt in Q2, and by Q3 the mention count dropped and the sentiment shifted, that's real feedback that the change landed. Teams rarely tell you directly that something you did worked. This is one way to see it.
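If you keep that log as a plain CSV, the quarterly append takes a few lines of Python. The file name and column layout here are assumptions, not a prescribed format:

```python
import csv
from pathlib import Path

LOG = Path("enps_theme_log.csv")  # assumed log location

def log_theme(quarter, theme, sentiment, mentions):
    """Append one theme row to the quarterly log, writing a header on first use."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["quarter", "theme", "sentiment", "mentions"])
        writer.writerow([quarter, theme, sentiment, mentions])

log_theme("Q1", "Technical debt", "Negative", 14)
```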

Early warning detection

This one is the most sensitive to get right. The prompt is:

"From the following survey responses, identify any that suggest the respondent may be considering leaving or is highly disengaged. Do not identify individuals. Summarize the systemic concerns that appear across multiple responses."

The important framing there is systemic. You're not trying to identify who wrote what — you're trying to spot whether five different people independently described the same structural problem. That's a different thing from surveillance, and how you talk about this internally matters. If your team thinks you're using eNPS to identify the disgruntled people, you've already lost.

Quarter-over-quarter comparison

Once you have data from two quarters, run this:

"Compare the themes from Q1 and Q2 survey responses below. What got better? What got worse? What's new this quarter that wasn't there before?"

This gives you a narrative. "Last quarter, meeting load was the loudest concern. We cut three recurring syncs. This quarter it's not in the top themes." That's a story you can tell the team — and closing that loop is the part that actually makes people keep participating.

A workflow that doesn't take over your life

Here's the full sequence, realistically:

  1. Quarter starts. Send the survey on Monday of sprint one.
  2. Keep it open until Friday, with one reminder on Wednesday.
  3. Close it, calculate the score, export the responses as a text file.
  4. Run the AI analysis (ten minutes, maybe fifteen if you want to iterate on the prompts).
  5. Log the score, themes, sentiment, and three to five action items.
  6. Share the results with the team that week — score, what you heard, and one concrete thing you're committing to.
  7. Do the thing before the next survey goes out.

The last step is the one that determines whether this works. If the team sees feedback going into a void, participation drops. Not dramatically at first — people give it one more cycle — but it hollows out. The survey becomes a ritual without a point.

The AI agent skill — what it actually does and how to use it

I built a reusable skill file for MCP-compatible AI agents that automates this entire workflow. If you've used Claude or any agent that supports the Model Context Protocol, you can drop these files in and run the full eNPS cycle from a single prompt.

The skill is two files: SKILL.md (the main workflow) and reference/prompts.md (the AI analysis prompt templates with worked examples). Here's what's in each.

SKILL.md — the full workflow

The skill file opens with a metadata block that tells the agent when and how to activate:

---
name: team-health
description: Run quarterly eNPS surveys to measure engineering team health.
  Creates anonymous surveys via Google Forms, distributes via Slack, collects
  responses, and uses AI to analyze themes and track sentiment over time.
triggers:
  - "run eNPS survey"
  - "team health check"
  - "quarterly survey"
  - "measure team morale"
integrations:
  - slack-mcp
  - google-workspace-mcp
---

The trigger phrases matter. A well-defined skill file tells the agent not just what to do, but when to activate without you having to explain the context every time.

The workflow then breaks into eight steps, each with explicit success criteria so the agent knows when a step is genuinely complete before moving on:

Step 1: configuration check

Before creating anything, the agent asks for four variables:

  Variable                  Purpose                                              Default
  slack_channel             Where to post announcements                          #engineering
  survey_title              Name for this cycle (e.g., "Q1 2026 Team Health")    Derived from current quarter
  engineer_emails           Optional list to restrict form access                Empty (open access)
  collection_window_days    How long the survey stays open                       5

The agent won't proceed without slack_channel and survey_title. The others have sensible defaults. This matters because a poorly configured survey — wrong channel, wrong anonymity settings — produces bad data that takes a quarter to detect and correct.
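If you were modeling this configuration step in code, it might look like the following sketch. The dataclass and its field names simply mirror the variables above and are illustrative, not part of the skill file itself:

```python
from dataclasses import dataclass, field

@dataclass
class SurveyConfig:
    # Required: the agent should refuse to proceed without these two.
    slack_channel: str
    survey_title: str
    # Optional, with the defaults described above.
    engineer_emails: list = field(default_factory=list)  # empty = open access
    collection_window_days: int = 5

cfg = SurveyConfig(slack_channel="#engineering",
                   survey_title="Q1 2026 Team Health")
```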

Step 2: create the Google Form

The skill specifies exact form configuration, not just "create a form":

Form title: {survey_title}
Collect email addresses: OFF
Limit to 1 response: ON (if engineer_emails is set, else OFF)
Require sign-in: OFF

Question 1:
  Type: Linear scale
  Label: "On a scale of 0–10, how likely are you to recommend this team as a great place to work?"
  Scale: 0 to 10
  Low label: "Not at all likely"
  High label: "Extremely likely"

Question 2:
  Type: Paragraph text
  Label: "What's the main reason for your score?"
  Required: false

Question 3:
  Type: Paragraph text
  Label: "What's one thing that would make the biggest difference for you right now?"
  Required: false

The "Required: false" on the open-ended questions is intentional. Forcing text input when someone doesn't have anything to say produces garbage. Better to get a clean 0–10 score with no commentary than a score plus a placeholder response.

Step 3: launch announcement to Slack

The skill includes the exact Slack message template, not a generic "post to Slack" instruction:

:bar_chart: *Q1 2026 Team Health Check*

Quick survey — takes about 2 minutes.

This is fully anonymous. I don't see individual responses, only aggregated themes. No email collection.

Link: {form_url}

Open until {close_date}. I'll share results and what I'm doing about them the week after.

The transparency line ("I don't see individual responses") is not optional. It directly affects response quality and honesty.

Step 4: three-day reminder

The reminder includes the current response count, which creates low-key social proof without pressure:

:reminder_ribbon: Reminder: Team Health Survey is still open ({response_count} responses so far)

Closes {close_date}. Two minutes: {form_url}

Don't send a second reminder after this. Two Slack messages is the limit before it starts feeling like pressure.

Step 5: close and collect

The agent closes the form at end of business on day five and exports responses in two formats: a CSV with the raw 0–10 scores for eNPS calculation, and a plain text file with the open-ended responses formatted for LLM input.

The text file format matters for prompt reliability:

=== Q2: Primary reason for score ===
[Response 1] I feel like we ship fast but never slow down to fix things properly.
[Response 2] My manager actually listens. That's rarer than it should be.
[Response 3] No clear path to senior. I've been here 18 months and don't know what I'm being evaluated on.
...

=== Q3: One thing that would make the biggest difference ===
[Response 1] 20% time that's actually protected, not just promised.
...

The [Response N] prefix keeps responses visually separated without leaking any identifying metadata.
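A sketch of that CSV-to-text conversion in Python, assuming the Forms export has one column per question. The column names you pass in are placeholders you'd match to your actual form labels:

```python
import csv

def export_for_llm(csv_path, question_columns, out_path):
    """Convert a Google Forms CSV export into a numbered plain-text
    file, one section per open-ended question."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    sections = []
    for label, column in question_columns:
        lines = [f"=== {label} ==="]
        n = 0
        for row in rows:
            text = (row.get(column) or "").strip()
            if text:  # optional questions: skip blank answers
                n += 1
                lines.append(f"[Response {n}] {text}")
        sections.append("\n".join(lines))
    with open(out_path, "w") as f:
        f.write("\n\n".join(sections) + "\n")
```

Numbering restarts per section, so the prefixes carry no ordering information that could be cross-referenced between questions.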

Step 6: calculate eNPS

The agent runs this calculation and logs the components, not just the final score:

Total responses: N
Promoters (9–10): P  →  P/N × 100 = P%
Passives (7–8):   A  →  A/N × 100 = A%
Detractors (0–6): D  →  D/N × 100 = D%

eNPS = P% − D% = {score}

Logging the breakdown matters. A +20 from 20% Promoters / 80% Passives / 0% Detractors is a very different situation from a +20 from 40% Promoters / 40% Passives / 20% Detractors. The score is the same. The underlying health is not.
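A small sketch makes this concrete: two distributions can land on the same score while describing very different teams. The `breakdown` helper is illustrative:

```python
def breakdown(promoters, passives, detractors):
    """Return (eNPS, percentage breakdown) so the components get logged too."""
    n = promoters + passives + detractors
    p, a, d = (100 * x / n for x in (promoters, passives, detractors))
    return round(p - d), (round(p), round(a), round(d))

# Same +20 score, very different underlying health:
print(breakdown(2, 8, 0))  # → (20, (20, 80, 0))
print(breakdown(4, 4, 2))  # → (20, (40, 40, 20))
```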

Step 7: AI analysis

The agent runs all four prompt templates sequentially against the exported responses and assembles the output into a structured report (see the prompt details in the next section).

Step 8: results post to Slack

The results message template is designed to close the loop, not just report numbers:

:bar_chart: *Q1 2026 Team Health Results*

*eNPS: {score}* ({interpretation})
{total_responses} responses · {response_rate}% of the team

*What I heard most:*
• {theme_1}: {theme_1_summary}
• {theme_2}: {theme_2_summary}
• {theme_3}: {theme_3_summary}

*What I'm doing about it:*
{action_item_1}
{action_item_2}

Full analysis posted to {confluence_link or notion_link}.

Next survey: {next_quarter_date}

The "What I'm doing about it" section is the point. It's the reason people respond next time.

reference/prompts.md — the four AI templates

This file is the most portable part of the skill. Even if you never use an AI agent, these prompts work standalone in any LLM.

Prompt 1: theme clustering

You will receive anonymous survey responses from an engineering team.
Your task is to group them into 3–5 themes.

For each theme:
- Give it a short, specific label (not "Communication" — something like "Unclear priorities at the team level")
- Count how many responses relate to it
- Select one representative quote that captures the theme without identifying anyone
- Note whether the theme is about something structural (process, tooling, org design) or interpersonal (manager relationship, team dynamics)

Do not speculate about individuals. Do not infer who wrote what based on writing style or context.

Responses:
{paste responses here}

The instruction to distinguish structural from interpersonal is something I added after a few cycles. It changes how you prioritize the actions. Structural problems need system changes. Interpersonal ones often need different conversations.

Prompt 2: sentiment scoring

For each of the following themes identified in the survey analysis, rate the overall sentiment expressed across the related responses.

Use one of four ratings:
- Positive: team members feel good about this area
- Mixed: roughly split between positive and negative
- Negative: team members are frustrated or disengaged here
- Neutral: factual observations without strong emotional signal

Also provide a one-sentence characterization of the specific flavor of the sentiment (e.g., not just "Negative" but "Frustrated with slow progress rather than opposed to the goal itself").

Themes: {paste themes here}
Responses: {paste responses here}

The one-sentence flavor note is the useful part. "Negative" tells you something is wrong. "The team is frustrated that the reorg happened, not that the new structure is bad" tells you whether to address the change or the change management.

Prompt 3: early warning detection

Review the following survey responses for signals of attrition risk or significant disengagement.

Categorize signals as:
- High risk: language suggesting the person is actively considering leaving or has mentally checked out
- Medium risk: language suggesting disengagement, unfulfilled expectations, or feeling stuck
- Systemic: patterns appearing across multiple responses that suggest a team-wide issue rather than individual circumstances

For each category, summarize the concern in general terms. Do not quote directly in ways that could identify individuals. Do not attempt to guess who wrote what.

If no clear signals are present, say so plainly. Do not manufacture risk signals that aren't in the data.

Responses:
{paste responses here}

The last instruction ("do not manufacture risk signals") matters. LLMs have a tendency to find patterns when prompted to look for them, even in clean data. Explicitly telling it to say "nothing significant here" when that's true reduces false positives.

Prompt 4: quarter-over-quarter comparison

Below are theme summaries from two consecutive quarterly eNPS surveys for the same engineering team.

Compare them:
1. What improved between Q{N-1} and Q{N}? Be specific about what changed.
2. What got worse or didn't improve despite being flagged last quarter?
3. What is new this quarter that wasn't present before?
4. Is there anything in Q{N} that contradicts what Q{N-1} suggested?

Be direct. If something got worse, say it got worse. If an action the manager took seems to have had no effect, note that.

Q{N-1} themes:
{paste previous quarter themes}

Q{N} themes:
{paste current quarter themes}

The instruction to note when an action had no apparent effect is important and uncomfortable. It's also the most useful feedback a manager can get. If you dedicated sprint capacity to tech debt last quarter and this quarter's responses still mention it as the top frustration, something is off — either the effort wasn't visible, wasn't enough, or wasn't addressing the actual problem.

How to install it

For Claude or any MCP-compatible agent, place the files at:

.agents/skills/team-health/SKILL.md
.agents/skills/team-health/reference/prompts.md

The Slack MCP and Google Workspace MCP need to be connected for the automated steps to work. If they're not available, the skill degrades gracefully — the agent walks you through each step manually with the exact configuration to use.

The prompts in reference/prompts.md work standalone regardless. Copy them, paste your responses, run them in whatever LLM you use. That's the minimum viable version and it's where I'd start before connecting any integrations.

The thing that actually determines whether this works

I've seen eNPS programs that ran for a year and produced nothing. Not because the questions were wrong or the cadence was off, but because the results went into a document that nobody referenced until the next survey went out.

The feedback loop only works if people believe it works. That means two things: the survey has to be genuinely anonymous (not just said to be), and the actions have to be visible before the next survey lands. One concrete change, clearly connected to what the team said, is worth more than five action items that quietly drop.

If you've run eNPS on your team, or you've tried a different approach to measuring morale, I'd actually like to hear what you found. What signals were you watching that the survey missed? Reach out on LinkedIn.
