How to Measure AI Adoption - And Why Your Dashboard is Lying to You

rmclements10
Mar 30

Updated: Mar 30

When I was 8 I built a slide deck on why my sister and I should be allowed to get a puppy.

I was prepared for my audience - my dad was a BIG believer in data.

So this is how I positioned it:

this puppy would teach us responsibility and we would take care of it every day
we would walk the puppy every day and that is good exercise
we've already been taking care of the neighbors' cats while they are on vacation and we did a good job

I nailed the pitch.

And we ended up with the one on the far right (see below :) )

As much as I love data,

I LOVE a dashboard.

I want to see green and full circles.

I don't care if it's my checking account expenditures, my Apple Watch, or the dashboard I built in Notion to track my habits.

And in corporate America - we love data.

We love dashboards and green and high percentages.

But do you want green and high percentages or do you want data that actually answers a question?

Because you can present leadership with an employee engagement report that says 97% of employees love their job and love coming to work every day. If... you only include the people who are top performers, have been with the company 5+ years, and just got a salary bump.

But if you actually want an accurate representation of how engaged employees feel at your organization?

You might have to dig a little deeper. You might have to include contractors. You might have to include people on a PIP.

Okay this won't make sense unless you read the other blogs first. So do that and then come back here.

I built a dashboard once that told me 74% of our target population had adopted the AI tools we'd rolled out. Green across the board. Clean numbers. Exactly what I needed to take into the quarterly business review and demonstrate that the program was working.

It wasn't working.

I found out the hard way - not from the data, but because a colleague gave me access to a private Slack channel where frontline users of the tool were discussing its... challenges. The channel was made up of people who'd been on every training call, completed every module, and logged in to the platform religiously. One person told me she only used it because her manager had baked it into her KPIs.

The 74% was real. The adoption was a fiction. And I had built a measurement system that was entirely incapable of telling me the difference between the two.

That experience cost us time, credibility, and probably six months of actual behavior change we could have been building if we'd known sooner. And when I look at the measurement architectures I see in most enterprise AI programs today, I see the same problem running at scale - numbers that look like adoption, presented to leadership as adoption, believed as adoption, while the actual behavior change the entire program is supposed to produce sits untouched underneath.

This post is about how that happens, why it's so persistent, and what to measure instead.

The Measurement Problem Is Structural, Not Accidental

Here is the honest explanation for why most AI adoption dashboards lie: they're built to measure what's easy to count, not what actually matters.

Logins are easy to count. Training completions are easy to count. License utilization is easy to count. Session frequency, prompt volume, tool access rates - all of it is easy to count, and all of it is available in the vendor's admin dashboard with zero additional work from your team.

Behavior change is not easy to count. Whether someone has fundamentally altered how they approach a task, how they exercise judgment, how they think about their work: none of that shows up in a login report. And building the infrastructure to measure it requires decisions, investment, and organizational agreement on what "changed behavior" even looks like for each specific role before the program launches.

Most organizations don't make those decisions. Not because they're negligent — because they're moving fast, they have pressure to demonstrate progress, and the vendor is right there with a dashboard full of green numbers. So they measure what's available, call it adoption, and find out eighteen months later that what they actually built was very expensive compliance theater.

The most damaging mistake any organization can make in AI measurement is conflating adoption with impact. High usage doesn't mean high value. A platform can report a 92% monthly adoption rate - and that number doesn't tell you whether a single person is doing their job differently because of it.

In fact,

if you are measuring other things - like how long it takes someone to do their job, or how much they trust the work they are producing - you might find that, sure, they are using AI, but their confidence in their work has dropped, and the total time it takes to do the job, run the AI tool, and then double-check the AI's results may have tripled.

The Three Metrics That Feel Like Adoption But Aren't

Let me be specific, because the problem isn't that organizations are measuring nothing. It's that they're measuring three categories of things that create the convincing appearance of adoption while telling you almost nothing about whether behavior has actually changed.

Let me also be honest: these are the metrics I first used to prove efficacy, because I didn't understand the outcomes the tool was designed to produce, and these were the easiest numbers to share with leadership that made me look good.

Login and access rates. Someone opened the tool. That's all a login tells you. You could use a tool every day to create completely irrelevant outputs and it would technically count as active usage. Login rates are important for establishing baseline access, but they have nothing to say about whether the access is producing any organizational value.

Training completion. Someone sat through the module and clicked through to the certificate. In the research we covered in Post 3, McKinsey found that seven in ten employees skip the formal onboarding content entirely, relying instead on trial and error and peer learning. So training completion doesn't even reliably tell you whether someone got the knowledge the training was designed to deliver - let alone whether they applied it to their work.

Prompt volume and frequency. This one is the most seductive because it feels like depth. Someone sending fifty prompts a week must be using the tool more meaningfully than someone sending five, right? Not necessarily. There is a six-times productivity gap between AI power users and average employees. High adoption with low proficiency means your organization is generating activity, not results.

Prompt frequency without proficiency measurement tells you people are interacting with the tool. It doesn't tell you whether those interactions are producing better work.

The common thread across all three is that they measure inputs - access, completion, interaction - rather than outputs. What changed? What's different about the work product, the workflow, the decision-making, the time allocation?

What Behavior Change Actually Looks Like - And How You Know When It's Happening

Before you can measure behavior change, you have to define what it looks like for each specific role in your organization. This sounds obvious. Almost nobody does it.

Most AI programs define success at the tool level — "employees are using Copilot to draft communications" or "the team is using the AI assistant for research synthesis." Those are activity descriptions, not behavior change definitions.

The behavior change question is different: what is a person doing differently as a result of using this tool, and how does that change show up in the work itself?

For a financial analyst, behavior change might look like: the ratio of time spent gathering and formatting data versus analyzing and drawing conclusions has shifted.

For a customer service representative, it might look like: first-contact resolution rates have improved because they're accessing relevant information faster.

For a manager, it might look like: meeting prep time has decreased and the quality of pre-read documents has measurably improved.

McKinsey's research identified AI high performers as 2.8 times more likely to report fundamental workflow redesign compared to organizations stuck in pilot mode. That is the single cleanest signal in the dataset separating genuine transformation from expensive experimentation. Workflow redesign is the behavior change. Everything else is a precondition for it.

Here's the test I now apply to any AI adoption metric before it goes into a report to leadership: does this number tell me whether the way people work has changed, or does it tell me whether people are using the tool? If it only answers the second question, it's not an adoption metric. It's an activity metric.
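If it helps to see that test written down, here is a minimal sketch of it in Python. Everything in it is an assumption for illustration: the metric names are hypothetical, and the true/false flag is a judgment call someone on your team has to make for each metric, not something a tool can decide for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    # True if the number says something about how the work itself changed
    # (time allocation, output quality, resolution rates). False if it only
    # says the tool was opened, a module was completed, or a prompt was sent.
    measures_work_change: bool


def classify(metric: Metric) -> str:
    """Apply the test: a metric counts as 'adoption' only if it reflects changed work."""
    return "adoption" if metric.measures_work_change else "activity"


# Hypothetical dashboard contents, for illustration only.
dashboard = [
    Metric("monthly_logins", False),
    Metric("training_completion_rate", False),
    Metric("prompts_per_week", False),
    Metric("analysis_vs_formatting_time_ratio", True),
    Metric("first_contact_resolution_rate", True),
]

report = {m.name: classify(m) for m in dashboard}
activity_share = sum(label == "activity" for label in report.values()) / len(report)

for name, label in report.items():
    print(f"{name}: {label}")
print(f"Share of dashboard that is activity-only: {activity_share:.0%}")
```

The point of the exercise is the last line: if most of what you report to leadership lands in the "activity" bucket, you know which part of the dashboard needs work.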

You are going to have to get creative about how you measure it, how you show progress, and how you pitch it to executives in a way that lets them prove a return on investment.

The Baseline Problem That Most Programs Never Solve

Here is the measurement failure that I see most consistently, and that is almost impossible to fix retroactively: organizations don't establish a baseline before they launch.

You cannot show progress without a before state.

You cannot demonstrate that AI has changed how long a task takes if you don't know how long it took before AI was introduced.

You cannot demonstrate that output quality has improved if you didn't measure output quality at the start. You cannot show that employee confidence with a specific capability has increased if you didn't measure confidence at baseline.

This isn't complicated measurement design. It's basic research methodology. But in the pressure to launch, deploy, and demonstrate early wins, the baseline assessment almost always gets deprioritized until it's too late. And then organizations find themselves twelve months in, with lots of data about tool usage and no data about whether any of it mattered - because they have nothing to compare it to.
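To show how little math is involved once the baseline exists, here is a small Python sketch of a before/after comparison. The numbers are made up, and the names (`baseline_minutes`, `current_minutes`, `percent_change`) are hypothetical; the only real requirement is that the first list was captured before launch.

```python
from statistics import mean


def percent_change(baseline: float, current: float) -> float:
    """Relative change from baseline, as a percentage (negative means a decrease)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")
    return (current - baseline) / baseline * 100


# Hypothetical minutes-per-task samples for one role. Not real data.
baseline_minutes = [60.0, 55.0, 65.0, 60.0]  # captured BEFORE the tool launched
current_minutes = [45.0, 50.0, 40.0, 45.0]   # captured after rollout

baseline_avg = mean(baseline_minutes)
current_avg = mean(current_minutes)
change = percent_change(baseline_avg, current_avg)

print(f"Average time on task: {baseline_avg:.1f} -> {current_avg:.1f} min ({change:+.1f}%)")
```

Without the first list, the second one is just a number. The same shape works for quality scores or confidence ratings, as long as both sides are measured the same way.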

McKinsey's research consistently finds that measurement is immature across the enterprise AI space - many organizations still lack robust leading indicators for AI initiatives, and where tracking does exist, value realization rises and risk incidents fall.

The causation runs in both directions: measurement creates accountability, accountability creates better outcomes, and better outcomes generate the evidence that sustains program investment.

The baseline isn't optional. It's the foundation that makes every subsequent measurement meaningful. And it has to be built before anything launches.

The Sentiment Gap That No Dashboard Will Show You

In marketing we have this thing called sentiment.

The goal is to understand how people feel about a company or product.

When Trimble (Fortune 500 SaaS) acquired several new global companies, we underwent a brand revamp. We wanted to be known as one brand, and we wanted to be known for our impact in sustainability.

This is the data we got a baseline of and then tracked for 12 months:

Social Listening Score - After you launch a sustainability campaign (say, "our work helps your company save money to build and operate"), you track every mention of your brand + sustainability keywords for 30 days across social media platforms and the internet. Result: 35% positive sentiment - up from 20% before the campaign. That data tells you the message landed.
NPS Survey with a Sustainability Question - Add one question to your post-purchase survey: "How important is our commitment to sustainability in your decision to choose us?" If 60% say "very important," you know it's a real purchase driver - not just nice PR.

How can we use this to measure behavior change and the sentiment towards AI?

How do employees actually feel about AI - not what they say in an official survey that gets seen by their manager, but what they believe about whether AI is helping them, whether using it is career-enhancing or career-risky, whether they feel equipped to use it well, and whether they trust that leadership's stated intentions about AI and workforce impact are genuine?

Columbia Business School research found that 76% of executives believe their employees feel enthusiastic about AI adoption. Only 31% of employees actually express that enthusiasm.

That perception gap is a direct measurement of how well your program is actually building the conditions for sustained behavior change. When the gap is that large, it means one of two things: leadership doesn't know what employees actually think, or leadership knows and isn't acting on it. Either way, the behavior change the program depends on is happening in a hostile environment that no amount of training or communications will fully overcome.

The organizations building real measurement infrastructure are adding a continuous sentiment dimension - not a quarterly engagement survey, but a rolling pulse on specific AI-related beliefs that are diagnostic for adoption.

Things like:

Do I feel equipped to use AI tools effectively in my actual work?
Do I believe using AI will help my career here?
Do I trust that my feedback about what's working and what isn't is actually reaching decision-makers?

Those questions don't live in a vendor dashboard. Building them requires intentional design and organizational commitment to act on what they surface. But they are among the most predictive leading indicators of whether behavior change is actually happening at scale - or whether what looks like adoption in the activity data is actually the compliance theater I described at the beginning.
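As a rough sketch of what a pulse like that could look like once the answers come in, here is one way to tally agreement by role and question in Python, and to express the executive-versus-employee gap in percentage points. The response rows, role names, and question IDs are all hypothetical; the 76 and 31 are the figures cited above.

```python
from collections import defaultdict

# Hypothetical pulse responses: (role, question_id, agrees). Not real data.
responses = [
    ("analyst", "equipped", True),
    ("analyst", "equipped", False),
    ("analyst", "career", True),
    ("support", "equipped", False),
    ("support", "equipped", False),
    ("support", "career", True),
]


def agreement_by_segment(rows):
    """Return {(role, question_id): share of respondents who agree}."""
    counts = defaultdict(lambda: [0, 0])  # [agree_count, total_count]
    for role, question, agrees in rows:
        counts[(role, question)][0] += int(agrees)
        counts[(role, question)][1] += 1
    return {key: agree / total for key, (agree, total) in counts.items()}


def perception_gap(leadership_estimate_pct, employee_actual_pct):
    """Gap in percentage points between what leaders assume and what employees report."""
    return leadership_estimate_pct - employee_actual_pct


shares = agreement_by_segment(responses)
for (role, question), share in sorted(shares.items()):
    print(f"{role} / {question}: {share:.0%} agree")

# The 76% vs 31% figures cited above work out to a 45-point gap.
print(f"Perception gap: {perception_gap(76, 31)} points")
```

Segmenting is the part that matters. An overall average can look fine while one role, like the hypothetical "support" group here, is telling you it does not feel equipped at all.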

What a Real Measurement Framework Looks Like

Based on what I've built, what I've seen work, and what the research consistently points to, here is the architecture that actually tells you whether your program is producing the behavioral change it's designed to produce.

Layer one:
Baseline first, always. Before any tools deploy at scale, establish role-specific benchmarks for the behaviors the program is designed to change. How long does this task currently take? What does the quality of this output currently look like? How confident does this population feel with this capability?
Layer two:
Behavioral indicators, not activity indicators. Define, for each major role, what changed behavior looks like - specifically, observably, in the work itself. Then build or designate measurement mechanisms for those specific behaviors. Some of this will be quantitative (time on task, output quality scoring, first-contact resolution rates). Some will require manager observation. Some will require structured periodic assessment.
Layer three:
Proficiency, not just usage. There is a six-times productivity gap between AI power users and average employees - high adoption with low proficiency generates activity, not results. Measuring proficiency requires going beyond whether someone uses the tool to whether they're using it in ways that produce meaningfully better outcomes. This is harder to measure. It requires role-specific assessment, periodic skill evaluation, and probably a different conversation with managers than most programs have equipped them for. But without it, your adoption data is telling you a story about inputs, not outputs.
Layer four:
Continuous sentiment. A rolling pulse on the specific beliefs that predict sustained behavior change. Not general engagement - specific AI adoption sentiment, segmented by role, function, and tenure so you can see where the gaps are widest and deploy targeted interventions before they become adoption ceilings.
Layer five:
Business outcome correlation. The hardest and most important layer. At some point, leadership will ask whether the AI program is producing the organizational outcomes it was designed to produce. The organizations that can answer that question credibly are the ones that built outcome metrics into the program design from the start - and that connected specific AI behaviors to specific business indicators before they launched, not after they'd been running for a year and needed to justify continued investment.
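For layer five, one simple starting point (an assumption on my part, not the only way to do it) is a plain Pearson correlation between a behavioral indicator and a business indicator, team by team. The sketch below uses hypothetical per-team numbers and hypothetical names like `behavior_score`; swap in whatever behavior and outcome you agreed on before launch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-team figures. Not real data.
# behavior_score: share of the team showing the defined behavior change.
# first_contact_resolution: the business indicator chosen before launch.
behavior_score = [0.2, 0.4, 0.5, 0.7, 0.9]
first_contact_resolution = [0.61, 0.66, 0.68, 0.74, 0.80]

r = pearson(behavior_score, first_contact_resolution)
print(f"Correlation between behavior change and outcome: r = {r:.2f}")
```

Two cautions. A correlation on five teams is a conversation starter, not proof, and correlation does not show that the behavior caused the outcome. It does give leadership a far more credible line than login counts.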

The Leadership Conversation This Requires

Here is the part that most measurement conversations skip, because it's uncomfortable.

Building an honest measurement architecture for an AI adoption program requires leadership to agree, before the program launches, on what success actually looks like - specifically enough that it can be measured, and honestly enough that the metrics can't be gamed by counting the wrong things.

That conversation is harder than it sounds. Because what leadership usually wants is something that shows the program is working. What a real measurement architecture provides is something that shows whether the program is working - and those two things are not always the same answer.

Deloitte research found that AI champions raise tool usage by 65% and strategic communications improve trust metrics by 16% (SHRM). Those numbers matter. But they only matter if you know what usage looked like before the champions engaged, and if you have a way to connect trust metrics to the behavioral outcomes you're ultimately trying to produce.

The organizations that are succeeding at this aren't the ones with the most sophisticated dashboards. They're the ones where leadership agreed - early, specifically, and with enough honesty to hold the measurement accountable - on what the program is actually supposed to change. And then built the infrastructure to find out whether it did.

That agreement is a leadership decision. The measurement architecture that follows it is what your internal communications and change function should own.

What You Should Do Before Your Next Leadership Update

If you're responsible for AI adoption measurement in your organization right now, here are the four questions worth asking before the next executive presentation:

Do we have baseline data for the behaviors we're trying to change — or are we measuring change without knowing what the starting point was?
Are our primary metrics measuring whether people are using the tool, or whether the way people work has changed?
Do we have a continuous read on employee sentiment specifically about AI — not general engagement, but the beliefs that predict whether behavior change will stick?
Can we draw a credible line between our adoption metrics and a specific business outcome — or are we presenting activity data as evidence of transformation?

If the answer to any of those is no, the dashboard is telling you a story. And at some point - usually around the time the board asks for ROI evidence - that story is going to become very expensive to explain.

Post 5 of 6 → The Real Role of an Internal Communications Leader in an AI Transformation — And Why the Version Most Companies Are Using Is About Ten Years Out of Date