DORA 2025: Development Metrics and AI

About this publication

19.12.2025. Reading time: 10 min.

This publication is an adapted translation of Google Cloud and DORA's report "2025 State of AI-Assisted Software Development." The original is available at the link.

This chapter was prepared with contributions from:
Sarah D’Angelo, Ph.D., user experience researcher, Google
Ambar Murillo, user experience researcher, Google AI & Infrastructure
Sarah Inman, Ph.D., user experience researcher, Google
Kevin M. Storer, Ph.D., user experience researcher, Google Cloud

Why development measurement frameworks matter

Measuring the development process helps drive real change.

But it is a difficult task: first you need to understand what is worth measuring and what can actually be measured.

The main thing is to use metrics not for the numbers themselves, but to change how the team and the whole organization work.

For this, it is better to rely on established frameworks.

A framework breaks a broad concept, such as developer experience, into specific measurable elements such as speed, quality, satisfaction, and more.

When industry or academia talks about development metrics, the frameworks most often mentioned are:
SPACE
DevEx
HEART
DORA software delivery metrics

Choosing the right framework can be difficult, but it is a key step

Where to start? Define which decisions you want to make based on data.

Three directions of measurement frameworks

Different frameworks serve different purposes.
They usually focus on three areas:
Developer experience
Product quality
Organizational effectiveness
Each of these groups offers its own perspective on the evolution of the engineering system and how it should be measured.
To determine which framework best fits your organization's goals, it helps to think of frameworks as the "why" behind measurement.
They help explain why you are measuring and what actions should follow from the data you collect.
Frameworks set the lens through which you view the data and help ensure your efforts align with organizational goals.
To choose a framework, ask yourself:
Why now? What has changed that makes measurement necessary?
How will you use the insights? What decisions or improvements will measurement enable?
Next comes the "what" - the metrics themselves.
These are the core concepts that feed the framework, such as velocity metrics or usage metrics.

Self-reported data - data reported by developers

There are usually two main approaches to data collection.
The first is self-reported data: information collected directly from developers about their experience. It can be gathered through:
Surveys: answers to questions about opinions, satisfaction, and perceptions of different aspects of work.
Interviews and focus groups: one-on-one and group discussions for deeper exploration of topics.
Diary studies: records of actions, thoughts, and experiences kept during work.
The main advantage of self-reported data is its ability to capture subjective experiences and things that cannot be measured automatically: satisfaction, well-being, and perceived effectiveness.
It also does not require complex instrumentation or deep observability into toolchains.
But there are limitations: self-reported data is hard to standardize, to compare across teams, and to scale.
Because it is subjective, it is open to differing interpretations and vulnerable to bias (for example, recall bias and social desirability bias).
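
As an illustration of how survey answers can be turned into comparable numbers, here is a minimal sketch. It assumes responses on a 1-5 Likert scale; the team names, question keys, and scores are hypothetical and are not taken from the report. Reporting the number of responses next to each mean is one simple way to keep small samples visible.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey responses: (team, question, score on a 1-5 Likert scale).
responses = [
    ("payments", "satisfaction", 4),
    ("payments", "satisfaction", 2),
    ("payments", "perceived_effectiveness", 3),
    ("search", "satisfaction", 5),
    ("search", "satisfaction", 4),
    ("search", "perceived_effectiveness", 4),
]


def summarize(responses):
    """Return {(team, question): (mean score, number of responses)}."""
    grouped = defaultdict(list)
    for team, question, score in responses:
        grouped[(team, question)].append(score)
    return {key: (mean(scores), len(scores)) for key, scores in grouped.items()}


summary = summarize(responses)
for (team, question), (avg, n) in sorted(summary.items()):
    print(f"{team:10s} {question:25s} mean={avg:.2f} n={n}")
```

A mean over two or three answers says very little, which is one reason the scaling and comparison limits mentioned above matter in practice.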

Logs-based measures - data collected automatically

The second approach is logs-based measures: metrics extracted from the systems and tools developers use. Examples:
Quantity: counts of artifacts, such as the number of commits or the number of users.
Time: how long an activity takes, such as coding or review time.
Frequency: the rate of events, such as the number of deploys per month or PRs per week.
The main advantage is continuous, standardized data at scale that provides a detailed picture of activity.
The limitation is that this approach requires sufficient observability across the toolchain: integrations, data collection, and infrastructure. This raises the barrier to entry.
It is also important to understand that logs are not fully objective.
Instrumentation approaches vary, errors are possible, and interpretation always depends on context and can be biased.
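
The three kinds of logs-based measures (quantity, time, frequency) can be sketched as follows. This is a minimal example under stated assumptions: the record layout, field names (`opened`, `merged`), and dates are hypothetical, and a real export from a review tool or deploy pipeline will look different.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records exported from a code-review tool and a deploy pipeline.
pull_requests = [
    {"opened": datetime(2025, 3, 3, 9, 0), "merged": datetime(2025, 3, 3, 15, 0)},
    {"opened": datetime(2025, 3, 4, 10, 0), "merged": datetime(2025, 3, 5, 10, 0)},
    {"opened": datetime(2025, 3, 10, 8, 0), "merged": datetime(2025, 3, 10, 10, 0)},
]
deploys = [
    datetime(2025, 3, 3),
    datetime(2025, 3, 5),
    datetime(2025, 3, 12),
    datetime(2025, 3, 26),
]


def pr_count(prs):
    """Quantity: how many pull requests were merged."""
    return len(prs)


def median_review_hours(prs):
    """Time: median hours from opening a pull request to merging it."""
    durations = [(pr["merged"] - pr["opened"]) / timedelta(hours=1) for pr in prs]
    return median(durations)


def deploys_per_week(deploys, period_start, period_end):
    """Frequency: deploys per week within [period_start, period_end)."""
    weeks = (period_end - period_start) / timedelta(weeks=1)
    in_period = [d for d in deploys if period_start <= d < period_end]
    return len(in_period) / weeks


print(pr_count(pull_requests))
print(median_review_hours(pull_requests))
print(deploys_per_week(deploys, datetime(2025, 3, 3), datetime(2025, 3, 31)))
```

The median is used for review time because a few very long reviews would distort a mean; that choice is itself an example of how instrumentation decisions shape what the logs appear to say.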

How frameworks and metrics are connected

The framework defines which concepts you want to measure, but the choice of specific metrics depends on your resources and data availability. You need to understand:

Do you have observability for logs-based metrics?

Do you have a research team for self-reported data?

Different organizations have different capabilities.

Frameworks are a guide, but they cannot fully describe complex behavior.

They give an approximation of the truth, because it is impossible to measure everything.

A useful analogy: metrics are ingredients.

Frameworks are recipes that use these ingredients.

The same ingredients can be combined in different ways to produce different "dishes" - that is, frameworks.

Some ingredients are unique, but in many cases you can still get a good result even if some ingredients are missing.

Frameworks differ because they are designed for different goals.

But their metrics often overlap

For example, productivity metrics (number of commits, PRs) appear in several frameworks.

An organization can use these metrics to assess the impact of a new team structure (organizational effectiveness), understand how well a developer tool works (product quality), and evaluate developer workload (developer experience).

At the same time, some metrics are more specialized. For example, developer well-being, an important element of frameworks focused on developer experience, is usually not a key metric in models of organizational effectiveness or product quality.
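
The overlap described above can be made concrete with a small sketch. The mapping below is illustrative only: it encodes just the examples from this section (commit and PR counts used in all three directions, well-being used only for developer experience) and does not describe the actual contents of any named framework.

```python
# Illustrative mapping: which measurement direction uses which metric.
# The metric names are hypothetical labels, not an official taxonomy.
framework_metrics = {
    "developer_experience": {"commit_count", "pr_count", "developer_wellbeing"},
    "product_quality": {"commit_count", "pr_count"},
    "organizational_effectiveness": {"commit_count", "pr_count"},
}


def shared_metrics(mapping):
    """Metrics that appear in every direction."""
    return set.intersection(*mapping.values())


def specialized_metrics(mapping):
    """Metrics that appear in exactly one direction, keyed by that direction."""
    result = {}
    for name, metrics in mapping.items():
        others = set().union(*(m for n, m in mapping.items() if n != name))
        unique = metrics - others
        if unique:
            result[name] = unique
    return result


print(sorted(shared_metrics(framework_metrics)))
print(specialized_metrics(framework_metrics))
```

A table like this, kept for your own metrics, shows which measurements can be reused when goals change and which ones only serve a single purpose.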

Using a single framework helps focus action and can be a good starting point.

But you do not need to limit yourself to that.

As goals and measurement capabilities change, using multiple frameworks makes it possible to get complementary analytical results; together they provide a more complete picture than each one alone.

The main thing is to measure what helps you and your organization stay focused on goals and be ready to act on the data you collect.

Using measurement frameworks in the AI era

You may ask: does the arrival of AI in development change everything?

Are the existing frameworks still enough, or do we need new ones?

Any technology transformation creates the impression that metrics must be rebuilt from scratch.

In practice, the changes are usually much more specific.

If your goal is to understand how AI affects developer experience, it is enough to update a small subset of metrics while keeping the overall measurement structure.

There is no need to abandon the entire framework

On the contrary, existing metrics provide a baseline against which shifts become visible. For example, you can add indicators for AI suggestion adoption, model quality, or trust in the model while keeping the existing developer experience metrics, such as perceived productivity and review time.
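
A minimal sketch of extending an existing metric set rather than replacing it is shown below. It assumes that AI suggestion adoption is measured as an acceptance rate (accepted suggestions divided by suggestions shown); that definition, the metric names, and the values are assumptions made for illustration and do not come from the report.

```python
# Existing developer experience metrics (hypothetical values).
existing_metrics = {"perceived_productivity": 3.8, "median_review_hours": 6.0}


def acceptance_rate(shown, accepted):
    """Share of AI suggestions that developers accepted; None if none were shown."""
    if shown == 0:
        return None
    return accepted / shown


def extend_metrics(existing, new_metrics):
    """Return a combined metric set without overwriting existing baselines."""
    overlap = existing.keys() & new_metrics.keys()
    if overlap:
        raise ValueError(f"metrics already defined: {sorted(overlap)}")
    return {**existing, **new_metrics}


ai_metrics = {"ai_suggestion_acceptance_rate": acceptance_rate(shown=200, accepted=50)}
all_metrics = extend_metrics(existing_metrics, ai_metrics)
print(all_metrics)
```

Refusing to overwrite an existing key keeps the baseline intact, so earlier measurements stay comparable with later ones.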

As more advanced AI tools emerge, the roles themselves and the set of tasks in development will change.

Metrics will need to adapt to new user profiles and changed processes, but the goals of measuring developer experience will likely remain the same.

If the goal does not change, there is no need to change the framework either; it is enough to expand the set of metrics.

Even if goals do change, that does not mean measurement has to start over.

Many metrics fit multiple frameworks, so they can be quickly reassigned to new tasks.

Example: measuring code quality during AI adoption

Studying the impact of AI tools on the code they produce has become a new goal for companies and a difficult task, since the technology is evolving very quickly.

A common question is how AI affects code quality.

Faster development looks positive in the short term, but if quality drops, long-term speed will also start to decline.

So a company might set a goal to maintain high code quality while actively adopting AI tools.

This goal draws on elements from all three types of frameworks and includes metrics that are often already collected: code quality, engagement with the tools, and perceived speed. In this case, you can keep using your current metrics and add new ones. For example, combining DORA software delivery metrics with a product quality framework such as HEART can show both how teams perceive AI tools and how those tools affect delivery.

The PDCA cycle as the foundation of a sustainable measurement system

Measuring software development processes is a complex and ongoing task.
There are many frameworks and approaches, but any of them is only useful when the organization can act on the data it gets.
A key requirement is alignment with company goals and leadership support; without that, measurement turns into reporting with no outcome.
A deliberate choice of framework and metrics helps build a sustainable measurement system.
If you follow the PDCA cycle, the work may look like this:
Plan: define goals, choose the right framework, and secure leadership support.
Do: establish baseline metrics and try changing part of the process.
Check: measure again and assess progress toward the goals.
Adjust: adapt the approach based on the new data.
We do not propose a single best framework.
The approach should fit your goals: the right one is the one that helps you make decisions and motivates the organization to act.
Although frameworks differ in structure and emphasis, many of their metrics overlap.
This means that metrics introduced today can be reused and adapted as goals or context change.
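
The Do and Check steps of the PDCA cycle described above can be sketched as a comparison between a baseline and a follow-up measurement. This is a minimal sketch under stated assumptions: the metric names, the values, and the "up"/"down" goal convention are hypothetical.

```python
def relative_change(baseline, current):
    """Relative change from baseline to current (0.25 means +25%)."""
    return (current - baseline) / baseline


def check(baseline, follow_up, goals):
    """Compare two measurements.

    goals maps each metric to the desired direction, "up" or "down".
    Returns {metric: (relative change, moved in the desired direction)}.
    """
    report = {}
    for metric, direction in goals.items():
        change = relative_change(baseline[metric], follow_up[metric])
        improved = change > 0 if direction == "up" else change < 0
        report[metric] = (round(change, 3), improved)
    return report


# Do: baseline taken before the process change; Check: measured again afterwards.
baseline = {"deploys_per_week": 2.0, "median_review_hours": 10.0}
follow_up = {"deploys_per_week": 2.5, "median_review_hours": 12.0}
goals = {"deploys_per_week": "up", "median_review_hours": "down"}

report = check(baseline, follow_up, goals)
for metric, (change, improved) in report.items():
    print(f"{metric}: {change:+.1%} {'improved' if improved else 'needs adjustment'}")
```

In this made-up example deploy frequency improved while review time got worse, which is the kind of mixed result the Adjust step is meant to respond to.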

DORA AI Capabilities Model

The conclusion was written with contributions from Nathen Harvey, DORA lead at Google Cloud.
For more than a decade, DORA has remained a reliable guide for the engineering community, helping teams become stronger and more effective.
The industry is changing quickly: AI, platform engineering, and new ways of organizing development are entering the workflow.
But our mission remains the same: to study and show the practices that help create truly high-performing teams.
This year we introduced the first version of the DORA AI Capabilities Model, an important step in the evolution of our research.
As companies navigate the complex process of adopting AI, this model provides a grounded, data-driven structure.
It shows seven key capabilities that amplify the positive impact of AI on important organizational outcomes:
A clear and open company stance on AI use
A healthy data ecosystem
Internal data available for AI work
Mature version control practices
Small iterative work
User focus
High-quality internal platforms
This is the first iteration of the model.
We see it as a starting point for further dialogue with the DORA community and with companies that are introducing AI into development processes.
We would be glad to learn how you apply these findings in practice, and we will continue refining and developing the model in future research.

A user focus as a critical factor in AI success

This year's study highlights an important conclusion: we are still in the early stage of AI-assisted development, a period when technologies are changing quickly and practices are only now taking shape.
Attempts to standardize at this stage are premature.
Simply introducing AI tools does not lead to transformation.
Moreover, AI's impact on team performance depends directly on one critical factor: user focus. We found with high confidence that when a team is user-centered, AI's positive impact on its work increases.
When that focus is missing, introducing AI can worsen team performance.
This conclusion highlights a key condition for success: a deep understanding of end users is not a bonus, but a required foundation for effective use of AI.
If the user is not at the center of your product strategy, AI will not help and in some cases may even hurt team performance.

Practical use of the research: how to experiment

The findings of this year's report are complex and may seem contradictory in places.
That is a normal reflection of a rapidly changing environment.
We suggest treating the findings not as strict instructions, but as hypotheses for experiments within your team.
Here is how you can apply these ideas in practice:
Run experiments in your organization: use DORA's findings to form hypotheses and test them in your teams. This will help you better understand your operational context and identify the areas where improvements will have the greatest impact.
Run internal surveys: use the questions from this year's study as a starting point and adapt them to your needs. Add more nuanced questions that are relevant to your teams and current projects.
Look at the platform as a whole: improving one feature does not fix a weak platform. Treat internal platforms as products and improve the full developer experience chain, from feedback to automation.
Share what you learn: when you run experiments and gather insights, spread that knowledge across the organization through reports, internal communities of practice, or informal conversations. The goal is to build a culture of continuous learning.
The main risk right now is not falling behind, but widespread chaotic action with no meaningful result.
Choose approaches and frameworks that fit your organization and drive meaningful change.

Join the DORA community

Thank you for engaging with our research.
We invite you to continue the journey with us.
Share your experience, learn from peers, and find inspiration by joining the DORA community: https://dora.community.
The DORA study is an example of the strength of a global community united by a common goal.
We are grateful to the thousands of researchers, experts, practitioners, leaders, and change agents who generously share their knowledge and experience.
This report is a direct result of that collective work. Thank you to our readers.
We hope the research results will help you and your teams create real change and build a culture of continuous improvement within your organizations.
