DevOps Maturity
To assess DevOps maturity we follow the adidas DevOps Maturity Framework. Why reinvent the wheel?
Info: The adidas DevOps Maturity Framework is © adidas AG and licensed under MIT.
For some of the capabilities we consider "crawling" unacceptable, for others it might be OK. Cells are marked 🙅, 🆗, or 🏎 accordingly.
Development
CAPABILITY | CRAWL | WALK | RUN |
---|---|---|---|
Use version control for all production artifacts | 🙅 No version control | 🆗 Source code or other assets under version control | 🏎 Source code or other assets under version control and all production artifacts versioned and stored in the corresponding artifact repository |
Automate deployment processes | 🆗 Manual deployment process | 🏎 Partially automated deployment process | 🏎 Fully automated deployment process |
Implement test automation | 🙅 Manual test script execution | 🆗 Partially automated testing (unit or regression or performance tests) | 🏎 Fully automated testing (unit and reliability (regression and performance) tests) |
Implement infrastructure automation | 🙅 Manual deployment process | 🆗 Partially automated deployment process. Provisioning is done by the teams | 🏎 Fully automated deployment (infrastructure-as-code). Platform Engineering provides base images |
Support test data management | 🆗 No test data management | 🏎 Partially automated test data management (e.g. manually triggered import and export of test data) | 🏎 Fully automated test data management incl. strategy (e.g. consumer data only in PROD) |
Implement continuous delivery | 🙅 No continuous delivery | 🆗 Partially automated delivery pipeline (e.g. automated build, test process with the manual deployment) | 🏎 Fully automated pipeline (automated build, test, deployment across environments) |
Include NFRs (non-functional requirements) in Definition of Done | No NFRs used | Ad-hoc NFR checks | Standardised NFR checklist as acceptance criteria for successful releases |
Shift left on security | 🙅 No security aspects considered during development cycle | 🙅 Security aspects considered during development cycle but shifted towards release (not a priority) | 🆗 Security aspects included during development cycle from the very start |
Build for resilience | 🆗 No resilience built into the system | 🏎 Design infrastructure and code for failure | 🏎 Design infrastructure and code for failure with fully automated error recovery (self-healing) |
Enable team for troubleshooting | 🙅 No control over development lifecycle (e.g. access to PROD) | 🙅 Team has full control over development lifecycle (e.g. access to PROD), but no access to logs and tools relevant for troubleshooting | Team has full control over development lifecycle (e.g. access to PROD) and full access to logs and tools for troubleshooting |
Feature handling | No feature branches for controlled releases | Feature branches are implemented for controlled releases of distinct features | Feature branching and toggles are implemented to facilitate development, roll-out and roll-back (if needed) of usable features to production |
Releases | 🆗 Releases to all users and all sites / geographies in one go | 🏎 Releases to a subset of users or sites or geographies | 🏎 Gradual releases to a subset of users in specific sites / geographies, thereby limiting the blast radius of potential issues (see the sketch below this table) |
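
The "Feature handling" and "Releases" rows above usually meet in practice as feature toggles combined with percentage-based rollouts. Below is a minimal Python sketch assuming a simple in-memory toggle configuration; the feature names and percentages are illustrative, and a real setup would typically pull flags from a feature-flag service or a versioned config file.

```python
import hashlib

# Hypothetical in-memory toggle configuration; a real setup would read this
# from a feature-flag service or a versioned config file.
TOGGLES = {
    "new-checkout": {"enabled": True, "rollout_percent": 10},   # gradual release
    "legacy-search": {"enabled": False, "rollout_percent": 0},  # toggled off (rolled back)
}

def is_enabled(feature: str, user_id: str) -> bool:
    """Return True if the feature is on for this user.

    Users are bucketed deterministically (0-99) by hashing feature + user id,
    so a given user keeps the same decision while rollout_percent is raised
    step by step, limiting the blast radius of a bad release.
    """
    toggle = TOGGLES.get(feature)
    if not toggle or not toggle["enabled"]:
        return False
    bucket = int(hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < toggle["rollout_percent"]

if __name__ == "__main__":
    # Roughly 10% of users see the new checkout; raising rollout_percent widens
    # the audience, setting enabled=False rolls the feature back without a deploy.
    hits = sum(is_enabled("new-checkout", f"user-{i}") for i in range(1000))
    print(f"new-checkout enabled for {hits}/1000 users")
```
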
Product & Processes
CAPABILITY | CRAWL | WALK | RUN |
---|---|---|---|
Gather and implement customer feedback | 🙅 No customer (internal or external) feedback gathered in development cycles | 🆗 Customer feedback (internal or external) gathered on an ad-hoc basis | 🏎 Customer feedback (internal or external) gathered after all releases |
Work in small batches and deploy more frequently | 🆗 Big work batch size and releases on a monthly basis or longer | 🏎 Work batch size optimized for weekly releases, but deployment frequency not in sync with business requirements (e.g. lead time) | 🏎 Work batch size optimized for frequent releases and deployment frequency in sync with business requirements (e.g. lead time) |
Have a lightweight change approval process | 🆗 Change approval needed from multiple parties outside the team | 🏎 Change approval needed within the team | 🏎 No change approval needed or change approval process totally automated |
Integrate application data into Big Data Platform | 🆗 No application data transferred at all | 🏎 Partial business-relevant application data transferred to Big Data Platform or provided via API | 🏎 All business-relevant application data transferred to Big Data Platform |
SRE role and activities | 🆗 No clear SRE role and responsibility from Product team perspective | 🏎 SRE tasks are defined and agreed from Execution (Operations, Automation, Hotfix) perspective | 🏎 SRE tasks are defined for Execution and Governance areas and agreed with all stakeholders (Business, Development) |
Postmortems | 🙅 No root cause analysis done for outages | 🆗 RCA conducted for all outages and tied to a change / release | 🏎 Blameless postmortems are conducted for all outages |
Resiliency / Chaos Engineering | 🆗 No resiliency tests are conducted | 🏎 Define environment dependencies (failure points) and execute resiliency tests to ensure no customer impact | 🏎 Regular chaos (resiliency) exercises scheduled based on steady-state / functionality changes (see the sketch below this table) |
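
For the "Resiliency / Chaos Engineering" row, an exercise boils down to stating a steady-state hypothesis, injecting a failure, and checking whether the hypothesis still holds. The skeleton below is a hedged Python sketch with simulated probes and fault injection; every function here is an illustrative stand-in, not a call into any real chaos tooling.

```python
import random

random.seed(42)  # deterministic simulation for this sketch

def steady_state_ok() -> bool:
    """Hypothetical probe: in a real exercise this would check an SLI,
    e.g. that the service's error rate stays below its threshold."""
    return random.random() > 0.05  # simulated probe, ~95% healthy

def inject_failure() -> None:
    """Hypothetical fault injection: in a real exercise this might kill a pod,
    add network latency, or block a downstream dependency."""
    print("injecting failure into a (simulated) dependency")

def run_experiment(probes: int = 20) -> None:
    # 1. Confirm the steady-state hypothesis before touching anything.
    assert steady_state_ok(), "system not healthy, abort the experiment"
    # 2. Inject the failure and keep probing.
    inject_failure()
    failed = sum(not steady_state_ok() for _ in range(probes))
    # 3. Report; a broken hypothesis feeds a blameless postmortem and action items.
    print(f"{failed}/{probes} probes violated the steady-state hypothesis")

if __name__ == "__main__":
    run_experiment()
```
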
Management & Monitoring
CAPABILITY | CRAWL | WALK | RUN |
---|---|---|---|
Monitor application and infrastructure performance | 🙅 No monitoring in place | 🆗 Application or infrastructure performance monitored but no alerting in place | 🏎 Application and infrastructure performance is monitored; alerting in place for relevant KPIs |
Monitor software delivery performance | 🙅 No metrics monitored | Selected metrics monitored | All key metrics monitored |
Limit Work in Progress | More than 10 features in progress | Less than 10 features in progress | Not more than 5 features in progress |
Release governance | 🙅 Product changes rolled out to production are not regulated for stability and reliability | 🆗 Production changes are regulated based on stability and reliability benchmarks in test environments | Error budget consumption regulates future releases to a product and acts as a gate to production changes (see the error-budget sketch below this table) |
Resilience Monitoring | 🙅 No KPIs defined for MTTx as per ITIL guidelines | 🆗 Infra and monitoring KPIs (MTTx, availability, throughput) are defined as per ITIL guidelines, reported, and deviations tracked to closure | 🏎 Key monitoring signals forming SLIs and SLOs (latency, throughput, error rate, saturation) are captured, reported, and tied to product flow from a business perspective |
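
To make the "Release governance" and "Resilience Monitoring" rows concrete, the sketch below shows how error-budget consumption can act as the gate to production changes, assuming a plain availability SLO; the 99.9% target and the 25% threshold are illustrative numbers, not part of the framework.

```python
SLO_TARGET = 0.999  # 99.9% availability objective over the rolling window (illustrative)

def error_budget_remaining(observed_availability: float) -> float:
    """Fraction of the error budget left (1.0 = untouched, 0.0 = exhausted)."""
    budget = 1.0 - SLO_TARGET             # allowed unavailability
    burned = 1.0 - observed_availability  # actual unavailability
    return max(0.0, 1.0 - burned / budget)

def release_gate(observed_availability: float) -> str:
    """Error-budget consumption acting as the gate to production changes."""
    remaining = error_budget_remaining(observed_availability)
    if remaining == 0.0:
        return "freeze feature releases: budget exhausted, focus on reliability work"
    if remaining < 0.25:  # illustrative threshold
        return "restrict releases: low-risk and reliability changes only"
    return "release normally"

if __name__ == "__main__":
    # 99.95% observed availability burns half of a 99.9% budget.
    print(error_budget_remaining(0.9995))  # 0.5
    print(release_gate(0.9995))            # release normally
```
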
Culture
CAPABILITY | CRAWL | WALK | RUN |
---|---|---|---|
Build it and run it | 🙅 Product teams build the system, operations run (and fix) it. No end-to-end ownership of the product lifecycle. Dev and Ops staffed in separate teams | 🆗 Full ownership for product teams to build and run the system supported by SRE. No L2 support needed | 🏎 Full ownership for product teams to build and run the system. T-shaped engineering profiles within the product teams to operate in full DevOps mode with enabled SRE in the product teams |
Foster and enable team experimentation linked to business value | 🙅 No time or resources dedicated to team experimentation | 🆗 Irregular time slots or events blocked for team experimentation (e.g. team hackathon) | 🏎 Regular time slots or events blocked for team experimentation (e.g. team hackathon every month or quarter) |
Support and facilitate collaboration among teams | 🙅 No collaboration with other teams although necessary for the product | 🆗 Irregular exchange between team members and/or other teams (e.g. CoP, meetings, lunch, coffee, sports) | 🆗 Regular exchange among team members and other teams (e.g. CoP, meetings, lunch, coffee, sports) |
Collaboration | 🙅 No collaboration with Operations around product design from a stability and reliability perspective | 🆗 Product teams take design inputs (feedback) around stability and reliability from SRE experts. SRE experts are involved during the testing phase (in the development cycle) or after issues in production | 🏎 Product architects collaborate regularly (from planning onwards) with SRE experts to evolve the design of the product from a performance, stability, and reliability perspective |
Architecture
CAPABILITY | CRAWL | WALK | RUN |
---|---|---|---|
Use a loosely coupled architecture | 🆗 Monolithic application with a high level of interdependencies | 🆗 Re-architecture in progress, moving from a monolithic solution to a microservice-based architecture | 🏎 System has no or very few direct dependencies on other systems, and those dependencies rely on open standards rather than specific technologies and frameworks (e.g. Java RPC); see the sketch below this table |
Focus on independent deployability and testability | 🙅 Dependent deployability and testability across teams | 🆗 Some components can be deployed and tested independently but parts of the components still have dependencies across teams | 🏎 Teams can deploy and test their systems independently |
Use established Platform Engineering solutions as a default | 🙅 Custom solutions used even though Platform Engineering provides them | 🆗 All solutions aligned with Platform Engineering, Solution and Domain Architecture, but exceptions were granted | 🏎 All solutions aligned with Platform Engineering, Solution and Domain Architecture, and no custom solutions used where Platform Engineering already provides one |
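
As a small illustration of the "Use a loosely coupled architecture" and "Focus on independent deployability and testability" rows, the Python sketch below depends on another system only through an open-standard contract (HTTP + JSON) behind a thin interface, plus an in-memory fake so the team can test and deploy independently. The endpoint, payload shape, and SKU names are hypothetical.

```python
from abc import ABC, abstractmethod
import json
from urllib import request

class InventoryClient(ABC):
    """Contract the product depends on; nothing here ties us to the other
    team's technology stack."""
    @abstractmethod
    def stock_level(self, sku: str) -> int: ...

class HttpInventoryClient(InventoryClient):
    """Talks to the other system only via its published HTTP/JSON API
    (hypothetical endpoint, shown for illustration)."""
    def __init__(self, base_url: str):
        self.base_url = base_url

    def stock_level(self, sku: str) -> int:
        with request.urlopen(f"{self.base_url}/inventory/{sku}") as resp:
            return json.load(resp)["quantity"]

class FakeInventoryClient(InventoryClient):
    """In-memory stand-in so the team can test and deploy independently."""
    def __init__(self, levels: dict[str, int]):
        self.levels = levels

    def stock_level(self, sku: str) -> int:
        return self.levels.get(sku, 0)

def can_sell(inventory: InventoryClient, sku: str, qty: int) -> bool:
    return inventory.stock_level(sku) >= qty

if __name__ == "__main__":
    # Independent testability: no call to the other team's deployment needed.
    print(can_sell(FakeInventoryClient({"SHOE-42": 3}), "SHOE-42", 2))  # True
```
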