DevOps Maturity
To assess the DevOps Maturity we follow the adidas DevOps Maturity Framework. Why reinvent the wheel?
Info
The adidas DevOps Maturity Framework is © adidas AG and licensed under MIT.
For some of the capabilities we consider "crawling" unacceptable; for others it might be OK. Those items are designated with π , π, and π respectively.
Development
CAPABILITY | CRAWL | WALK | RUN |
---|---|---|---|
Use version control for all production artifacts | π No version control | π Source code or other assets under version control | π Source code or other assets under version control and all production artifacts versioned and stored in the corresponding artifact repository |
Automate deployment processes | π Manual deployment process | π Partially automated deployment process | π Fully automated deployment process |
Implement test automation | π Manual test script execution | π Partially automated testing (unit or regression or performance tests) | π Fully automated testing (unit and reliability tests, i.e. regression and performance tests) |
Implement infrastructure automation | π Manual deployment process | π Partially automated deployment process. Provisioning is done by the teams | π Fully automated deployment (infrastructure-as-code). Platform Engineering provides base images |
Support test data management | π No test data management | π Partially automated test data management (e.g. manually triggered import and export of test data) | π Fully automated test data management incl. strategy (e.g. consumer data only in PROD) |
Implement continuous delivery | π No continuous delivery | π Partially automated delivery pipeline (e.g. automated build and test process with manual deployment) | π Fully automated pipeline (automated build, test, deployment across environments) |
Include NFRs in Definition of Done | No NFRs used | Ad-hoc NFR checks | Standardised NFR checklist as acceptance criteria for successful releases |
Shift left on security | π No security aspects considered during development cycle | π Security aspects considered during development cycle but shifted towards release (not a priority) | π Security aspects included during development cycle from the very start |
Build for resilience | π No resilience built into system | π Design infrastructure and code for failure | π Design infrastructure and code for failure with fully automated error recovery (self-healing) |
Enable team for troubleshooting | π No control over development lifecycle (e.g. access to PROD) | π Team has full control over development lifecycle (e.g. access to PROD), but no access to logs and tools relevant for troubleshooting | Team has full control over development lifecycle (e.g. access to PROD) and full access to logs and tools for troubleshooting |
Feature handling | No feature branches for controlled releases | Feature branches are implemented for controlled releases of distinct features | Feature branching and toggles are implemented to facilitate development, roll-out and roll-back (if needed) of usable features to production |
Releases | π Releases to all users and all sites / geographies in one go | π Releases to subset of users or sites or geographies | π Gradual releases to subset of users in specific sites / geographies, thereby limiting the blast radius for potential issues |
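
The RUN levels of "Feature handling" and "Releases" combine feature toggles with gradual rollouts to a subset of users in specific geographies. The following is a minimal sketch of one common way to do this, not something prescribed by the framework: each user is hashed into a stable bucket, and the toggle is on only for users in an allowed region whose bucket falls below the rollout percentage. The function names, the region codes, and the feature name are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, region: str, feature: str,
               percentage: int, regions: set[str]) -> bool:
    """Return True if the feature is switched on for this user.

    The feature is limited to the given regions, and within those regions
    to roughly `percentage` percent of users. Raising the percentage only
    adds users; nobody who already has the feature loses it.
    """
    if region not in regions:
        return False
    return rollout_bucket(user_id, feature) < percentage


# Hypothetical usage: expose "new-checkout" to 10% of users in one region.
if is_enabled("user-42", "DE", "new-checkout", percentage=10, regions={"DE"}):
    print("serve new checkout")
else:
    print("serve current checkout")
```

Setting the percentage back to 0 acts as the roll-back: the code path stays deployed but no user reaches it.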
Product & Processes
CAPABILITY | CRAWL | WALK | RUN |
---|---|---|---|
Gather and implement customer feedback | π No customer (internal or external) feedback gathered in development cycles | π Customer feedback (internal or external) gathered on an ad-hoc basis | π Customer feedback (internal or external) gathered after all releases |
Work in small batches and deploy more frequently | π Big work batch size and releases on a monthly basis or longer | π Work batch size optimized for weekly releases, but deployment frequency not in sync with business requirements (e.g. lead time) | π Work batch size optimized for frequent releases and deployment frequency in sync with business requirements (e.g. lead time) |
Have a lightweight change approval process | π Change approval needed from multiple parties outside the team | π Change approval needed within the team | π No change approval needed or change approval process totally automated |
Integrate application data into Big Data Platform | π No application data transferred at all | π Partial business-relevant application data transferred to Big Data Platform or provided via API | π All business-relevant application data transferred to Big Data Platform |
SRE role and activities | π No clear SRE role and responsibility from Product team perspective | π SRE tasks are defined and agreed from Execution (Operations, Automation, Hotfix) perspective | π SRE tasks are defined for Execution and Governance areas and agreed with all stakeholders (Business, Development) |
Postmortems | π No causal analysis done for outages | π RCA conducted for all outages and tied to change / release | π Blameless postmortems are conducted for all outages |
Resiliency / Chaos Engineering | π No resiliency tests are conducted | π Define environment dependencies (failure points) and execute resiliency tests to ensure no customer impact | π Regular chaos (resiliency) exercises scheduled on the basis of steady state / functionality changes |
Management & Monitoring
CAPABILITY | CRAWL | WALK | RUN |
---|---|---|---|
Monitor application and infrastructure performance | π No monitoring in place | π Application or infrastructure performance monitored but no alerting in place | π Application and infrastructure performance is monitored; alerting in place for relevant KPIs |
Monitor software delivery performance | π No metrics monitored | Selected metrics monitored | All key metrics monitored |
Limit Work in Progress | More than 10 features in progress | Fewer than 10 features in progress | Not more than 5 features in progress |
Release governance | π Product changes rolled out to production are not regulated for stability and reliability | π Production changes are regulated on the basis of stability and reliability benchmarks in test environments | Error Budget consumption regulates future releases to a product and acts as a gate to production changes |
Resilience Monitoring | π No KPIs defined for MTTx as per ITIL guidelines | π Infra and Monitoring KPIs are defined as per ITIL guidelines for MTTx, availability and throughput; they are reported and deviations are tracked to closure | π Key monitoring signals from SLIs and SLOs (latency, throughput, error rate, saturation) are captured, reported and tied to product flow from a business perspective |
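
The RUN level of "Release governance" uses Error Budget consumption as a gate to production changes. A minimal sketch of that idea, assuming a request-based availability SLO: the budget is the share of requests allowed to fail under the SLO, and releases are blocked once it is used up. The function names, the 99.9% target, and the request counts are illustrative assumptions, not values from the framework.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and negative values mean the SLO has been violated.
    """
    if total_requests <= 0:
        return 1.0  # no traffic observed, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% SLO leaves no budget at all.
        return 1.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


def release_allowed(slo_target: float, total_requests: int,
                    failed_requests: int) -> bool:
    """Gate a production change on the remaining error budget."""
    return error_budget_remaining(slo_target, total_requests, failed_requests) > 0


# Hypothetical numbers: a 99.9% SLO over one million requests in the window.
if release_allowed(0.999, 1_000_000, 250):
    print("error budget available, release may proceed")
else:
    print("error budget exhausted, hold feature releases")
```

In practice the counts would come from the monitoring system for a rolling window, and a team might additionally keep a safety margin instead of releasing until the budget reaches zero.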
Culture
CAPABILITY | CRAWL | WALK | RUN |
---|---|---|---|
Build it and run it | π Product teams build the system, operations run (and fix) it. No end to end ownership for product lifecycle. Dev and Ops staffed in separated teams | π Full ownership for product teams to build and run the system supported by SRE. No L2 support needed | π Full ownership for product teams to build and run the system. T-shape engineering profiles within the product teams to operate in full DevOps mode with enabled SRE in the product teams |
Foster and enable team experimentation linked to business value | π No time or resources dedicated to team experimentation | π Irregular time slots or events blocked for team experimentation (e.g. team hackathon) | π Regular time slots or events blocked for team experimentation (e.g. team hackathon every month or quarter) |
Support and facilitate collaboration among teams | π No collaboration with other teams although necessary for the product | π Irregular exchange among team members and/or with other teams (e.g. CoP, meetings, lunch, coffee, sports) | π Regular exchange among team members and with other teams (e.g. CoP, meetings, lunch, coffee, sports) |
Collaboration | π No collaboration with Operations around product design from a stability and reliability perspective | π Product teams take design inputs (feedback) around stability and reliability from SRE experts. SRE experts are involved during the testing phase (in the development cycle) or after issues in production | π Product architects collaborate regularly (from planning onwards) with SRE experts to evolve the design of the product from a performance, stability and reliability perspective |
Architecture
CAPABILITY | CRAWL | WALK | RUN |
---|---|---|---|
Use a loosely coupled architecture | π Monolithic application with a high level of interdependencies | π Re-architecture in progress moving from a monolithic solution to a microservice-based architecture | π System has no or very few direct dependencies on other systems, and those dependencies are tied to open standards rather than to specific technologies and frameworks (e.g. Java RPC) |
Focus on independent deployability and testability | π Dependent deployability and testability across teams | π Some components can be deployed and tested independently but parts of the components still have dependencies across teams | π Teams can deploy and test their systems independently |
Use established Platform Engineering solutions as a default | π Custom solutions used even though provided by Platform Engineering | π All solutions aligned with Platform Engineering, Solution and Domain Architecture, but exceptions were granted | π All solutions aligned with Platform Engineering, Solution and Domain Architecture and no custom solutions used that are provided by Platform Engineering |