Environments Group

The Environments group is responsible for the Environments in the Deploy stage of the DevOps lifecycle.

Vision

For an understanding of where this team is going, take a look at the product vision.

As a member of the Ops Sub-department, you may also like to understand our overall vision.

Mission

OKRs

Product Indicators

Contribution to GitLab

Team Members

Name Role
Nicolò Maria MezzoperaNicolò Maria Mezzopera Fullstack Engineering Manager, Deploy:Environments
Andrei ZubovAndrei Zubov Senior Frontend Engineer, Release
Andrew FontaineAndrew Fontaine Senior Frontend Engineer, Deploy:Environments
Anna VovchenkoAnna Vovchenko Senior Frontend Engineer, Deploy:Environments
Senior Backend EngineerSenior Backend Engineer Senior Backend Engineer
Staff Backend EngineerStaff Backend Engineer Staff Backend Engineer, Deploy:Environments
Pam ArtiagaPam Artiaga Backend Engineer
Taka NishidaTaka Nishida Senior Backend Engineer
Tiger WatsonTiger Watson Senior Backend Engineer, Deploy:Environments
Timo FurrerTimo Furrer Senior Backend Engineer

Stable Counterparts

The following members of other functional teams are our stable counterparts:

Name Role
Ameya DarshanAmeya Darshan Security Engineer, Deploy:Environments, Systems:Gitaly, Systems:Geo, Delivery, Snippets
Emily BaumanEmily Bauman Senior Product Designer, Deploy:Environments
Phillip WellsPhillip Wells Technical Writer, Deploy:Environments
Viktor NagyViktor Nagy Senior Product Manager, Deploy:Environments

Some dedicated Slack channels:

Insights

Processes

Acronyms

  • Engineers: All the Engineers of the Environments group
  • Engineering: Engineers and the Engineering Manager
  • EM: Engineering Manager
  • PM: Product Manager
  • FE: Frontend Engineer
  • BE: Backend Engineer
  • UX/PD: User Experience Designer
  • TW: Technical Writer

Meetings

Environments Team Meeting

We have one team meeting each week. The time alternates every week to accommodate APAC/EMEA and EMEA/AMER. The purpose of this meeting is to share information about the ongoing projects. It also contains general announcements that are important for collaboration.

Meeting format:

  • Before and during the meeting, team members write anything they want to verbalize in the notes attached to the meeting.
  • We wait 1 or 2 minutes for the team members who want to join and then start recording when ready.
  • During recording, we go over each point in the document.
  • Anyone can facilitate the discussion. If the EM or PM is there, they will kick things off.
  • If the author of the current point being discussed is available, they can verbalize their point.
  • A short discussion may occur around each point, taking into consideration that we want to get through as much of the document as we can.
  • All team members are welcome and encouraged to help take notes in the document while the discussion takes place.
  • After we get through all the points, we stop recording.
  • If there is left over time, team members may use the remaining unrecorded time to socialize or leave the meeting early.

If the meeting for the week has already taken place and you would like to add a new item for discussion, create a new section for the next meeting date above the last one and add your item.

Frontend, Go and Ruby Meetings

These are optional meetings on the team calendar. Everyone on the team is welcome. They are prioritized to be at a time where as many of the engineers who work primarily on the corresponding topics can attend.

These meetings are not too formal and also provide time for the engineers across time zones to discuss ongoing projects, ask questions, pair up, and catch up. We go through any agenda items first.

Meeting Links:

Technical Discovery Meetings

Sometimes we will encounter issues that need the input of the whole team to be refined and then worked on, such issues will be selected as a topic for a Technical Discovery meeting. We try to be conscious of sync time and so we expect a maximum of two of these meetings for each milestone. A technical discovery meeting consists of:

  • 2 Meetings across timezones so that everyone has a fair opportunity to join the sync meeting.
    • APAC/EMEA
    • EMEA/AMER
  • The 2 meetings are recorded and share the same agenda.
  • The meetings are announced at least one week before they will be held and each participant must familiarize themselves with the issue that is being investigated prior to attending one of the meetings.
  • Discussing the topic async in advance in the issue/epic is encouraged.
  • The PM will open the conversation either by describing the use case/scenario or by recording a quick video about it.
  • The meetings are agenda first and everyone is expected to write their comments and questions in the agenda before the first meeting.
  • If the agenda is empty the meeting is cancelled.
  • The conversations in the meeting must be recorded in the same documents as notes.
  • One host is decided for every meeting, and they are responsible to drive the conversation forward.
  • In the last meeting someone is appointed to summarize the conversation either in the original issue or a technical document.

The goal of technical discovery meetings is to come up with a concrete technical proposal for the question at hand. We should not force a proposal, but aim to get there and write the conclusion accordingly with potential follow-ups.

Design Pair Meeting

Each week the Product Designer hosts a design pairing session with the team on Thursdays at 1:30pm UTC. The goal of the design pairing sessions is to give the team more insight into what Product Design is currently working on, share feedback and questions, as well as give us a space to brainstorm and work together through bigger problems. Anyone is encouraged to propose topics or existing user problems that could use some brainstorming together as a team. A design pairing session consists of:

  • An agenda document with the weekly topic prepared the day before.
  • 30 minutes of the call will be dedicated to a design walkthrough or brainstorming session.
  • 20 minutes of the call will be dedicated to a Q&A from the engineering team (previously know as the FE/UX sync).
  • A recording which will be posted to the team’s Youtube playlist.

If there are no topics, the meeting can be cancelled for the week.

Engineering Pair Meeting

Due to high distrubtion and full calendars, we can have one of these for as many regions for which we have a DRI.

AMER

DRI: @hustewart

Time: Tuesdays at 6:00pm UTC.

Goal: Solve problems together and build trust/rapport.

Calendar event

What we’ll do:

  • Introductions for people who don’t know each other.
  • Pick a problem to solve.
  • Pair.
  • Document what we learn.

Feel free to submit a topic beforehand (notes are on calendar event), or simply show up.

All team members are welcome.

EMEA/APAC

DRI: @andrei.zubov

Time: Every second Tuesday at 8:00am UTC.

Calendar event

Team issue tracker

Issue refinement

Every week the refinement bot assigns a team member as the refinement DRI, who is responsible for refining some issues from the top of the ~"workflow::refinement" list (the list is prioritised top to bottom) in the Milestone Board and another issue of their choice with a ping to the EM and/or PM explaining the reasoning.

The refinement process is described in the issue template.

Planning

Issue Weighting

The weights we use are:

Weight Extra investigation Surprises Collaboration
1: Trivial not expected not expected not required
2: Small possible possible possible
3: Medium likely likely likely
5: Large guaranteed guaranteed guaranteed

The above table is contextual. For example, domain knowledge, experience levels, and time at GitLab can impact an engineer’s perspective on whether an issue requires Extra Investigation or Surprises.

Weights are not set in stone. We do our best to get it right during refinement, but we want to be transparent and accurate. If an issue is taking more effort than is reflected in the existing weight, the DRI on the issue is encouraged to change the weight. We want accurate documentation of the level of effort that was required.

By giving a weight 1 to an issue, we’re saying “we can’t benefit from this issue being broken down into smaller units of shippable work.”

Anything 5 or larger should be broken down, these should not be ready for development. We would likely turn a 5 into an epic, into a research and implementation issue or a technical discovery.

Occasionally, a proof-of-concept (POC) is necessary to determine a feasible technical path. When one is required, the engineer will create a POC issue that contains the context of the research to be conducted along with the goals of the POC. This issue will be scheduled for work before any further breakdown of tasks is performed. Once the technical path is clear, the engineer can proceed to weight the issue and/or break down the issue further to guide implementation. Every POC issues should contain a list of questions we want to answer, the definition of done should include the answers and suggested next steps.

Not all POCs will be successful, and that is OK! Some avenues of research may not be successful, and the POC will have saved us from investing significant time in a solution that will not meet our needs. The goal is early feedback and fast iteration.

As a note, designers use the design weight labels instead of using the weight input within the issue, which is reserved for engineering.

Weight, Velocity, and Planning

We intentionally leave the term “velocity” undefined and do not use it in planning workload capacity for the team.

We leave the question of interpreting summed weights open to each unique situation.

When making decisions about how much work the team can take on for a milestone, we trust individual impressions and instincts reflected in the discussions that take place in the planning issue and the refinement process. The weighting system helps foster these discussions.

GitLab Terraform Provider

The GitLab Terraform Provider is managed by the Environments group.

Milestone Board

The issues scheduled for a milestone can be tracked at Milestone Board.

This board contains all the necessary columns to track the workflow of the team, in particular:

  • ~"workflow::refinement" the list of issues that needs to be refined before they can be assigned.
  • ~"workflow::ready for development" the list of issues that are ready to be worked on, both assigned and not assigned to the milestone.
  • One or more Milestone columns containing the planned work for the given milestone.

All the columns are prioritised top to bottom.

Once a team memeber self-assigns an issue on the Milestone Board, issue labels should follow the Engineering Workflow.

For Merge Requests, it’s up to the author and the project they are contributing to, to decide if they want to use these ~workflow:: labels. It is not required to use them or keep them synced up with the Issue labels.

Feature development

Our goal is to move towards a continuous delivery model so the team completes tasks regularly, and keeps working off of a prioritized backlog of issues. We default to team members self-scheduling their work:

  • Team members self-assign issues from the Milestone Board that are in the workflow:ready for development column and has the current milestone.
  • ~Deliverable issues take priority over any other work, as they are the main focus of each milestone and inform our say-do ratio.
  • Once a team member has completed their assigned issues, they are expected to go to the Milestone Board and assign themselves to the next unassigned issue from the current milestone.
  • If there are no more issues in the current milestone, engineers are expected to assign themselves to the next unassigned workflow:ready for development issue.
  • The issues on the board are in priority order based on importance (the higher they are on the list, the higher the priority). This order is set by the product manager.
  • If all issues are assigned for the milestone, team members are expected to identify the next available issue to work on based on the team’s work prioritization (see below).
  • While backstage work is important, in the absence of specific prioritization, the team will have a bias towards working on bug or feature categorized issues.

~Environments::EngineeringChoice process

While diligently pursuing our objectives, we also recognize the significance of work that resonates personally with our engineers. To facilitate this, we have introduced the “~Environments::EngineeringChoice” label. Here’s how it works:

  1. Selection During Milestones: In each Milestone Plan, engineers are encouraged to select up to five issues (total for the group) that they find particularly interesting or valuable, marking them with the “~Environments::EngineeringChoice” label. These issues should improve GitLab product or developer experience but they don’t have to be in the ~“group::environments” domain.
  2. Limit per Milestone: To maintain focus, no more than five issues should be labeled with “~Environments::EngineeringChoice” within a single milestone.
  3. Priority After Deliverables: Once all mandatory ~Deliverables are completed, the next priority is to address issues labeled “~Environments::EngineeringChoice.”
  4. Refined issues only: Before applying the “~Environments::EngineeringChoice” label, the issue should be worfklow::ready for development and accordingly needs a weight.
  5. Maximum issue size: To rule out likely surprises and extra investigation, only issues with weight 1-2 are acceptable for “~Environments::EngineeringChoice”.
  6. Tracking in Milestone Planning: Progress and choices under the “~Environments::EngineeringChoice” category will be monitored and recorded in a specific section of the Milestone Planning issue.

Bug fixing and priortized work

In every milestone plan, we compile a list of bugs due in the coming milestone based on the severity SLA.

In addition to the self-scheduling of feature development, the manager will from time to time assign bugs, or other work deemed important, directly to a team member.

On top of the official issue-triaging process, when severity labels are assigned/changed, we aim to set the issue due date at the same time. If there are bugs with a severity label and without a deadline, everyone is encouraged to set the deadline based on the date of the last severity label update and the SLA for the given severity.

Say-do ratio

Our team keeps track of their commitment with say-do ratios, two metrics are important: say-do and reprioritized say-do

  • Say-do only applies to ~Deliverable issues.
  • By the 17th of the month the ~Deliverable label is applied to the upcoming milestone issues by the EM.
  • We aim roughly to assign one ~Deliverable for each engineer, this may change milestone by milestone.
  • Any issue that has the ~Deliverable label at that point is considered as promised to be delivered and is part of our say-do ratio.
  • If at any time during the milestone a ~Deliverable label is removed or the issue is removed from the milestone that issue does not count anymore in the reprioritized say-do metric, but still does count for say-do.

We aim to achieve 100% re-prioritized say-do and at least 80% say-do.

Example
  • In the milestone 15.11 we have 10 ~Deliverable issues labelled as such by the 17th of March 2023
  • Along the way we realise that 5 of those ~Deliverable issues will not make it, and reasonably before the end of the milestone, we move them to 16.0
  • At the end of the milestone there is an hiccup and of those 5 remaining issues, 1 is not completed.

Our say-do ratio would be 40% (4 out of 10) Our reprioritized say-do would be 80% (4 out of 5)

MR reviews

Team members should use their best judgment to determine whether to assign the first review of an MR based on the DangerBot’s suggestion or to someone else on the team. Some factors in making this decision may be:

  • If there is known domain expert for the area of code, prefer assigning the initial review to them.
  • Does the MR require a lot of context to understand what it is doing? Will it take a reviewer outside the team a long time to ramp up on that context?
  • Does the MR require a lot of dev environments setup that team members outside the Environments group are likely to not have?
  • Is the MR part of a larger effort for which a specific team member already has all the context?

As team members and domain experts, both the MR author and initial reviewer are encouraged to share the broader context before, during, and throughout the review process to assist maintainers in conducting efficient reviews. This context may cover:

  • Known limitations;
  • Edge cases;
  • Implementation reasoning;
  • Links to relevant references.

Providing context helps streamline the review process and invites a broader pool of maintainers to our domain (example).

Handling Deferred UX

Team members should make their best effort to resolve UX issues as they come up during MR reviews. However, there are times where the changes requested or feedback given would significantly slow down velocity. For the sake of efficiency and iteration, a Deferred UX issue must be opened to follow up on the feedback.

In these instances, the engineer who authored the original MR should assign themselves the issue and become the DRI to evaluate the UX feedback. This may mean reaching out to the team’s Product Designer to ensure the feedback is actionable and resolving the debt is prioritized appropriately during the following milestone planning. For example, for Deferred UX issues opened in the 16.3 milestone, engineers should evaluate and ensure appropriate prioritization of the issue during the planning of the 16.4 milestone. This does not mean that the issue must be resolved during the 16.4 milestone, but that the issue is placed into the appropriate step of our product development flow, or closed if appropriate.

This helps to ensure that Deferred UX issues are resolved in a timely manner, keeping with the overall goals of the group and adherence to broader engineering workflows.

Epic Ownership

The Environments group uses epics to describe features or capabilities that will increase the maturity of the Environments categories over time.

Each epic should be owned by an engineer who is responsible for all technical aspects of that epic. The engineering DRI will work closely with the Product Manager and Product Designer to understand the requirements and create issues that encapsulate the technical work required during the design/solution validation phases and build track of the Product Development Flow. Each issue needs to be weighted and contain enough information in the description area for any other engineer on the team to be able to pick up that work.

For the duration of building the epic, the engineer does not need to be the only person implementing the issues. They should keep watch of the work that is done on the issues so that they can verify that the work is progressing correctly. If there are problems with the work, or lengthy delays, they need to make sure the Product Manager and Engineering Manager are aware.

When work is nearing completion, the engineer should make sure that any additional issues that may have come up during the build process are either addressed, or scheduled for work. Additional issues should be created and added to the epic. This will help to make sure that we do not build up technical debt while building.

Finally, they should also monitor any work that needs to occur while rolling out the Epic in production. If there are rake tasks, database migrations, or other tasks that need to be run, they need to see those through to being run on the production systems with the help of the Site Reliability counterpart.

This places a lot of responsibility with the DRI, but the PM and EM are always there to support them. This ownerships removes bottlenecks and situations where only the PM or EM is able to advance an idea. In addition, the best people to decide on how to implement an issue are often the people who will actually perform the work.

To declare an ownership, insert DRI: <your-gitlab-handle> at the top of the epic description. Example.

Quality Processes

Maintaining a high standard of quality is a critical factor to delivering winning products.

Within the Environments group we use the following processes and best practices to ensure high quality.

  1. We ensure each MR is accompanied with meaningful unit tests and integration tests.
  2. For each major feature we develop and maintain End to End tests that run nightly and confirm no regressions have been introduced to critical paths.
  3. On a weekly basis, we review our Triage report for bugs and regressions and take the appropriate action.
  4. We review the quality dashboard each milestone to track our long term progress at improving quality.

End to End Testing

The Environments group uses GitLab QA for End-to-End testing. We have guidelines for how our team is leveraging these tests.

gitlab-agent QA bot

In feed_alerts_configure we have a bot that runs tests at this project

If this bot alerts of a failed pipeline, we should treat these the same as a broken master branch.

  • Check the pipeline for intermittent errors (and retry if this is the case)
  • Otherwise create an investigation issue to dig further/fix.

Error Budget

Our target availability is 99.9%

Error Budget failure DRI process

Each week we receive an Error Budget report in #cd-section on Slack if we are under our target availability.

An engineer might be assigned as a DRI to look into this.

The DRI is neither expected to determine a root cause nor propose a solution on their own.

The DRI should instead reach out to the Scalability:Projections team for support.

Async Issue Updates

In order to optimize async collaboration across a big team we use issue updates to share progress completed on a specific issue or epic. Weekly updates on progress and status will be added to each issue by its assignee. A weekly update may be skipped if there was no progress. It’s preferable to update the issue rather than the related merge requests, as those do not provide a view of the overall progress. This applies to issues with the labels workflow::in dev or workflow::in review

The status comment should include what percentage complete the work is, the confidence of the person that their estimate is correct and, a brief note on what was done. It’s perfectly acceptable to have multiple updates if more than one DRI is working on the issue.

As a part of the async update it’s important to verify that the issue and related MRs workflow labels are correctly set.

Example

## Async status update

- Complete: 80%
- Confidence: 90%
- Notes: expecting to go into review tomorrow

To simplify the work of adding and keeping track of async updates TalTal can be used.

Career Development and Promotions

We want every team member to be advancing in their Career Development.

We follow the Engineering Department Career Development Framework.

Maximize asynchronous performance in this team

Async practices are particularly important to us because we live in time zones that do not afford much, if any, overlap during our working hours.

To maximize our asynchronous performance, we should follow the GitLab Communication guideline. More specifically, the following points are important:

  • Have an SSOT discussion page (Issue or MR). This is the main collaboration point that everyone can get the latest information quickly. The description section should contain essential and up-to-date information, such as:
    • What’s the problem to solve?
    • Who’s the DRI in charge of making the decision?
    • What’s the acceptance criteria (e.g. user experience goal)?
    • Is anything out of scope?
    • What proposals do we have?
    • What are the PROs/CONs and technical difficulty of each proposal?
    • Whose approval do you need for making the decision?
    • When is the due date to make the decision?
    • FAQ
  • The DRI keeps the description updated with latest information based on any decisions made in threads.
  • When a team member is asked to give input, they should respond as soon as possible to unblock discussions. It’s also fine to respond that you don’t have any feedback or can’t take time for it, so that the DRI can avoid waiting for your response.
  • If the DRI didn’t get much progress from the asynchronous communication, the DRI should schedule a synchronous meeting or reach out to broader audiences.
  • When the DRI schedules a sync meeting, they should make sure that agendas are prepared before the meeting starts.

Monthly Showcases DRI

We participate in the OPS showcase initiative, to facilitate the selection of topics, the creation of the issues and content we have a Showcase DRI which will:

  • Ensure every month at least a showcase issue is created and linked in the right issues/epic
  • Facilitate the selection of the topic of each showcase, paying attention to give space to everyone in the team
  • Help whoever is creating the content with video creation and issue description
  • Ultimately is the showcase DRI responsibility that a showcase issue is produced and ready in time

Currently the showcase DRI for FY24Q3 is: @anna_vovchenko

How to work with us

Default to GitLab Issues

Why

We think that using GitLab Issues as much as possible is the best way to align with our values of Transparency and Efficiency. Using Issues gives us the greatest chance of collaboration, reusing any work done, and documenting the request and outcome in a findable, persisted way.

How

Follow the guidance in our request for help documentation.

How to contribute to Auto DevOps

Read our specific GDK instructions as well as our handbook entry on what existing testing does and how to develop features for Auto DevOps.

Shared Cloud Infrastructure

The Environments group has access to a shared GCP project which can be used for demos, experiments, or to host auxilliary services. The project id is deploy-stage-shared-i-e55e01cb and was created and provisioned using the following ARs:

If you need to create permanent infrastructure in that GCP project, it’s encouraged to do it with Terraform to easily share and document the setup with the entire group. You can use this GitLab group to host the project.

If the infrastructure is temporary, you can manage it with whichever tools you prefer.

Currently hosted projects:

  • arm64-runners: IaC for arm64-based GitLab runners for some projects requiring it due to the lack of arm64-based shared runners.

Example/Demonstration projects

When you need to create an example project for demonstartion, consider having it in the example group instead of your personal namespace.

This allows us to collect all of the knowledge under the same place. Also, this example group has EEP license by default.


Auto DevOps
Auto DevOps is a technology that allows automated application of DevOps best practices.
Environments Group - GitLab Quality Assurance End-to-End Testing for the Environments group
Overview The goal of this page is to document how the Environments group uses the GitLab QA framework (video walkthrough) to implement and run end-to-end tests. Supporting slides for the above walkthrough Why do we have them End-to-end testing is a strategy used to check whether your application works as expected across the entire software stack and architecture, including the integration of all micro-services and components that are supposed to work together.
Last modified April 9, 2024: Update the MR review section (8ad1a277)