The Quality Department focuses on measuring and improving the performance of GitLab, and on creating and validating reference architectures that self-managed customers can rely on as performant configurations.
To ensure that self-managed customers have performant, reliable, and scalable on-premises configurations, the Quality Department is creating several reference architectures. Our goal is to give customers tested and verified examples that they can use to ensure good performance, and that show what changes need to be made as an organization scales.
| Users | Status | Link to more info |
|-------|--------|-------------------|
| 2.5k | To Do (Q4) | Issue link |
| 25k | In Progress (Q3) | Issue link |
| 50k | In Progress (Q3) | Issue link |
We have created the GitLab Performance Toolkit, which measures the performance of various endpoints under load, as well as web rendering performance using SiteSpeed. The Toolkit is used internally at GitLab, and it is also available for self-managed customers to set up and run in their own environments.
If you have a self-managed instance and you would like to use the Toolkit to test its performance, please take a look at the documentation in the Toolkit's README file.
Once a day, the GitLab Performance Toolkit is run against the existing reference architecture using a recent or the latest release of GitLab. This allows us to catch and triage degradations early in the process so that we can try to implement fixes before a new release is created. If problems are found, issues are created for degraded endpoints and are then prioritized during the weekly Availability & Performance Refinement meeting.
The latest results against our various testing environments are automatically posted to a wiki page in the Performance project.
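The triage step in the daily process above can be illustrated with a minimal sketch. It is not the Toolkit's actual implementation: the `EndpointResult` shape, the `find_degraded` helper, and both thresholds are hypothetical, chosen only to show how a daily run's results might be screened for endpoints that warrant an issue.

```python
from dataclasses import dataclass


@dataclass
class EndpointResult:
    """One endpoint's numbers from a single load test run (hypothetical shape)."""
    name: str
    p90_ms: float        # 90th percentile response time in milliseconds
    success_rate: float  # fraction of successful requests, 0.0 to 1.0


def find_degraded(results, max_p90_ms=500.0, min_success_rate=0.99):
    """Return (endpoint name, reasons) for each endpoint that misses a threshold.

    The default thresholds are illustrative assumptions, not GitLab's targets.
    """
    degraded = []
    for r in results:
        reasons = []
        if r.p90_ms > max_p90_ms:
            reasons.append(f"p90 {r.p90_ms:.0f} ms exceeds {max_p90_ms:.0f} ms")
        if r.success_rate < min_success_rate:
            reasons.append(
                f"success rate {r.success_rate:.1%} below {min_success_rate:.1%}"
            )
        if reasons:
            degraded.append((r.name, reasons))
    return degraded
```

Each entry in the returned list corresponds to one candidate issue, and the reasons give the text that would explain why the endpoint was flagged.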
In Q3, the Quality Department has a goal of automating the testing process so that each new monthly release is tested and compared to the release before it. Work on this project is ongoing and is prioritized after the creation of the 25k and 50k reference environments described above. You can track progress on this quarterly goal using our OKR issue.
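As a rough illustration of what comparing one release to the release before it could look like, the sketch below takes per-endpoint response times for two releases and reports the endpoints that slowed down by more than a tolerance. The input format (a mapping from endpoint name to a response time in milliseconds), the `compare_releases` name, and the 10% tolerance are all assumptions made for the example.

```python
def compare_releases(previous, current, tolerance_pct=10.0):
    """List endpoints whose response time grew by more than tolerance_pct.

    previous, current: dicts mapping endpoint name to response time in ms
    (a hypothetical result format). Returns tuples of
    (endpoint, previous_ms, current_ms, change_pct), worst regression first.
    """
    regressions = []
    for endpoint, cur_ms in current.items():
        prev_ms = previous.get(endpoint)
        if prev_ms is None or prev_ms <= 0:
            # New endpoint or unusable baseline: nothing to compare against.
            continue
        change_pct = (cur_ms - prev_ms) / prev_ms * 100.0
        if change_pct > tolerance_pct:
            regressions.append((endpoint, prev_ms, cur_ms, change_pct))
    regressions.sort(key=lambda r: r[3], reverse=True)
    return regressions
```

Endpoints that exist only in the newer release are skipped rather than reported, because they have no baseline to compare against.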
The endpoint coverage of the load tests in our Toolkit is not yet comprehensive. We have reviewed our common endpoints to identify the most heavily used ones as well as the slowest ones. Issues have been created for our team to add these to the Toolkit, and we expect that adding some of them will surface degraded endpoints, which we'll need to send through performance refinement as defined in the Daily Testing Process.
Additionally, the analysis we performed was ad hoc. We would like to define a process for repeating the review on a regular cadence, whether that is after every release, once a quarter, or on some other schedule. Because GitLab is constantly expanding and evolving, our coverage needs to iterate in tandem.
We've created an epic to track the initial expansion as well as the work defining our recurring process for analyzing endpoints and verifying our coverage is adequate.
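The kind of endpoint review described above, picking out the most used and the slowest endpoints that the load tests do not yet cover, could be sketched as follows. This is only an illustration: the input format (a list of `(endpoint, duration_ms)` request records), the `coverage_candidates` name, and the use of the median as the slowness measure are assumptions, not a description of how the original analysis was done.

```python
from collections import defaultdict
from statistics import median


def coverage_candidates(requests, covered, top_n=2):
    """Suggest endpoints to add to the load tests.

    requests: iterable of (endpoint, duration_ms) records (hypothetical format).
    covered:  set of endpoints that already have a load test.
    Returns uncovered endpoints that are among the top_n most requested or the
    top_n slowest by median duration, most-used candidates first.
    """
    durations = defaultdict(list)
    for endpoint, duration_ms in requests:
        durations[endpoint].append(duration_ms)

    most_used = sorted(
        durations, key=lambda e: len(durations[e]), reverse=True
    )[:top_n]
    slowest = sorted(
        durations, key=lambda e: median(durations[e]), reverse=True
    )[:top_n]

    candidates = []
    for endpoint in most_used + slowest:
        if endpoint not in covered and endpoint not in candidates:
            candidates.append(endpoint)
    return candidates
```

Running the same function on fresh request data at each review would give a repeatable starting point for the recurring process, with the final choice of endpoints still left to the team.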
We have developed a playbook of initial steps for investigating the problem when self-managed customers experience, or suspect they are experiencing, performance issues.
The first step is requesting logs. We use a tool called fast-stats in conjunction with the following log artifacts. These should be either rotated logs or logs captured on a peak day after peak time.