The meta issue that tracks the various issues listed here is on the infrastructure tracker.
Performance of GitLab and GitLab.com is ultimately about the user experience. As also described in the product management handbook, "faster applications are better applications".
Our target is an average Speed Index of less than 2 seconds for GitLab.com.
The Speed Index is "the average time at which visible parts of the page are displayed".
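To make the definition concrete: a filmstrip of the page load gives the visual completeness VC(t), the fraction of the final viewport rendered at time t, and Speed Index is the area above that curve, SI = ∫ (1 − VC(t)) dt, in milliseconds. Content that appears earlier shrinks the area, so lower is better. A minimal Python sketch, assuming completeness samples arrive as (time in ms, fraction from 0 to 1) pairs whose value holds until the next sample; `speed_index` and `meets_target` are hypothetical helper names, and the 2000 ms default mirrors the target above.

```python
from typing import Iterable, Sequence, Tuple

# A visual-progress sample: (time in milliseconds, visual completeness in [0, 1]).
Sample = Tuple[float, float]


def speed_index(samples: Sequence[Sample]) -> float:
    """Area above the visual-progress curve: the integral of (1 - VC(t)) dt.

    Assumes completeness is a step function (each sample's value holds until
    the next sample), that nothing is visible at t = 0, and that the last
    sample is the moment the page becomes visually complete.
    """
    total = 0.0
    prev_time, prev_complete = 0.0, 0.0
    for time_ms, complete in sorted(samples):
        # Area of the still-incomplete strip between the previous sample and this one.
        total += (1.0 - prev_complete) * (time_ms - prev_time)
        prev_time, prev_complete = time_ms, complete
    return total


def meets_target(speed_indexes: Iterable[float], target_ms: float = 2000.0) -> bool:
    """True if the average Speed Index across the measured URLs is below the target."""
    values = list(speed_indexes)
    if not values:
        raise ValueError("need at least one Speed Index measurement")
    return sum(values) / len(values) < target_ms
```

For example, `speed_index([(0, 0.0), (1000, 0.5), (2000, 1.0)])` gives 1500.0: a page that is half drawn at 1 s and complete at 2 s scores better than one that shows nothing until 2 s (2000.0), even though both finish at the same moment.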
There are many other performance metrics that can be useful in analyzing and prioritizing work, some of which are discussed in the sections below. But the Speed Index as experienced by users is the target for the site as a whole, and everything else should ultimately tie back to it.
In everything that follows, times are measured from a single geographic location (in Europe) using "Cable" connectivity for that location (5 Mbps down / 1 Mbps up).
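The "Cable" profile puts a floor under how fast page weight can arrive: at a fixed bandwidth, the ideal transfer time is size × 8 / bandwidth. A rough Python sketch, assuming the 5 Mbps figure is the download rate and ignoring latency, TCP slow start and protocol overhead (so real downloads take longer); `transfer_time_ms` is a hypothetical helper.

```python
def transfer_time_ms(size_bytes: int, bandwidth_mbps: float = 5.0) -> float:
    """Idealized time in ms to move size_bytes at bandwidth_mbps (megabits per second).

    Ignores round-trip latency, TCP slow start and protocol overhead,
    so it is a lower bound on the real download time.
    """
    bits = size_bytes * 8
    bits_per_ms = bandwidth_mbps * 1_000_000 / 1000
    return bits / bits_per_ms
```

For instance, `transfer_time_ms(1_000_000)` returns 1600.0: a single 1 MB (10^6 bytes) asset needs at least 1.6 s on this profile, most of the 2 second Speed Index budget.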
The GitLab.com URLs listed in the table below are heavy use cases and form the basis for measuring performance improvements. The times (in milliseconds) indicate the time from the web request to "the average time at which visible parts of the page are displayed" (per the definition of Speed Index). Because the "user" requesting these URLs is a controlled test agent, the result is an external measure of Speed Index.
|Issue List: GitLab FOSS Issue List||2872||1197||-||N/A|
|Issue List: GitLab Issue List||1581|
|Issue: GitLab FOSS #4058||2414||1332||1954|
|Issue Boards: GitLab FOSS repo boards||3295||1773||-||N/A|
|Issue Boards: GitLab repo boards||2619|
|Merge request: GitLab FOSS !9546||27644||2450||1937|
|Pipelines: GitLab FOSS pipelines||1965||4098||-||N/A|
|Pipelines: GitLab pipelines||4289|
|Pipeline: GitLab FOSS pipeline 9360254||4131||2672||2546|
|Project: GitLab FOSS project||3909||1863||-||N/A|
|Project: GitLab project||1533|
|Repository: GitLab FOSS Repository||3149||1571||-||N/A|
|Repository: GitLab Repository||1867|
|Single File: GitLab FOSS Single File Repository||2000||1292||-||N/A|
|Single File: GitLab Single File Repository||2012|
|Explore: GitLab explore||2346||1354||1336|
|Snippet: GitLab Snippet 1662597||1681||1082||1378|
*To access the sitespeed Grafana dashboards, you need to be logged into your Google account.
Note: Because this table spans the periods before and after the move to a single codebase, we kept the GitLab FOSS pages next to the GitLab ones to enable comparisons, even though they are not exactly the same project.
If you activate the runs toggle, you will see annotations with links to all full reports. Measurements currently run every 2 hours.
All items that start with the tachometer symbol represent a step in the flow that we measure. Wherever possible, the tachometer icon links to the relevant dashboard in our monitoring. Each step in the listing below links back to its corresponding entry in the goals table.
Consider the scenario of a user opening their browser and surfing to their dashboard by typing `gitlab.com/dashboard`. Here is what happens:
HTTP queue time.
`RootController#index`. The round-trip time from when a request starts in Unicorn until it leaves Unicorn is what we call Transaction Timings. Rails controller requests are sent to (and data is received from):
`gitlab.com/dashboard` example, the controller addresses all three.
Load) when this particular user hits `gitlab.com/dashboard/issues`. The number of SQL calls depends on how many projects the person has, how much may already be in cache, etc.
view timings). In some controllers, data is gathered first, after which a view is constructed. In other controllers, data is gathered from within a view, so the view timing in those cases includes the time it took to call NFS, PostgreSQL, Redis, etc. In many cases, both happen.
`gitlab.com/dashboard/issues`, there are 56 nested / partial views rendered (search for
First Byte - External is measured for a hand-selected set of URLs using SiteSpeed.
`defer="true"`, so they are executed in the order in which they appear in the document, but only after the HTML document has been parsed.
`DOMContentLoaded` event. The new call is for a new URL, and such requests are routed through either the Web or API workers, invoke their respective Rails controllers on the backend, and return the requested files (HTML, JSON, etc.). For example, the calendar and activity feeds on a user's page `gitlab.com/username` are two separate AJAX calls, triggered by
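The difference between transaction timings and view timings described in these steps can be sketched with nested timers. The Python below is an analogue, not GitLab's instrumentation (which lives in the Ruby application); `TransactionTimer`, `FakeClock` and `simulate_request` are hypothetical names. Because timers nest, SQL time spent inside a view is counted in both the `sql` and `view` totals, which is why a view timing can include calls to NFS, PostgreSQL or Redis.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class TransactionTimer:
    """Accumulates elapsed time per category for a single request.

    Timers may nest, so a "view" timer wrapping an SQL call also counts
    that SQL time, mirroring how view timings can include data gathering.
    """

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self.clock = clock
        self.timings: Dict[str, float] = defaultdict(float)

    @contextmanager
    def measure(self, category: str) -> Iterator[None]:
        start = self.clock()
        try:
            yield
        finally:
            self.timings[category] += self.clock() - start


class FakeClock:
    """Manually advanced clock (in ms) so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def simulate_request() -> Dict[str, float]:
    clock = FakeClock()
    timer = TransactionTimer(clock)
    with timer.measure("transaction"):
        with timer.measure("sql"):  # data gathered in the controller first
            clock.now += 30
        with timer.measure("view"):
            with timer.measure("sql"):  # data gathered from within the view
                clock.now += 20
            clock.now += 50  # rendering the template itself
    return dict(timer.timings)
```

`simulate_request()` returns `{'sql': 50.0, 'view': 70.0, 'transaction': 100.0}`: the 20 ms query issued from the view shows up in both `sql` and `view`, so per-category totals should not be summed to get the transaction time.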
First read about the steps in a web request above, then pick up the thread here.
After pushing to a repository, e.g. from the web UI:
`git-receive-pack` process (on the Workhorse machine) to save the new commit to NFS
`git-receive-pack` fires a Git hook to trigger the `post-receive` hook, and the `git-receive-pack` process passes along details of what was pushed to the repository to the `post-receive` hook. More specifically, for each updated ref it passes three items: old revision, new revision, and ref name (e.g. a tag or branch).
`post-receive` hook to Redis, which serves as the Sidekiq queue.
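The input side of this step follows Git's documented hook contract: `post-receive` reads one line per updated ref on standard input, `<old-rev> <new-rev> <ref-name>`, with an all-zero old revision for a newly created ref and an all-zero new revision for a deleted one. A minimal Python sketch that parses that input and builds a Sidekiq-style job. The worker class `PostReceive`, the queue name, the argument layout and the Redis commands (`SADD queues <name>` and `LPUSH queue:<name> <payload>`, Sidekiq's conventional layout) are assumptions for illustration, not GitLab's actual hook code.

```python
import json
import secrets
import time
from typing import Any, Dict, Iterable, List, NamedTuple, Tuple


def is_zero(rev: str) -> bool:
    """Git uses an all-zero object ID to mean "no object" (ref created or deleted)."""
    return set(rev) == {"0"}


class RefChange(NamedTuple):
    old_rev: str
    new_rev: str
    ref: str

    @property
    def kind(self) -> str:
        if is_zero(self.old_rev):
            return "create"
        if is_zero(self.new_rev):
            return "delete"
        return "update"


def parse_post_receive(lines: Iterable[str]) -> List[RefChange]:
    """Parse post-receive stdin: one "<old> <new> <ref>" line per updated ref."""
    changes = []
    for line in lines:
        line = line.strip()
        if line:
            old_rev, new_rev, ref = line.split(" ", 2)
            changes.append(RefChange(old_rev, new_rev, ref))
    return changes


def build_job(repository: str, changes: List[RefChange],
              queue: str = "post_receive") -> Dict[str, Any]:
    """Build a Sidekiq-style job hash; class name, queue and args are hypothetical."""
    now = time.time()
    return {
        "class": "PostReceive",  # hypothetical worker class
        "queue": queue,
        "args": [repository, [list(change) for change in changes]],
        "jid": secrets.token_hex(12),  # 24 hex characters
        "retry": True,
        "created_at": now,
        "enqueued_at": now,
    }


def redis_commands(job: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Commands a Sidekiq-style client would send to Redis to enqueue the job."""
    queue = job["queue"]
    return [
        ("SADD", "queues", queue),
        ("LPUSH", f"queue:{queue}", json.dumps(job)),
    ]
```

Inside a real hook script, `parse_post_receive(sys.stdin)` would yield one `RefChange` per pushed ref.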
Consider the scenario of a user opening their browser and surfing to their favorite URL on GitLab.com. The steps are described in the section on "web request". In the table below, each step is measured and given a goal for improvement.
Guide to this table:
# per request: average number of times this step occurs per request. For instance, an average "transaction" may require 0.2 SQL calls, 0.4 git calls, 1 call to cache, and 30 nested views to be built.
p99 Q2-17: the p99 timing (in milliseconds) at the end of Q2, 2017
p99 Now: link to the dashboard that displays the current p99 timing
p99 Q3-17: the target for the p99 timing by the end of Q3, 2017
|Step||# per request||p99 Q2-17||p99 Now||p99 Q3-17 goal||Issue links and impact|
|Lookup IP in DNS||1||~10||?||~10||Use a second DNS provider|
|Browser to Azure LB||1||~10||?||~10|
|BACKEND PROCESSES||Extend monitoring horizon|
|Azure LB to HAProxy||1||~2||?||~2|
|HAProxy SSL with Browser||1||~10||?||~10||Speed up SSL|
|HAProxy to NGINX||1||~2||?||~2|
|NGINX buffers request||1||~10||?||~10|
|NGINX to Workhorse||1||~2||?||~2|
|Workhorse distributes request||1||Adding monitoring to workhorse|
|Workhorse to Unicorn||1||18||10||Adding Unicorns|
|Workhorse to Gitaly||?|
|Workhorse to NFS||?|
|Workhorse to Redis||?|
|Unicorn calls services||1||2500||1000||Allow more GitLab internals monitoring|
|Unicorn Postgres||250||100||Speed up slow queries|
|Unicorn NFS||460||200||Move to Gitaly - sample result|
|Unicorn constructs Views||1500|
|Unicorn makes HTML|
|HTML to Browser|
|Unicorn to Workhorse||1||~2||?||~2|
|Workhorse to NGINX||1||~2||?||~2|
|NGINX to HAProxy||1||~2||?||~2||Compress HTML in NGINX|
|HAProxy to Azure LB||1||~2||?||~2|
|Azure LB to Browser||1||~20||?||~20|
|FIRST BYTE (see note 1)||1080 - 6347||1000|
|SPEED INDEX (see note 2)||3230 - 14454||2000||Remove inline scripts, Defer script loading when possible, Lazy load images, Set up a CDN for faster asset loading, Use image resizing in CDN|
|Fully Loaded (see note)||6093 - 14003||not specified||Enable webpack code splitting|
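Two of the columns above reduce to simple arithmetic: "# per request" is total occurrences divided by total requests, and p99 is the value at or below which 99% of the samples fall. A minimal Python sketch using the nearest-rank percentile definition; monitoring systems often interpolate or estimate percentiles from histograms instead, so dashboard values may differ slightly. The helper names are hypothetical.

```python
from typing import Sequence


def p99(timings_ms: Sequence[float]) -> float:
    """99th percentile using the nearest-rank method."""
    if not timings_ms:
        raise ValueError("need at least one timing")
    ordered = sorted(timings_ms)
    # 1-based rank = ceil(0.99 * n), computed in integers to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def calls_per_request(total_calls: int, total_requests: int) -> float:
    """The "# per request" column: average occurrences of a step per request."""
    return total_calls / total_requests
```

`calls_per_request(40, 200)` is 0.2, matching the "0.2 SQL calls" example. Note also that p99 ignores the slowest 1%: with 990 requests at 100 ms and 10 at 5000 ms, `p99` still reports 100 ms.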
Table to be built; merge requests welcome!
For any performance metric, the following modifiers can be applied:
|Internal: the time as measured from inside GitLab.com's infrastructure (the boundary is defined as the interface between the network and the Azure load balancer).|
Timing history for First Byte is listed in the table below (click on the tachometer icons for current timings). All times are in milliseconds.
|Type||End of Q4-17||Now|
|Issue: GitLab CE #4058||857|
|Merge request: GitLab CE !9546||18673|
|Pipeline: GitLab CE pipeline 9360254||1529|
|Repo: GitLab CE repo||1076|
To go a little deeper and measure the performance of the application and infrastructure without regard to frontend and network aspects, we look at "transaction timings" as recorded by Unicorn. These timings can be seen per accessed URL on the Rails Controller dashboard.
For instance, to get the transaction timing for the merge request referenced above, first visit the merge request page, then visit the Rails Controller dashboard and scroll down to the Transaction Details table. We do not currently have time series graphs per URL, nor do we have specific targets for what this timing should be.
~availability label directly impacts the availability of GitLab.com. It is considered another category of
We categorize these issues based on the impact to GitLab.com's customer business goal and day to day workflow.
The prioritization scheme adheres to our product prioritization where security and availability work are prioritized over feature velocity.
The presence of these severity labels modifies the standard severity labels (~S1 to ~S4) by additionally taking into account the impact as described below.
The severity of these issues may change depending on the re-analysis of the impact to GitLab.com customers.
|Severity||Availability impact||Reproducibility||Time to resolve (TTR)||Deployment target||Minimum priority|
||Roadblock on GitLab.com and blocking customer's business goals and day to day workflow||Consistently reproducible||Within 48 hrs||Hotfix to GitLab.com||
||Significant impact on GitLab.com and customer's day-to-day workflow. Customers have an acceptable workaround in place.||Consistently reproducible||Within 5 business days||Next deployment window after resolution||
||Broad impact on GitLab.com and minor inconvenience to customer's day-to-day workflow. No workaround needed.||Inconsistently reproducible||Within 30 days||Next release after resolution||
||Minimal impact on GitLab.com, no known customers affected||Inconsistently reproducible||60 days||Next release after resolution||
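The time-to-resolve column translates directly into a due date. A minimal Python sketch, assuming the four rows correspond to severities 1 through 4 (most severe first), that "business days" means Monday to Friday with no holiday calendar, and that the 30- and 60-day windows are calendar days; `resolve_by` and `add_business_days` are hypothetical names.

```python
from datetime import datetime, timedelta


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance by `days` weekdays, skipping Saturdays and Sundays (no holidays)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def resolve_by(severity: int, opened_at: datetime) -> datetime:
    """Deadline implied by the time-to-resolve column for severities 1-4."""
    if severity == 1:
        return opened_at + timedelta(hours=48)
    if severity == 2:
        return add_business_days(opened_at, 5)
    if severity == 3:
        return opened_at + timedelta(days=30)
    if severity == 4:
        return opened_at + timedelta(days=60)
    raise ValueError(f"unknown severity: {severity}")
```

For an issue opened on Friday 2020-01-03 at 09:00, severity 2 is due Friday 2020-01-10 at 09:00, since the weekend does not count toward the five business days.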
For the priorities that can be set on an availability issue, refer to the prioritization band table below.
|Issue with the labels||Allowed priorities||Not-allowed priorities|
To clarify the priority of issues that relate to GitLab.com's performance, add the ~performance label as well as a "Severity" label. Two factors influence which severity label you should pick:
For strictly performance-related work, you can use the Controller Timings Overview Grafana dashboard. This dashboard sorts data into three categories, each with an associated severity label:
This means that if a controller (e.g. `UsersController#show`) is in the "Frequently Used" category, you assign it the
For database related timings you can also use the SQL Timings Overview. This is the dashboard primarily used by the Database Team to determine the AP label to use for database related performance work.
Some general notes about parameters that affect database performance, at a very crude level.