These videos provide examples of how to quickly identify failures, defects, and problems related to servers, networks, databases, security, and performance.
For a quick view of the availability and performance history of GitLab.com, we use https://stats.pingdom.com. Specifically, this has the availability and latency of reaching
We collect data using InfluxDB and Prometheus, relying on existing exporters such as the node and PostgreSQL exporters and building whatever else is necessary. The data is visualized in graphs and dashboards built with Grafana. There are two interfaces to track this, as described in more detail below.
We have three Prometheus clusters: main prometheus, prometheus-db, and prometheus-app. Each provides an interface for querying metrics with PromQL, and each collects its own set of related metrics.
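As a rough illustration of what querying metrics with PromQL can look like outside the web interface, the sketch below builds a request for the standard Prometheus HTTP API endpoint `/api/v1/query` and unpacks an instant-vector response. The base URL, the helper names, and the `job="node"` label value are assumptions made for this example and are not taken from our setup.

```ruby
require 'json'
require 'uri'

# Hypothetical address; substitute the Prometheus server you want to query.
PROMETHEUS_BASE_URL = 'http://prometheus.example.com:9090'

# Build the URI for an instant query against the Prometheus HTTP API.
# The PromQL expression is passed in the `query` parameter, URL-encoded.
def instant_query_uri(promql, base_url: PROMETHEUS_BASE_URL)
  uri = URI.join(base_url, '/api/v1/query')
  uri.query = URI.encode_www_form(query: promql)
  uri
end

# Turn the JSON body of an instant-vector response into [labels, value] pairs.
# Prometheus encodes sample values as strings, so they are returned unchanged.
def parse_instant_vector(body)
  payload = JSON.parse(body)
  raise "query failed: #{payload['error']}" unless payload['status'] == 'success'

  payload.dig('data', 'result').map do |sample|
    _timestamp, value = sample['value']
    [sample['metric'], value]
  end
end
```

With `net/http` loaded, something like `parse_instant_vector(Net::HTTP.get(instant_query_uri('up{job="node"}')))` would list which scrape targets of that (assumed) job are currently up.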
To learn how to set up a new graph or dashboard using Grafana, take a look at the following resources:
Network, system, and application logs are processed, stored, and searched using the ELK stack. We use a managed Elasticsearch cluster on GCP. Because the cluster is managed by Elastic, they run the VMs and we do not have access to them; our only interfaces are the APIs, Kibana, and the elastic.co web UI. To monitor system performance and metrics, we use Elastic's X-Pack monitoring metrics, which are sent to a dedicated monitoring cluster. In the long term we intend to switch to Prometheus and Grafana as the preferred interface. For investigating errors and incidents, raw logs are available via Kibana.
Staging logs are available via a separate Kibana instance.
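Because the cluster is also reachable through its APIs, a log search can be expressed as an Elasticsearch query DSL body and sent to a `_search` endpoint. The sketch below only builds such a body. The `severity` field name, the default window, and the helper name are assumptions for illustration; `@timestamp` is the conventional time field in ELK setups, but the actual field names depend on the index mapping.

```ruby
require 'json'

# Build a search body that returns the newest log entries with a given
# severity from a recent time window (for example '15m' or '1h').
def recent_logs_query(severity:, window: '15m', size: 50)
  {
    query: {
      bool: {
        # Filters restrict the result set without affecting relevance scores.
        filter: [
          { range: { '@timestamp' => { gte: "now-#{window}" } } },
          { term: { 'severity' => severity } } # hypothetical field name
        ]
      }
    },
    sort: [{ '@timestamp' => 'desc' }],
    size: size
  }
end
```

Calling `JSON.pretty_generate(recent_logs_query(severity: 'ERROR'))` produces the JSON that would be posted to the search API or pasted into the Kibana console.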
Kibana dashboards are used to monitor application activity, spam events, transient errors, system and network authentication events, security events, etc. Commonly used dashboards are the Abuse, SSH, and Rack Attack dashboards.
Our runbook outlines how we log our infrastructure.
To learn how to create Kibana dashboards use the following resources:
To add a page to this dashboard, create a merge request to the gitlab-com/gitlab-profiler project.
Blocks of Ruby code can be "instrumented" to measure performance.
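To make the idea concrete, here is a minimal sketch of timing a block with a monotonic clock. The `Instrumentation` module and its `measure` method are hypothetical names invented for this example; they are not the application's actual instrumentation API, which should be used in real code.

```ruby
# Hypothetical helper for illustration only.
module Instrumentation
  # Collected durations in seconds, keyed by measurement name.
  @timings = Hash.new { |hash, key| hash[key] = [] }

  class << self
    attr_reader :timings

    # Run the block, record how long it took, and return the block's result.
    # The duration is recorded even if the block raises.
    def measure(name)
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      yield
    ensure
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      @timings[name] << elapsed
    end
  end
end
```

Wrapping a block as `Instrumentation.measure(:expensive_sum) { (1..1000).sum }` returns the block's value and appends one duration to `Instrumentation.timings[:expensive_sum]`. A monotonic clock is used so that wall-clock adjustments do not distort the measurement.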