Study Google Cloud ACE Monitoring and Logging: key concepts, common traps, and exam decision cues.
This lesson covers the part of ACE that tests whether you can see what is happening in production before you make changes. Google Cloud expects you to know which signals come from metrics, which come from logs, and how diagnostics or audit logs change the troubleshooting path.
Ops Agent: Google Cloud agent that collects metrics and logs from supported VMs for operations visibility.
Audit log: Record of administrative or data-access activity that helps explain who changed what and when.
This section is about choosing the right signal before touching production. ACE wants you to separate metrics, logs, audit history, and VM telemetry collection, and to match each need to its lane:
| Need | Strongest first lane | Why it fits |
|---|---|---|
| Threshold-based alerting on CPU, memory, latency, or uptime | Cloud Monitoring | Metric-first alerting path |
| Search, filter, and inspect event or application records | Cloud Logging | Log exploration path |
| Determine who changed a resource, policy, or configuration | Audit logs | Identity and admin activity history |
| Collect VM metrics and logs from supported Compute Engine systems | Ops Agent | Host-level telemetry collection |
| Build a troubleshooting timeline from multiple signals | Monitoring plus Logging, then Audit logs if change history matters | Most production incidents need more than one signal type |
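
To make the "metric-first alerting path" in the first row concrete, here is a minimal sketch of what a CPU threshold alert policy looks like as data. The field names follow the Cloud Monitoring `AlertPolicy` resource; the display names, the 0.8 threshold, and the 300-second duration are illustrative assumptions, and a real policy would usually also set aggregations and notification channels.

```python
import json


def cpu_alert_policy(threshold: float = 0.8, duration_s: int = 300) -> dict:
    """Build an AlertPolicy-shaped dict for a CPU utilization threshold.

    The threshold and duration defaults are example values, not recommendations.
    """
    return {
        "displayName": "High CPU utilization (example)",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "CPU above threshold",
                "conditionThreshold": {
                    # Metric filter: this is a Monitoring signal, not a log query.
                    "filter": (
                        'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                        'AND resource.type="gce_instance"'
                    ),
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    # The condition must hold for this long before it fires.
                    "duration": f"{duration_s}s",
                },
            }
        ],
    }


policy = cpu_alert_policy()
print(json.dumps(policy, indent=2))
```

The point for the exam is the shape: the condition filters on a metric type and compares it to a threshold, which is why threshold questions point to Cloud Monitoring rather than Cloud Logging.
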
| If the question says | Think first about |
|---|---|
| threshold, SLI breach, CPU spike, alert policy, dashboard | Cloud Monitoring |
| exception text, request record, VM syslog, application output | Cloud Logging |
| who changed IAM, who deleted a resource, who accessed protected data | Audit logs |
| a VM is missing host-level telemetry | Ops Agent |
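
As a study aid, the cue table above can be written as a small lookup. This is only a sketch: the cue phrases are copied from the table, while the `CUES` mapping and the `first_lane` helper are hypothetical names, and real exam questions need reading in context rather than keyword matching.

```python
from typing import Optional

# Cue phrases taken from the table above, lowercased for matching.
CUES = {
    "Cloud Monitoring": ("threshold", "sli breach", "cpu spike", "alert policy", "dashboard"),
    "Cloud Logging": ("exception text", "request record", "syslog", "application output"),
    "Audit logs": ("who changed", "who deleted", "who accessed"),
    "Ops Agent": ("missing host-level telemetry",),
}


def first_lane(prompt: str) -> Optional[str]:
    """Return the first lane whose cue phrase appears in the prompt, else None."""
    text = prompt.lower()
    for lane, cues in CUES.items():
        if any(cue in text for cue in cues):
            return lane
    return None


print(first_lane("Who deleted the bucket last night?"))
print(first_lane("Create an alert policy for a CPU spike"))
```
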
flowchart LR
A["Something is wrong in production"] --> B{"What do you need first?"}
B -->|Threshold or trend| C["Cloud Monitoring"]
B -->|Event records or error text| D["Cloud Logging"]
B -->|Who changed what| E["Audit logs"]
D --> F["Correlate with metrics"]
C --> F
E --> F
ACE does not expect deep agent internals, but it does expect you to know when a VM needs the Google-supported telemetry path.
| Situation | Strongest first move |
|---|---|
| A Compute Engine VM should emit logs and system metrics into Google operations tooling | Install or verify Ops Agent |
| You already have logs, but you need an alert when error rates climb | Use Monitoring or log-based alerts, not a new transfer tool |
| A team cannot explain who changed firewall or IAM settings | Check audit logs |
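
For the last row, "check audit logs" usually means running a Cloud Logging query scoped to the Admin Activity audit log. The sketch below only builds the filter string; it assumes a hypothetical project ID `my-project`, and the method substrings (`SetIamPolicy`, `firewalls`) are illustrative, so confirm the exact method names against the entries in your own project.

```python
from datetime import datetime, timedelta, timezone


def admin_activity_filter(
    project_id: str,
    method_substring: str,
    end: datetime,
    lookback_minutes: int = 60,
) -> str:
    """Build a Logging query for Admin Activity entries in a time window."""
    start = end - timedelta(minutes=lookback_minutes)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamp in UTC
    # The "/" in the log ID is URL-encoded as %2F inside logName.
    log_name = f"projects/{project_id}/logs/cloudaudit.googleapis.com%2Factivity"
    return " AND ".join(
        [
            f'logName="{log_name}"',
            # ":" is the substring ("has") operator in the Logging query language.
            f'protoPayload.methodName:"{method_substring}"',
            f'timestamp>="{start.strftime(fmt)}"',
            f'timestamp<="{end.strftime(fmt)}"',
        ]
    )


incident_time = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)  # example value
iam_filter = admin_activity_filter("my-project", "SetIamPolicy", incident_time)
firewall_filter = admin_activity_filter("my-project", "firewalls", incident_time)
print(iam_filter)
```

The resulting string can be pasted into Logs Explorer. The matching entries carry the caller identity in `protoPayload.authenticationInfo.principalEmail`, which is the "who" in "who changed what."
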
| Trap | Better reading |
|---|---|
| “Anything observable is just Cloud Monitoring.” | Metrics, logs, and audit history are different lanes. |
| “Audit logs are where all troubleshooting starts.” | Use audit logs when the question is about administrative or access history, not all incidents. |
| “Ops Agent is the alerting product.” | Ops Agent feeds telemetry from VMs; Monitoring handles dashboards and alerting. |
| “Cloud Logging is enough for threshold alerts.” | Threshold alerting starts with metrics, or with deliberate log-based metrics if the prompt goes there. |
A team receives alerts that latency spiked on a VM-hosted service. They want to confirm the metric trend, inspect the application error stream, and then verify whether an administrator changed the instance configuration shortly before the incident.
The strongest order is Cloud Monitoring first, then Cloud Logging, then audit logs.
Correct answer: 1. Metrics show the trend, logs show the failure details, and audit logs answer the change-history question.
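
The scenario amounts to merging three signal streams into one timeline. The sketch below shows that idea with invented sample events; the `Event` type, the `build_timeline` helper, and every timestamp and detail string are hypothetical and do not come from real telemetry.

```python
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    time: datetime
    source: str  # "monitoring", "logging", or "audit"
    detail: str


def build_timeline(*streams: Iterable[Event]) -> List[Event]:
    """Merge events from several sources and sort them by timestamp."""
    return sorted((e for stream in streams for e in stream), key=lambda e: e.time)


def at(hour: int, minute: int) -> datetime:
    """Helper for sample data: a UTC time on an arbitrary example day."""
    return datetime(2024, 5, 1, hour, minute, tzinfo=timezone.utc)


# Invented sample data, one stream per signal type.
metrics = [Event(at(10, 5), "monitoring", "latency above alert threshold")]
logs = [Event(at(10, 4), "logging", "application errors begin")]
audit = [Event(at(10, 1), "audit", "instance configuration changed")]

timeline = build_timeline(metrics, logs, audit)
for event in timeline:
    print(event.time.isoformat(), event.source, event.detail)
```

You investigate in the order metrics, logs, audit logs, but the merged timeline is sorted by event time, so a configuration change that preceded the errors appears first even though you looked it up last.
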