In today's world, having a reliable and performant website or application is more important than ever. With so many users relying on digital experiences for their daily needs, it's critical to ensure that your services are always available and functioning properly. One way to achieve this is by leveraging Azure Front Door, a cloud-based service that provides global load balancing and advanced traffic management capabilities. In this blog post, we'll explore how you can monitor and extend the monitoring of Azure Front Door with Dynatrace to ensure that your services are always performing at their best.
What is Azure Front Door?
Azure Front Door is a cloud-based service that provides global load balancing and advanced traffic management capabilities. It improves the performance, reliability, and security of your applications by routing traffic to the closest (lowest-latency) available backend, so your users get the best possible experience regardless of their location or network conditions. With Azure Front Door, you can also customize your routing rules to prioritize traffic to specific backends or regions, and apply security features such as SSL/TLS termination, a web application firewall, and DDoS protection. For details, see the official Microsoft documentation.
What Metrics are Available Out of the Box?
Azure Front Door provides several "out of the box" metrics that you can use to monitor the performance and availability of your service (the full list is in the Microsoft documentation).
These metrics include:
Backend health %: This metric shows the percentage of healthy endpoints for each backend pool. A backend endpoint is considered healthy if it responds to Azure Front Door's health probes within a certain time frame.
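Total Latency: This metric shows the total time from when Azure Front Door receives a client request until it sends the last response byte back to the client. For requests that are not served from cache, this includes the round trip to the backend. A high latency can indicate network congestion, high server load, or other performance issues.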
Request count: This metric shows the number of requests received by Azure Front Door for each frontend and backend endpoint. It can help you identify which endpoints are receiving the most traffic and adjust your routing rules accordingly.
Traffic: This metric shows the amount of data transferred between Azure Front Door and each backend endpoint. It can help you estimate your bandwidth usage and identify potential cost savings.
Errors: This metric shows the number of HTTP errors (4xx and 5xx) returned by each backend endpoint. A high error rate can indicate issues with your application code, database, or infrastructure.
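To make the arithmetic behind these percentages concrete, here is a minimal sketch that derives an error rate and a backend health percentage from raw counts. The `WindowCounts` type and all sample values are hypothetical, not data pulled from Azure Monitor, and Azure may aggregate these metrics differently.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Hypothetical request counts for one time window of a Front Door profile."""
    requests: int
    errors_4xx: int
    errors_5xx: int


def error_rate(c: WindowCounts) -> float:
    """Percentage of requests that returned a 4xx or 5xx status."""
    if c.requests == 0:
        return 0.0
    return 100.0 * (c.errors_4xx + c.errors_5xx) / c.requests


def backend_health_pct(endpoint_healthy: list[bool]) -> float:
    """Percentage of endpoints in a backend pool that answered health probes in time."""
    if not endpoint_healthy:
        return 0.0
    return 100.0 * sum(endpoint_healthy) / len(endpoint_healthy)


window = WindowCounts(requests=2000, errors_4xx=30, errors_5xx=10)
print(f"error rate: {error_rate(window):.1f}%")  # error rate: 2.0%
print(f"backend health: {backend_health_pct([True, True, True, False]):.1f}%")  # backend health: 75.0%
```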
These metrics cover the golden signals and are enough for basic monitoring. To dive deeper into the content being delivered and pinpoint root causes, however, you will need to ingest logs (see the setup steps).
Note: be extra careful with ingest volumes and the number of dimensions per log line, as both contribute to DDU (Davis Data Unit) consumption. Our advice is to be selective in what you send: production and mission-critical subscriptions are the priority. Logs can also be filtered in a logging pipeline that controls their flow, and our team can help here too.
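The selective-forwarding idea can be sketched in a few lines of Python. This is a hypothetical pre-forwarding filter, not a Dynatrace or Azure feature: it assumes records arrive as dicts shaped like the Front Door access log sample shown later in this post, and `PRIORITY_SUBSCRIPTIONS`, `KEEP_FIELDS` and `filter_record` are placeholder names you would adapt to your own pipeline.

```python
from typing import Optional

# Hypothetical allowlist of production / mission-critical subscription IDs,
# stored in lowercase because subscription IDs are compared case-insensitively below.
PRIORITY_SUBSCRIPTIONS = {"11111111-2222-3333-4444-555555555555"}

# Hypothetical subset of access-log properties worth forwarding.
KEEP_FIELDS = ("requestUri", "timeToFirstByte", "timeTaken",
               "httpStatusCode", "pop", "cacheStatus")


def subscription_of(resource_id: str) -> str:
    """Return the lowercase subscription ID from an Azure resource ID, or ''."""
    parts = resource_id.split("/")
    for i, part in enumerate(parts[:-1]):
        if part.lower() == "subscriptions":
            return parts[i + 1].lower()
    return ""


def filter_record(record: dict) -> Optional[dict]:
    """Return a trimmed copy of an access-log record, or None to drop it."""
    if record.get("category") != "FrontdoorAccessLog":
        return None  # in this sketch, anything other than access logs is dropped
    if subscription_of(record.get("resourceId", "")) not in PRIORITY_SUBSCRIPTIONS:
        return None  # not a priority subscription
    props = record.get("properties", {})
    return {
        "time": record.get("time"),
        "resourceId": record["resourceId"],
        "category": record["category"],
        "properties": {k: props[k] for k in KEEP_FIELDS if k in props},
    }


sample = {
    "time": "2023-02-24T06:52:51.0162434Z",
    "resourceId": "/SUBSCRIPTIONS/11111111-2222-3333-4444-555555555555/RESOURCEGROUPS/RG/PROVIDERS/MICROSOFT.NETWORK/FRONTDOORS/AFD",
    "category": "FrontdoorAccessLog",
    "properties": {"requestUri": "https://example.com/", "pop": "MAA", "userAgent": "Mozilla/5.0"},
}
print(filter_record(sample)["properties"])  # {'requestUri': 'https://example.com/', 'pop': 'MAA'}
```

Dropping whole records by subscription cuts the record count, while trimming properties keeps each record small; both help keep ingest volumes under control.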
When you add Azure Front Door monitoring in Dynatrace, its metrics are captured automatically and benefit from features like Davis (the Dynatrace AI engine), and out-of-the-box dashboards are added for you.
In Dynatrace, you can find Azure Front Door under "Cloud" on the Azure overview page, and then select "Azure Front Door" under services:
Selecting "services" shows some of the out-of-the-box metrics:
Out-of-the-box dashboards are also added automatically:
Dashboard:
In addition to the automated dashboards, Dynatrace adds anomaly detection, so alerting is enabled by default. These settings are under Settings > Anomaly Detection:
In all, the steps above take about 5 to 10 minutes to configure. Dynatrace automatically connects, controls and handles everything. For operations teams this is more than enough to understand the responsiveness, distribution and health of Azure Front Door activity. For DevOps teams, site reliability engineers and the next level of troubleshooting, however, it doesn't go deep enough.
Dynatrace's Real User Monitoring (RUM) is a powerful feature, yet it cannot always pinpoint the source of slow performance. This is true of other RUM tools as well: their primary benefit lies in giving a visual overview of problem areas, so users can easily identify pages or content with low Apdex ratings. They do not go behind the scenes.
When Out-of-the-Box Metrics are Not Enough: Logs
Azure Front Door offers useful performance metrics as standard. However, to get a thorough understanding of how your applications are performing, it is beneficial to examine logs. The more detailed, per-request data in logs provides further insight into the health and performance of your application.
You don't need to instrument App Services, Function Apps and AKS right away. The logs contain details on each request, which is enough to detect problems with content delivery. This data also helps Dynatrace administrators decide which applications to instrument with tracing and code-level forensics.
The full list of log fields is in the Microsoft documentation. The following is a sample Azure Front Door access log; the most important fields are requestUri, timeToFirstByte, timeTaken, httpStatusCode, pop and cacheStatus:
{
"time": "2023-02-24T06:52:51.0162434Z",
"resourceId": "/SUBSCRIPTIONS/xxxxxx/RESOURCEGROUPS/RG-EUR-WW-PRD-SWP/PROVIDERS/MICROSOFT.NETWORK/FRONTDOORS/AFD-EUR-WW-PRD-SWP101",
"category": "FrontdoorAccessLog",
"operationName": "Microsoft.Network/FrontDoor/AccessLog/Write",
"properties": {
"trackingReference": "hAB+QJZq3IiTpBA9TUFBMjAxMDYwNTE5MDIxADAxMGE4MTkxLTkxMDYtNDFkNy1hMmI4LWRlMzMzMTczYWU2Nw==",
"httpMethod": "POST",
"httpVersion": "2.0.0.0",
"requestUri": "https://www.dynatrace.com:443/api/data/business-domains",
"requestBytes": "947",
"responseBytes": "2494",
"userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
"clientIp": "",
"socketIp": "",
"clientPort": "56079",
"timeToFirstByte": "1.042",
"timeTaken": "1.042",
"requestProtocol": "HTTPS",
"securityProtocol": "TLS 1.2",
"routingRuleName": "www-dynatrace-com-forward",
"rulesEngineMatchNames": [],
"backendHostname": "20.20.20.20:443",
"isReceivedFromClient": true,
"httpStatusCode": "200",
"httpStatusDetails": "200",
"pop": "MAA",
"cacheStatus": "CONFIG_NOCACHE",
"errorInfo": "NoError",
"ErrorInfo": "NoError"
}
}
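Outside Dynatrace, the same record can be inspected with Python's standard `json` module. This short sketch uses a trimmed copy of the sample above (the `RAW` and `summary` names are only for illustration) and shows that the numeric fields arrive as strings, which is why the processing rule in the next step parses them into typed values.

```python
import json

# Trimmed copy of the sample access-log record above.
RAW = """{
  "category": "FrontdoorAccessLog",
  "properties": {
    "requestUri": "https://www.dynatrace.com:443/api/data/business-domains",
    "timeToFirstByte": "1.042",
    "timeTaken": "1.042",
    "httpStatusCode": "200",
    "pop": "MAA",
    "cacheStatus": "CONFIG_NOCACHE"
  }
}"""

record = json.loads(RAW)
props = record["properties"]

# Numbers are serialized as strings in the log, so convert them explicitly.
summary = {
    "requestUri": props["requestUri"],
    "timeToFirstByte": float(props["timeToFirstByte"]),
    "timeTaken": float(props["timeTaken"]),
    "httpStatusCode": int(props["httpStatusCode"]),
    "pop": props["pop"],
    "cacheStatus": props["cacheStatus"],
}
print(summary["httpStatusCode"], summary["pop"], summary["cacheStatus"])  # 200 MAA CONFIG_NOCACHE
```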
When browsing Azure Front Door logs for the first time in Dynatrace, just type in "front" within the attributes and you will find all related results.
An example log line is shown above. As the screenshot also shows, access logs and WAF (Web Application Firewall) logs together can mean millions of ingested log lines, so be careful about which logs you include in the event hub for forwarding, or consider a pipeline to apply advanced filtering.
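A back-of-the-envelope estimate helps decide what to forward. This sketch assumes one access-log record per request and a hypothetical average record size of 1,000 bytes (the sample above is in that range); WAF logs would come on top. The `daily_log_volume` helper is illustrative, so substitute your own request rate and measured record size.

```python
SECONDS_PER_DAY = 86_400


def daily_log_volume(requests_per_second: float,
                     avg_record_bytes: int = 1_000,
                     keep_ratio: float = 1.0) -> tuple[int, float]:
    """Estimate (records/day, GB/day) forwarded, assuming one access-log record
    per request and that `keep_ratio` of the records survive filtering."""
    records = int(requests_per_second * SECONDS_PER_DAY * keep_ratio)
    gigabytes = records * avg_record_bytes / 1e9
    return records, gigabytes


records, gb = daily_log_volume(requests_per_second=50)
print(records, round(gb, 2))  # 4320000 4.32

# Forwarding only a quarter of the records (e.g. production subscriptions only):
records, gb = daily_log_volume(requests_per_second=50, keep_ratio=0.25)
print(records, round(gb, 2))  # 1080000 1.08
```

Even a modest 50 requests per second produces millions of records per day, which is why filtering before ingest matters for DDU consumption.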
Now that logs are coming into Dynatrace, the per-request access log lines can enhance the OOB metrics. The first task is to create a "Processing Rule" so that parsing happens efficiently, under Settings > Log Monitoring > Processing:
Rule Name: cloud:azure:frontdoor:dynatrace
Matcher: log.source="frontdooraccesslog" AND content="www.dynatrace.com:443"
Processor definition: PARSE(content, "DATA 'www.dynatrace.com:443' [^\"]{0,150}:request.uri DATA 'timeToFirstByte\": \"' DOUBLE:time.tofirstbyte DATA 'timeTaken\": \"' DOUBLE:time.taken DATA 'httpStatusCode\": \"' INTEGER:http.status DATA 'pop\": \"' WORD:pop DATA 'cacheStatus\": \"' WORD:cache.status")
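To see what the rule extracts, here is a rough Python equivalent using the standard `re` module. It approximates the DPL pattern for illustration only and is not how Dynatrace evaluates it: each `DATA '<literal>'` is modeled as a lazy skip up to that literal (allowed to cross line breaks), `DOUBLE` as digits with an optional decimal part, `INTEGER` as digits, and `WORD` as `\w+`. The `parse_frontdoor_line` function name is hypothetical.

```python
import re
from typing import Optional

# Approximation of the DPL processor: each DATA '<literal>' becomes a lazy skip
# to that literal, and each typed matcher becomes a named capture group.
PATTERN = re.compile(
    r'www\.dynatrace\.com:443(?P<request_uri>[^"]{0,150})'
    r'.*?timeToFirstByte": "(?P<time_tofirstbyte>\d+(?:\.\d+)?)'
    r'.*?timeTaken": "(?P<time_taken>\d+(?:\.\d+)?)'
    r'.*?httpStatusCode": "(?P<http_status>\d+)'
    r'.*?pop": "(?P<pop>\w+)'
    r'.*?cacheStatus": "(?P<cache_status>\w+)',
    re.DOTALL,  # let the lazy skips span line breaks in pretty-printed JSON
)


def parse_frontdoor_line(content: str) -> Optional[dict]:
    """Return the attributes the rule would extract, or None if it doesn't match."""
    m = PATTERN.search(content)
    if m is None:
        return None
    return {
        "request.uri": m["request_uri"],
        "time.tofirstbyte": float(m["time_tofirstbyte"]),
        "time.taken": float(m["time_taken"]),
        "http.status": int(m["http_status"]),
        "pop": m["pop"],
        "cache.status": m["cache_status"],
    }


sample = ('{"properties": {"requestUri": "https://www.dynatrace.com:443/api/data/business-domains", '
          '"timeToFirstByte": "1.042", "timeTaken": "1.042", "httpStatusCode": "200", '
          '"pop": "MAA", "cacheStatus": "CONFIG_NOCACHE"}}')
print(parse_frontdoor_line(sample))
```

Like the DPL rule, the pattern relies on the fields appearing in this order in the log line, which holds for the sample above.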
Click "Download sample Log" and then "Test the rule" to confirm that parsing works.
The next task is to create a log metric and its dimensions. The next screenshot shows how to create the metric log.dynatrace.frontdoor.requests, with dimensions for the values we want to capture per log line. As highlighted in the sample log line above, these dimensions are cache.status, http.status, pop and request.uri.
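Conceptually, a count-type log metric increments one counter per matching log line, keyed by the values of its dimensions. The sketch below models that idea with `collections.Counter`; the `to_metric` helper and the parsed lines are hypothetical, and this illustrates the concept rather than how Dynatrace stores the metric internally.

```python
from collections import Counter

DIMENSIONS = ("cache.status", "http.status", "pop", "request.uri")


def to_metric(parsed_lines: list[dict]) -> Counter:
    """Count log lines per unique combination of dimension values,
    mimicking a count metric such as log.dynatrace.frontdoor.requests."""
    series = Counter()
    for line in parsed_lines:
        key = tuple(line.get(d) for d in DIMENSIONS)
        series[key] += 1
    return series


# Hypothetical parsed log lines.
lines = [
    {"cache.status": "HIT", "http.status": 200, "pop": "MAA", "request.uri": "/"},
    {"cache.status": "HIT", "http.status": 200, "pop": "MAA", "request.uri": "/"},
    {"cache.status": "MISS", "http.status": 200, "pop": "AMS", "request.uri": "/"},
]
series = to_metric(lines)
print(series[("HIT", 200, "MAA", "/")])  # 2
```

Each distinct combination of dimension values becomes its own series, so a high-cardinality dimension such as request.uri multiplies the number of series; this is one reason to be selective about dimensions.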
To check that the log metric is working, open "Data Explorer" and select the metric "log.dynatrace.frontdoor.requests":
Then check that the dimensions work using "Split by":
by request.uri:
by pop:
by cache.status:
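"Split by" on a single dimension is essentially a group-and-sum over the remaining dimensions. This standalone sketch shows that on a few hypothetical series, and also derives a cache-hit ratio per pop as one example of what the split views make visible. The data, the `split_by` and `cache_hit_ratio_by_pop` helpers, and the hit-ratio definition (only `HIT` counts as a hit) are assumptions.

```python
from collections import defaultdict

# Hypothetical series of log.dynatrace.frontdoor.requests:
# (cache.status, http.status, pop, request.uri) -> request count
SERIES = {
    ("HIT", 200, "MAA", "/"): 80,
    ("MISS", 200, "MAA", "/"): 20,
    ("HIT", 200, "AMS", "/"): 30,
    ("CONFIG_NOCACHE", 200, "AMS", "/api/data/business-domains"): 70,
}
DIMENSIONS = ("cache.status", "http.status", "pop", "request.uri")


def split_by(series: dict, dimension: str) -> dict:
    """Sum request counts over every dimension except `dimension`."""
    idx = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, count in series.items():
        totals[key[idx]] += count
    return dict(totals)


def cache_hit_ratio_by_pop(series: dict) -> dict:
    """Share of requests per pop whose cache.status is 'HIT' (assumed definition)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for (cache_status, _status, pop, _uri), count in series.items():
        totals[pop] += count
        if cache_status == "HIT":
            hits[pop] += count
    return {pop: hits[pop] / totals[pop] for pop in totals}


print(split_by(SERIES, "pop"))         # {'MAA': 100, 'AMS': 100}
print(cache_hit_ratio_by_pop(SERIES))  # {'MAA': 0.8, 'AMS': 0.3}
```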
Now all you need to do is create some dashboards!
By examining each content request in detail, including its cache status, time to first byte, time taken and pop values, we have made remarkable progress in optimizing customer experience. With this approach, we have consistently brought customer Apdex ratings to almost perfect levels.
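For reference, the standard Apdex score is (satisfied + tolerating / 2) / total samples, where a sample is satisfied if it completes within a threshold T and tolerating if it completes within 4T. This sketch applies that formula to `timeTaken` values from the access logs as a rough server-side proxy; it is not how Dynatrace calculates its RUM Apdex, and the threshold of 0.5 s and the sample durations are hypothetical.

```python
def apdex(durations_s: list[float], t_s: float) -> float:
    """Standard Apdex: satisfied if d <= T, tolerating if T < d <= 4T, else frustrated."""
    if not durations_s:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for d in durations_s if d <= t_s)
    tolerating = sum(1 for d in durations_s if t_s < d <= 4 * t_s)
    return (satisfied + tolerating / 2) / len(durations_s)


# Hypothetical timeTaken values (seconds) for one pop, with T = 0.5 s.
samples = [0.12, 0.30, 0.45, 1.042, 2.5]
print(round(apdex(samples, t_s=0.5), 2))  # 0.7
```

Computing this per pop or per request.uri points to where slow deliveries concentrate, which is the kind of detail the per-request logs add on top of the OOB metrics.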
Reach out to our team of qualified Dynatrace experts via hello@visibilityplatforms.com