Logging in a modern backend
In the modern world of distributed systems, the unsung hero of production debugging is a centralized logging system and visualizer.
A centralized log system is a single location where the logs from all relevant systems are collected, for developers to navigate through to determine the health of, and their confidence in, a deployed system.
In today's development world, your application will most likely consist of several services (a microservice architecture) or load-balanced instances, which makes it hard to keep your logs on one machine, in one database, or in any other single location. The solution is to have all of these systems pass their logs to a central entity to be processed.
Our system
I first realized this need after working in a multi-tenant, multi-service environment. To me, just saying that screams headache for anything deployed, where we don't have the ability to step through the code line by line.

We had previously set up our services to use Serilog, a structured logging framework for .NET, and dumped the output into our database. To view the logs, we would log into the database, sort the table in descending order, and filter through until we found the relevant entries. If one system checked out, we would move on to the next service's log table and do the same. As you might have guessed, this proved completely inadequate once we started getting public traffic, and about the only thing we could really determine was how healthy the system was, based on the proportion of errors in the logs.
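For context, here is a minimal sketch of the kind of setup we had, assuming the Serilog.Sinks.MSSqlServer package (the exact sink options vary by version, and the connection string and table name are placeholders for illustration):

```csharp
using Serilog;

// Build a logger that writes structured events straight into a SQL Server table.
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .Enrich.FromLogContext()
    .WriteTo.MSSqlServer(
        connectionString: "Server=.;Database=AppLogs;Trusted_Connection=True;",
        tableName: "Logs",            // hypothetical per-service log table
        autoCreateSqlTable: true)
    .CreateLogger();

// Structured logging: OrderId and Elapsed are captured as properties,
// not just text baked into the message string.
var orderId = "ord-1234";
var elapsedMs = 42;
Log.Information("Processed order {OrderId} in {Elapsed} ms", orderId, elapsedMs);
```

This works, but every service ends up with its own Logs table, and the only query interface is raw SQL.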
Not only did we have to jump from database to database to track down a problem, but we also didn't have a good way to visualize what we were looking for; SQL is not the best tool for exploring the results.
Once we realized the issues we faced, we determined a more fleshed-out solution was needed. In the end, we are going to try out the AWS Elasticsearch Service. Elasticsearch is open-source software, but AWS provides a pre-configured environment to hook into immediately, which our project sorely needs given the number of deployments coming in the near future. In addition to the mechanism for consuming logs from our systems, it also provides the visualization software Kibana as a plugin over the Elasticsearch data, giving us the ability both to track logs in real time and to build dashboard widgets from the queries we care about most, such as a specific type of error, the traffic we are getting, or some other health metric.
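On the application side, the switch should mostly be a sink change. A minimal sketch, assuming the Serilog.Sinks.Elasticsearch package and a hypothetical AWS endpoint (a real AWS cluster would also need request signing or some other authentication in front of it):

```csharp
using Serilog;
using Serilog.Sinks.Elasticsearch;

// Ship structured events to Elasticsearch instead of a per-service SQL table.
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .Enrich.FromLogContext()
    .Enrich.WithProperty("Service", "orders-api")  // tag every event with its source service
    .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(
        new Uri("https://search-example.us-east-1.es.amazonaws.com"))
    {
        IndexFormat = "app-logs-{0:yyyy.MM.dd}",   // daily indices make retention easy
        AutoRegisterTemplate = true                // register an index template on startup
    })
    .CreateLogger();

Log.Information("Service started");
```

Because every service writes to the same cluster, the database-hopping problem disappears: one Kibana index pattern covers them all.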
Some really sorely needed functionality was Kibana's ability to turn long custom SQL queries into no more than a couple of clicks and a search term. With Serilog's structured capabilities, you can attach properties and values to an event in addition to the message itself. Using those properties, you can specify the values you are looking for and narrow the search results down to the real data you are after.
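To illustrate, suppose a service logs a failure with structured properties (the property names here are invented for the example):

```csharp
// TenantId and OrderId become first-class, searchable fields in Elasticsearch,
// rather than text that has to be pattern-matched out of the message.
Log.Error("Payment failed for order {OrderId} on tenant {TenantId}",
    "ord-1234", "acme");
```

In Kibana, the hunt that used to be a hand-written WHERE clause becomes a search-bar filter along the lines of `fields.TenantId: "acme" AND level: "Error"` (the exact field names depend on how the sink formats events).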
Further down the road, as we integrate this system, we will most likely look to attach correlation IDs to requests, so we can track a request from the moment it enters our system until it completes, as well as unify the logging format and properties between services so we can navigate through the logs more effectively.
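A rough sketch of what that correlation ID plumbing might look like in an ASP.NET Core service, using Serilog's LogContext (the header name is our own convention, and this assumes the logger was configured with Enrich.FromLogContext()):

```csharp
using Serilog.Context;

var app = WebApplication.CreateBuilder(args).Build();

// Middleware: reuse an incoming correlation ID or mint a new one, then push it
// onto the log context so every event logged during this request carries it.
app.Use(async (context, next) =>
{
    var correlationId = context.Request.Headers["X-Correlation-ID"].FirstOrDefault()
                        ?? Guid.NewGuid().ToString();

    // Echo it back so callers and downstream services can follow the chain.
    context.Response.Headers["X-Correlation-ID"] = correlationId;

    using (LogContext.PushProperty("CorrelationId", correlationId))
    {
        await next();
    }
});

app.Run();
```

With that in place, filtering on a single CorrelationId in Kibana reconstructs the full path of one request across every service that handled it.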