map('Make them useful !', Exceptions + Logs)
Exception and Logs for troubleshooting
We are all familiar with exceptions. We love and we hate them. But after all we will use it in different ways in most programming languages.
Their essential form is as follows:
public class ErrorLoverClass {
public void errorProneMethodA(int a, int b) {
try {
Log.info("Starting errorProneMethodA");
doSomethingCustomCalculationA(a, b);
Log.info("Ending errorProneMethodA");
}
catch (Exception e) {
// throw exception or log exception or do nothing and go with the flow
Log.error("An exception happens " + e);
}
}
public void errorProneMethodB(int a, int b) {
try {
Log.info("Starting errorProneMethodB");
doSomethingCustomCalculationB(a, b);
Log.info("Ending errorProneMethodB");
}
catch (Exception e) {
// throw exception or log exception or do nothing and go with the flow
Log.error("An exception happens " + e);
}
}
}
But is this good ? If we live in a silo and our system will be only used by ourselves and nobody else this is ok.
So lets assume our code is deployed but lets assume that the methods errorProneMethodA() and errorProneMethodB() in code above fails constantly. And lets say that we our devops department has a cool tool that monitors all the exceptions so they can check if their metrics are good every month.
So what we will see in our fancy tool:
And what about our lovely logs:
10:00:01 INFO "Starting errorProneMethodB" 10:00:01 INFO "Starting errorProneMethodB" 10:00:01 INFO "An exception happens NumberFormatException" 10:00:01 INFO "Starting errorProneMethodA" 10:00:01 INFO "Starting errorProneMethodB" 10:00:01 INFO "Starting errorProneMethodA" 10:00:01 INFO "Starting errorProneMethodA" 10:00:01 ERROR "An exception happens NumberFormatException" 10:00:02 INFO "Ending errorProneMethodA" 10:00:03 INFO "Ending errorProneMethodA" 10:00:01 ERROR "An exception happens NumberFormatException" 10:00:01 ERROR "An exception happens NumberFormatException" 10:00:01 ERROR "An exception happens NumberFormatException" 10:00:03 INFO "Ending errorProneMethodB" ...a large etc...
So is this useful ? Yes and No. Yes because at least in some way we are able to know what was the exception.
- But can we know why the error happens ?
- Can we reproduce the issue again ?
- Can we know quickly to what method this belongs ?
- Can we know the number of exceptions produce by this error in a week ?
Improvement #1:
- But can we know why the error happens ? No. Because we only have that exception.
- Can we reproduce the issue again ?
Same answer as above.
Lets do some simple fixes.
public class ErrorLoverClass {
public void errorProneMethodA(int a, int b) {
try {
Log.info("Starting errorProneMethodA");
doSomethingCustomCalculationA(a, b);
Log.info("Ending errorProneMethodA");
}
catch (Exception e) {
// throw exception or log exception or do nothing and go with the flow
// OLD: Log.error("An exception happens " + e);
// NEW:
String message = String.format("An exception happens %s. For the following input: %s, %s", e, a, b)
Log.error(message);
}
}
public void errorProneMethodB(int a, int b) {
try {
Log.info("Starting errorProneMethodB");
doSomethingCustomCalculationB(a, b);
Log.info("Ending errorProneMethodB");
}
catch (Exception e) {
// throw exception or log exception or do nothing and go with the flow
String message = String.format("An exception happens %s. For the following input: %s, %s", e, a, b)
Log.error(message);
}
}
}
So now our logs will look:
10:00:01 INFO "Starting errorProneMethodB" 10:00:01 INFO "Starting errorProneMethodB"
10:00:01 INFO "An exception happens NumberFormatException. For the following inputs: , 10.10"
10:00:01 INFO "Starting errorProneMethodA"
10:00:02 INFO "Ending errorProneMethodA" 10:00:03 INFO "Ending errorProneMethodA" 10:00:01 ERROR "An exception happens NumberFormatException. For the following inputs: 10, 10, 10.0011"
10:00:03 INFO "Ending errorProneMethodB" ...a large etc...
So lets review our questions again:
- But can we know why the error happens ? Yes it was caused by wrong inputs.
- Can we reproduce the issue again ? Yes we can do even a unit test with those parameters and fix the method.
Improvement #2:
- Can we know quickly to what method this belongs ? No because we will need to navigate the exceptions in our tool.
- Can we know the number of exceptions produce by this error in a week ? No because we group all of our exceptions in the generic Exception class.
Lets do some simple fixes.
public class ErrorProneExceptionA extends Exception {
// some specific for this exception initialization
}
public class ErrorProneExceptionB extends Exception {
// some specific for this exception initialization
}
public class ErrorLoverClass {
public void errorProneMethodA(int a, int b) {
try {
Log.info("Starting errorProneMethodA");
doSomethingCustomCalculationA(a, b);
Log.info("Ending errorProneMethodA");
}
catch (ErrorProneExceptionA e) {
String message = String.format("An exception happens %s. For the following input: %s, %s", e, a, b)
Log.error(message);
}
}
public void errorProneMethodB(int a, int b) {
try {
Log.info("Starting errorProneMethodB");
doSomethingCustomCalculationB(a, b);
Log.info("Ending errorProneMethodB");
}
catch (ErrorProneExceptionB e) {
String message = String.format("An exception happens %s. For the following input: %s, %s", e, a, b)
Log.error(message);
}
}
}
So in our fancy monitoring tool; now we will see the following:
Surprise, surprise ! The second method was the one which has more problems. So now we know that we need to fix that method asap !.
Reviewing our questions again:
- Can we know quickly to what method this belongs ? Yes now we know which method has the exception that happens often than others.
- Can we know the number of exceptions produce by this error in a week ? Yes now we can know how ofthen this exception happens during a week as it was shown in the graph.
Bonus Improvement:
What we can observe in our logs that are not helping to troubleshoot ?
10:00:01 INFO "Starting errorProneMethodB" 10:00:01 INFO "Starting errorProneMethodB"
10:00:01 INFO "An exception happens NumberFormatException. For the following inputs: , 10.10"
10:00:01 INFO "Starting errorProneMethodA"
10:00:02 INFO "Ending errorProneMethodA" 10:00:03 INFO "Ending errorProneMethodA" 10:00:01 ERROR "An exception happens NumberFormatException. For the following inputs: 10, 10, 10.0011"
10:00:03 INFO "Ending errorProneMethodB" ...a large etc...
Are these INFO logs ok in our production system ??? Do they add any useful information ? Yes if we would like to do a tracing for debugging purposes. And No for troubleshooting as if the code fails we will know in the exception to which method and line belows.
So how can we improve our code ? Lets do a quick and easy change:
public class ErrorProneExceptionA extends Exception {
// some specific for this exception initialization
}
public class ErrorProneExceptionB extends Exception {
// some specific for this exception initialization
}
public class ErrorLoverClass {
public void errorProneMethodA(int a, int b) {
try {
Log.debug("Starting errorProneMethodA");
doSomethingCustomCalculationA(a, b);
Log.debug("Ending errorProneMethodA");
}
catch (ErrorProneExceptionA e) {
String message = String.format("An exception happens %s. For the following input: %s, %s", e, a, b)
Log.error(message);
}
}
public void errorProneMethodB(int a, int b) {
try {
Log.debug("Starting errorProneMethodB");
doSomethingCustomCalculationB(a, b);
Log.debug("Ending errorProneMethodB");
}
catch (ErrorProneExceptionB e) {
String message = String.format("An exception happens %s. For the following input: %s, %s", e, a, b)
Log.error(message);
}
}
}
So in our development logs:
10:00:01 DEBUG "Starting errorProneMethodB" 10:00:01 DEBUG "Starting errorProneMethodB" 10:00:01 INFO "An exception happens NumberFormatException. For the following inputs: , 10.10" 10:00:01 DEBUG "Starting errorProneMethodA" 10:00:02 DEBUG "Ending errorProneMethodA" 10:00:03 DEBUG "Ending errorProneMethodA" 10:00:01 ERROR "An exception happens NumberFormatException. For the following inputs: 10, 10, 10.0011"
10:00:03 DEBUG "Ending errorProneMethodB" ...a large etc...
But in our production logs:
10:00:01 ERROR "An exception happens NumberFormatException. For the following inputs: 10, 10, 10.0011" ...no longer a large etc....
Conclusion: Let's use exceptions to what they were created for. To capture possible errors that might happen when our code does not behave as expected in new use cases/scenarios. And as we capture these exceptions to fix them we can use specific Exceptions to make it more traceable and also use the powerful tool that logs are. But logs are only useful if we include the right info. So lets feed our logs with all the info that is required to be able to quickly reproduce an issue in production. And lets be minimalist with our logs in productions. As we just need only the info that help us to identify/reproduce an error and not the error born and death info.
-EOF