Introduction
Has someone reviewed your systems’ error logs today? What about yesterday? Within the past week? I used to assume everyone would give the same answer: “Yes”. Of course people are reviewing their error logs, because why else would they create them to begin with? Of course error log reviews are part of the QA process. Of course someone is managing and reviewing the errors in these logs. Now, however, after years of experience across multiple companies, I can tell you the answer is most likely “No” or maybe even “I don’t know”.
The Truth About Error Logs
The truth is that error logs tend to fall by the wayside in favor of other projects and requests. Developers often don’t have the resources to review the error logs in depth, and business/marketing resources tend not to understand the errors well enough to help diagnose them. Most of the time QA resources don’t even have access to these logs. So who looks at them? In my experience you can set up all the email chains or permissions in the world for users to see error logs, but unless the responsibility is distinctly defined, no one will review them. It is seen as an additional “chore” that can be intimidating to tackle, especially when the logs have previously gone untouched and unreviewed. Even getting to the point of having a useful error log can be difficult. So where do you begin?
Taking Charge
Take charge of the situation. As a Business Analyst, part of your job is to determine where the company can improve its current systems, and that includes using the error log to identify both serious issues that need immediate attention and low priority items that can be addressed long-term to improve system processes or user accessibility. If the current error log is too cluttered with meaningless warnings or errors to be of use, work with the development team to clean it up. The goal is to have a functional error log that can indicate when something has gone wrong or when a new release has introduced an issue.
After ensuring the error log is readable and usable, the next step is to become familiar with what a “normal” error log looks like for the system. How many total errors per day is normal? This number will likely vary from day to day depending on web traffic. On days when your websites receive more traffic, you can expect more organic error messages than on days when traffic is lighter. Which errors are most often at the top of the error log? Essentially the goal in this step is to understand your error logs (or to word it another way, become the “knowledge expert” on your error logs) so you can identify new issues in the error log at a glance.
If you are used to seeing one error at the top of the log and instead see another there for that day, investigate what may have caused that change. Any time an error log changes in any tangible way, you should be asking “why”. If you encounter an error often enough in the error logs, you should also take the time to understand what that error means. Whether that means googling the terms or asking a development resource for a moment of their time, it’s important to understand where an error log line item belongs on the spectrum of importance.
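If you can get the log into a spreadsheet or a simple script, the comparison against “normal” can be partly automated. Here is a minimal Python sketch of that idea. It is only an illustration: the function names, the error messages, and the `spike_factor` threshold are all made up for the example, and it assumes you already have each day's error messages as a plain list of strings.

```python
from collections import Counter


def summarize(messages):
    """Count how often each error message appears in one day's log."""
    return Counter(messages)


def compare_to_baseline(baseline_days, today, spike_factor=2.0):
    """Compare today's error counts with the average of earlier days.

    baseline_days: list of Counter objects, one per earlier day.
    today: Counter for the current day.
    spike_factor: hypothetical threshold; an error "spikes" when today's
        count is more than this many times its daily average.
    Returns (new_errors, spiking_errors, top_changed).
    """
    totals = Counter()
    for day in baseline_days:
        totals.update(day)
    average = {msg: count / len(baseline_days) for msg, count in totals.items()}

    # Errors that never appeared in the baseline period.
    new_errors = sorted(msg for msg in today if msg not in average)

    # Known errors that are far more frequent than usual.
    spiking_errors = sorted(
        msg
        for msg, count in today.items()
        if msg in average and count > spike_factor * average[msg]
    )

    # Has the most frequent error changed compared with a normal day?
    usual_top = max(average, key=average.get) if average else None
    today_top = today.most_common(1)[0][0] if today else None
    top_changed = usual_top != today_top

    return new_errors, spiking_errors, top_changed


# Made-up example data: two "normal" days and one day to review.
baseline = [
    summarize(["timeout"] * 10 + ["page not found"] * 3),
    summarize(["timeout"] * 12 + ["page not found"] * 5),
]
today = summarize(["timeout"] * 11 + ["page not found"] * 20 + ["search failed"])

new_errors, spiking_errors, top_changed = compare_to_baseline(baseline, today)
print("New errors:", new_errors)
print("Spiking errors:", spiking_errors)
print("Top error changed:", top_changed)
```

This sketch compares raw counts and does not adjust for traffic, so a busy day may flag spikes that are really just more visitors. Anything it flags is a prompt to ask “why”, not a diagnosis.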
The Outcome
Some of the most severe bugs my clients have fixed were first identified through error log reviews. A few months into my time at one of my prior companies, I noticed a small error log line item regarding our search functionality that I hadn’t seen before, and I reached out to one of our lead developers via email. Before the end of business that day we had released a hotfix to resolve the issue, as that small new error had exposed a glaring security hole in our product. At my current company I have helped identify issues introduced in releases as well as issues introduced by third party partners that interact with our systems. I’m not even on the official mailing list for our error logs; a coworker has to forward me the email daily. But I’m still the one setting eyes on it every day and reviewing the results for action items. The result wherever I have taken charge of the error log is this: a higher level of quality assurance at a low level of effort, with the bonus of being far more knowledgeable about technical issues and error severity. So, has someone reviewed your systems’ error logs today? If the answer is “no”, why haven’t you?