Paper 128-2007
SAS Global Forum 2007 Planning, Development and Support
Finding Your Mistakes Before They Find You: A Quality Approach For SAS Programmers
Rick M. Mitchell, Westat, Rockville, MD
ABSTRACT
High quality work is critical to the success of a SAS programmer. While we are all human and undoubtedly may make a rare slip from time to time, one minor programming discrepancy can damage a SAS programmer's credibility if it is discovered by the client rather than through internal review. A strong quality approach to SAS programming will help earn a deep trust over time from one’s coworkers and clients as one routinely strives toward sound research and accurate reporting of data. This paper will discuss a quality approach that can benefit SAS programmers at all levels. By implementing this quality approach that includes a QA checklist, a peer review, and a team approval process, a SAS programmer can strive for that prestigious 100% success rate that will help ensure that you find your mistakes before they find you!
The following topics will be discussed in this paper:
“We Don’t Have Time For That QA Stuff!”
The Essential Checklist
These topics should help spark some ideas for SAS programmers who are looking for ways to further strengthen the quality of their work by integrating a variety of steps within their QA plan. SAS programmers should be able to take this paper home and immediately apply its concepts to their work environments as they strive for a quality product that will please their managers, their clients, and their fellow SAS programmers.
“WE DON’T HAVE TIME FOR THAT QA STUFF!”
Time is a critical factor in most work projects. There never seem to be enough hours in the day to devote to a specific task as one considers other workload issues as well as their associated costs. In a perfect world, a SAS programmer might have endless hours to put his or her feet up on the desk, carefully review each and every line of code, and share all of this with peers for additional review. Unfortunately, this is not a perfect world. SAS programmers are often faced with tight timelines, some of which are last-minute emergency requests for data. It may seem that such requests, especially the “straightforward” requests, can bypass much of the QA process, but it is actually those with the tight timelines that need the most review.
There is the potential for too much to go wrong too fast if one is forced to hurry through a job. It can often be challenging to reach a balance between deadlines, costs, and QA, but this is something that all SAS programmers should work to achieve. Two key issues to consider with this are (1) the time needed for modifications, and (2) appreciating the not so simple counts.
Time for modifications – While taking shortcuts may seem like a good idea at the time, it is often more time-consuming to make modifications far down the road than to spend the QA time at the start. A programmer may need to dig deep to identify and resolve a coding issue, and then all of that work will still need to be checked. Invest the time early on and save time in the long run!
Not so simple counts – Some requests may initially appear to be very simple, and the SAS programmer is often more than willing to dive in, pull out the numbers, and present these to the requestor. Skipping QA here has the potential for disaster if one does not fully understand the request. For example, a request may be received asking for the “number of subjects for study X.” While this request appears to be simple, the SAS programmer may still need to clarify the definition, since the requestor may be counting enrollments, number of screened subjects, number of records in the database, etc. If a request sounds too simple to be true, it may well not be simple, so it’s always good to double-check and confirm. The worst-case scenario would be that it actually is simple and that the QA effort merely gives further concrete support. A small bit of effort now is well worth it compared to the potential risk of a misunderstanding about the request.
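To illustrate how one request can yield several defensible answers, the following is a minimal sketch using PROC SQL. The data set WORK.VISITS and the variables SUBJID, VISIT, SCREENED, and ENROLLED are hypothetical and are not taken from any actual study; the sketch assumes one record per subject visit.

```sas
/* Hypothetical data: one record per subject visit.         */
/* SCREENED and ENROLLED are assumed 0/1 flags.              */
data work.visits;
  input subjid $ visit screened enrolled;
  datalines;
S001 1 1 1
S001 2 1 1
S002 1 1 0
S003 1 1 1
S003 2 1 1
S003 3 1 1
S004 1 0 0
;
run;

/* Four different answers to "number of subjects".           */
/* The ELSE ' ' returns a missing (blank) value, which       */
/* COUNT(DISTINCT ...) does not count.                       */
proc sql;
  select count(*)               as n_records,
         count(distinct subjid) as n_subjects,
         count(distinct case when screened = 1 then subjid else ' ' end)
                                as n_screened,
         count(distinct case when enrolled = 1 then subjid else ' ' end)
                                as n_enrolled
  from work.visits;
quit;
```

With these made-up rows the four counts are 7, 4, 3, and 2, so the number reported depends entirely on which definition the requestor has in mind.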
THE ESSENTIAL CHECKLIST
For the senior SAS programmer, a checklist is generally an everyday way of life that is not necessarily done formally, but more through a mental habit process. The process of checking one’s work becomes routine as the SAS programmer has automatic reactions and responses to the various steps of the QA process. However, beginning SAS programmers (and maybe some old timers as well) just may not be aware of an appropriate process, and while they may try to do the best that they can, a formal checklist can be an extremely valuable tool. Let us review some suggestions for a SAS programming checklist that one may use to ensure that all issues have been resolved prior to the final product being passed off to the client.
Essential components of the checklist include things that (1) you should review, (2) others should review, and (3) you should produce to further support your work and demonstrate that it is correct. These components are described below.
While these items are just suggestions for what one might need to include as part of a formal checklist, SAS programmers should feel free to adapt this list to their own environment and add, modify, or delete items as needed. Let us branch out further into several of these key areas noted in the checklist as we examine the SAS Log and SAS Output more closely.
DID YOU REALLY LOOK AT THAT SAS LOG?
So the program ran and output was generated – oh yeah, man! We're done now? Wrong. The very fact that output was produced can give one a false sense of security, and the SAS Log can easily be taken for granted by either the unsuspecting junior programmer or the overconfident veteran. This is even more the case when something has become "push button" such that it is run repetitively with little or no change over time. The problem comes when some rare or unexpected scenario occurs and the whole model blows up, but quietly, so that it appears that everything ran okay and output was successfully generated even though there may be issues that need to be addressed. Just because a program ran before, ran correctly, and has become a routine task doesn’t mean that the SAS Log does not need to be reviewed. A careful review of the SAS Log should include an examination of any errors, warnings, and notes as described below.
ERRORS – A common junior programmer mistake is to run a program and be so excited that output has been generated that one fails to even review the SAS Log and identify ERROR messages that could severely affect the output. Reviewing the SAS Log for ERROR messages is a standard that all SAS programmers should follow. Just because the program produces output doesn't mean that it was error-free. And, just because the program produces output that looks right doesn't mean that it IS right. Always confirm that there are no ERRORS in the SAS Log and follow up on any that are discovered during your review.
WARNINGS – While WARNING messages are not considered as severe as ERROR messages in terms of stopping certain parts of a program, these messages can certainly shed light on potentially significant issues. One should take all WARNING messages seriously. A closely related example is the “repeats of BY values” message from a MERGE (written to the SAS Log as a NOTE rather than a WARNING), which lets one know that more than one data set in the merge has repeated values of the BY variables, usually because the BY variables do not uniquely identify the records. Unfortunately, the program will continue and decide as best it can how the records should be matched, but the vast majority of the time this will result in improper matching of data.
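The merge situation described above can be reproduced, and checked for in advance, with a sketch along the following lines. The data sets WORK.DEMOG and WORK.LABS and their variables are hypothetical. SAS writes the message as "NOTE: MERGE statement has more than one data set with repeats of BY values."

```sas
/* Hypothetical data: SUBJID is repeated in BOTH data sets.  */
data work.demog;
  input subjid $ site $;
  datalines;
S001 A
S002 A
S002 B
;
run;

data work.labs;
  input subjid $ result;
  datalines;
S001 5.1
S002 6.3
S002 7.0
;
run;

/* Check before merging: list BY values that occur more than */
/* once in a data set expected to have one row per subject.  */
proc sql;
  create table work.dup_keys as
  select subjid, count(*) as n_rows
  from work.demog
  group by subjid
  having count(*) > 1;
quit;

/* This merge runs and creates output, but the log carries   */
/* the "repeats of BY values" message because S002 is        */
/* repeated in both inputs.                                  */
data work.merged;
  merge work.demog work.labs;
  by subjid;
run;
```

If WORK.DUP_KEYS has any rows, the BY variables do not uniquely identify records in that data set, and the merge logic should be revisited before the results are trusted.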
NOTES – Believe it or not, there are other reasons why one would look at the NOTE messages in the SAS Log besides knowing that SAS took X minutes and seconds to run this DATA step and Y minutes and seconds to run this particular procedure. Additionally, just because all notes appear to be okay, again, this absolutely does not mean that everything IS okay. One must use judgment when reviewing these notes and give them careful attention regardless of the length of the program. Beware of DATA steps that result in "0 observations", as these can let the program appear to run smoothly and even generate reasonable-looking output; without careful review, one would be oblivious to whether the results are actually accurate.
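As a supplement to reading the log by eye (not a replacement for it), a saved log can be scanned for the kinds of messages discussed above. The following is a minimal sketch; the file name myprogram.log is a hypothetical placeholder, and the list of phrases is an assumption that each team would extend for its own environment.

```sas
/* Read a saved SAS Log one line at a time and keep only     */
/* the lines that deserve a closer look.                     */
data work.log_issues;
  infile "myprogram.log" truncover;   /* hypothetical path   */
  input line $char256.;
  line_no = _n_;                      /* position in the log */
  if line =: 'ERROR'                  /* line starts with    */
     or line =: 'WARNING'
     or index(line, 'has 0 observations') > 0
     or index(line, 'repeats of BY values') > 0
     or index(line, 'uninitialized') > 0;
run;

proc print data=work.log_issues noobs;
  var line_no line;
run;
```

An empty listing does not prove the run was clean; it only means that none of the listed phrases appeared, so the full log still merits a read.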
DID YOU REALLY LOOK AT THAT SAS OUTPUT?
As with the SAS Log, a SAS programmer must carefully review all SAS output before passing this off to others. Sure, generating some output is exciting and gives the appearance that everything ran successfully, but how does one know that the output is correct? Each SAS programmer may have his or her own way of reviewing SAS output, but let us discuss a few basic steps that one should always follow including (1) comparing mean categorizations, (2) comparing derived variables to original variables, and (3) comparing present runs to past runs.
MILE_CAT   N Obs   N Miss     N   Minimum   Maximum
______________________________________________________
0-20          15        1    14     -50.0      17.0
21-40         46        0    46      23.0      40.0
>40           36        0    36      40.2    9999.0
______________________________________________________

PROC MEANS output such as that shown above can be used to determine whether or not all of the values fall into their expected ranges for each category that the SAS programmer has set up. By displaying the information in this manner, the SAS programmer or other reviewer can easily go down the table row by row and confirm that each set of minimum and maximum values falls within the derived categories that are noted within the MILE_CAT variable in the far left column.
While specifications may have seemed clear at the start of the programming process, it appears that the programmer may have taken some specifications too literally and included a few values that one had not intended to fall within such ranges. Note that the programmer most likely included all values less than or equal to 20 for the 1st category and then inadvertently included a negative number (e.g. -50). Additionally, the first category has included missing values. The 2nd category appears to fit nicely with both the minimum and maximum values fitting into the 21-40 range. The final category was intended to include values greater than 40, but probably not as high as the value of 9999 that ended up in the category – most likely this was some type of unknown code. Luckily, the generation of this PROC MEANS output alerts the SAS programmer early on in the process and allows for the programmer to make appropriate changes before the data go to the client. For those wanting to further pursue this type of approach, the SAS code (see Figure 2 below) is quite basic and can be accomplished by running PROC MEANS with the categorical variable coded as the CLASS variable and the numeric variable coded as the VAR variable, using the options presented to identify the minimum and maximum values in each category.
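A minimal sketch of the approach just described follows; it is an illustration under stated assumptions and not the paper's own figure. MILE_CAT comes from the output discussed above, while the data set WORK.TRIPS, the numeric variable MILES, the sample values, and the exact cut points are hypothetical. The sketch also shows how an overly literal rule lets missing and negative values slip into the first category, since SAS treats a missing numeric value as smaller than any number.

```sas
/* Hypothetical data with a naive categorization:            */
/* "<= 20" also captures missing (.) and negative values.    */
data work.trips;
  length mile_cat $5;
  input miles @@;
  if miles <= 20      then mile_cat = '0-20';
  else if miles <= 40 then mile_cat = '21-40';
  else                     mile_cat = '>40';
  datalines;
. -50 5 17 23 40 40.2 75 9999
;
run;

/* Categorical variable on CLASS, numeric variable on VAR.   */
/* NMISS, N, MIN, and MAX expose what actually landed in     */
/* each category.                                            */
proc means data=work.trips nmiss n min max maxdec=1;
  class mile_cat;
  var miles;
run;
```

Scanning the minimum and maximum columns row by row against the category labels is what surfaces the stray values.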
Figure 3 on the next page shows some ideal output that one would expect to see to ensure that everything was categorized appropriately. Note that the “problem” data that were discovered in the previous table have been accommodated by creating additional categories. By separating these data more distinctly, the programmer can confirm that the existing data are correct and then forward the problem data to the appropriate data management team members for further review. Obviously this approach will just pick up the outermost values that are outside of an acceptable range, but the approach at least helps one get started in terms of knowing what issues should be researched further.
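One way to separate the problem data into their own categories, as described above, is sketched below. The category labels 'Missing', 'Negative', and 'Unknown code', the treatment of 9999 as an unknown code, and the data set and variable names are all assumptions made for illustration; the actual rules would come from the study specifications.

```sas
/* Hypothetical revision: problem values get their own       */
/* categories before the intended ranges are applied.        */
data work.trips_checked;
  length mile_cat $12;
  input miles @@;
  if missing(miles)    then mile_cat = 'Missing';
  else if miles < 0    then mile_cat = 'Negative';
  else if miles = 9999 then mile_cat = 'Unknown code';
  else if miles <= 20  then mile_cat = '0-20';
  else if miles <= 40  then mile_cat = '21-40';
  else                      mile_cat = '>40';
  datalines;
. -50 5 17 23 40 40.2 75 9999
;
run;

/* Same check as before: each category's minimum and maximum */
/* should now sit inside its labeled range.                  */
proc means data=work.trips_checked nmiss n min max maxdec=1;
  class mile_cat;
  var miles;
run;
```

The records in the problem categories can then be listed and forwarded to data management for review, while the remaining categories can be confirmed as correct.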