PharmaSUG2010 - Paper CD09
CREATING SV AND SE FIRST
Henry B. Winsor, WinsorWorks, Limited, San Mateo, CA
Mario Widel, Genentech, Inc., South San Francisco, CA
ABSTRACT
One current concern of the FDA is that many sponsors are not submitting both the SV and SE data sets when submitting studies for review. The authors address reasons why the data sets are useful other than as a piece of a final submission, even to the point of encouraging the sponsor to create these data sets as soon as clinical data is available. One method for easy creation is demonstrated, along with a verification method.
INTRODUCTION
Among the many complaints about current SDTM practices that the FDA has shared, one is that few sponsors populate the SV and SE data sets appropriately. Indeed, if a sponsor populates the data sets at all, this is often done at the last moment before data submission, so the sponsor gets no benefit from these data sets during data cleaning and report preparation. We believe this to be a mistake: these data sets can be very useful and should be the first data sets populated when clinical data is made available.
It should be noted that both authors are strong believers in the standardization of data structures within a company.
While SDTM is really designed as a submission data source, it is also quite useful during the data cleaning stages.
We are convinced that the rewards of designing and keeping to standardized data structures greatly outweigh the additional costs and time involved in remapping the raw data to fit within the data structures. While SDTM is not an ideal data structure for cleaning and reviewing data, it certainly beats not having a single standard within a company.
So why populate the SV and SE data sets first? A major reason is to avoid circularity in your programs, i.e., where program A creates data for program B to use, and program B in turn creates the data that program A needs in order to create the data for program B. We would think that this danger is obvious, but reports from the field indicate that this hazardous technique is still being inflicted upon companies by programmers who cannot otherwise generate data.
DOMAIN ARMCD ARM TAETORD ETCD ELEMENT TABRANCH EPOCH
TA CR Controlled 0 PRE Pre-Treatment Pre-Study Release
TA CR Controlled 1 TITUP Titration Up Randomized to Up Release CR
TA CR Controlled 2 TRT Treatment of Controlled Release Interest
Populating SV first, and then SE from SV, allows us to use these data sets to populate the VISITNUM, VISIT, VISITDY and EPOCH variables in the other SDTM data sets from one data source without worrying about circularity.
Assigning the VISIT and ELEMENT variables in this fashion allows us to globally modify such things as visit names and epoch names without relying upon the raw clinical data for anything more than a date. This can be a powerful tool, especially when you have to combine data from multiple trials into a single data source.
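As a rough illustration of assigning visit variables from a date alone, the sketch below looks up which SV interval a record's date falls into. It is written in Python rather than the authors' own tooling, and the SV rows, visit names and the function name assign_visit are hypothetical. Dates are held as date objects for simplicity, whereas SDTM stores them as ISO 8601 text.

```python
from datetime import date

# Hypothetical SV rows for one subject: (VISITNUM, VISIT, SVSTDTC, SVENDTC).
SV = [
    (1, "Screening", date(2009, 7, 1), date(2009, 7, 3)),
    (2, "Baseline", date(2009, 7, 15), date(2009, 7, 15)),
    (3, "Week 4", date(2009, 8, 12), date(2009, 8, 13)),
]


def assign_visit(record_date, sv_rows):
    """Return (VISITNUM, VISIT) for the SV interval containing record_date.

    Returns None when the date falls outside every visit interval, which
    marks the record for review rather than guessing a visit.
    """
    for visitnum, visit, start, end in sv_rows:
        if start <= record_date <= end:
            return visitnum, visit
    return None


# A lab record dated 13 Aug 2009 picks up its visit from SV, not from raw data.
print(assign_visit(date(2009, 8, 13), SV))
```

Because every data set program would call the same lookup, renaming a visit becomes a one-place change in SV.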
Additionally, the SV and SE data sets are the only data sources within SDTM that put all the visit related study dates into one data set, allowing you to spot date issues early in the process. Even if your Data Management group does not populate every Case Report Form page with a different date field, you still have enough different collections going on that you need to check these dates for synchronicity. The earlier you do it, the earlier you can get them fixed and not have to deal with conflicting dates later on in the reporting process.
One might argue that it is easier to create the SV and SE data sets last, after all the clinical data has been entered and verified. While this is true when taking into consideration only the SV and SE data sets, this also requires that you create the VISIT and ELEMENT variables in a number of programs, which doesn’t sound at all easier. You also have the task of checking your visit dates for synchronicity without the aid of a program that allows you to do that work early and easily as a byproduct. It should be obvious that creating the SV and SE data sets early in the process of completing your SDTM work can be a real time and effort saver.
CREATING SV
Here’s the task at hand. We are going to create the SV data set first, use it to create the SE data set, then use those two data sets to populate the VISIT and ELEMENT variables in the rest of the data sets, including the reference start date and end date in DM. All we need are the TV, TE and TA data sets and the date values found in your raw data.
The first step in creating SV is to identify all the potential visit date information in your raw data. Exclude dates such as the subject’s birth date (although some companies consider the birth date to be the first date in the screening period; to each his own) and event dates unrelated to visits, such as Adverse Event dates and Concomitant Medication dates. The rule of thumb is that if it isn’t scheduled to be done at a planned visit, we don’t want the date. The easiest way to start is to take a set of Case Report Forms, preferably annotated, identify all of the date fields of potential interest, and then trace them to the data source. The annotation values are important, because they identify what the Case Report Form designer thinks are the visits and allow us to remap the data as we choose.
Each collection database will have different names for the variables that contain the annotation values, so you’ll need to be familiar with these in your own system. For example, DLB/ERT databases store two values, one called EVENTID and one called PAGE; the combination of the two tells the user exactly which Case Report Form page the data record of interest appeared on.
Other databases will use similar variables, and these variables allow us to easily remap data to whatever destination we want. Suppose your Case Report Forms use several EVENTID values to indicate measurements taken during what is really the first visit. We can easily remap all of those different values to one value, Visit One. This ability to remap keeps you from being locked into whatever system Data Management needs to have for their purposes. You have the flexibility of using other visit labeling rules and names, already stored in the TV data set.
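The remapping described above can be sketched as a simple lookup table. This is a minimal Python illustration, not the authors' code: the EVENTID values, the visit labels other than Visit One, and the function name remap_event are all hypothetical, and in practice the target values would come from your TV data set.

```python
# Hypothetical raw EVENTID values mapped to (VISITNUM, VISIT) as planned in TV.
EVENTID_TO_VISIT = {
    "SCREEN": (1, "Visit One"),
    "SCREEN LABS": (1, "Visit One"),
    "SCREEN ECG": (1, "Visit One"),
    "BASELINE": (2, "Visit Two"),
}


def remap_event(eventid):
    """Map a raw EVENTID to (VISITNUM, VISIT), ignoring case and padding.

    An unmapped value raises KeyError so that new annotation values are
    noticed instead of being silently dropped.
    """
    key = eventid.strip().upper()
    if key not in EVENTID_TO_VISIT:
        raise KeyError(f"Unmapped EVENTID: {eventid!r}")
    return EVENTID_TO_VISIT[key]


# Several collection events collapse into the same planned visit.
print(remap_event("Screen Labs"))
```

Keeping the table in one place means Data Management can keep its own event naming while every downstream program sees the same visit labels.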
After you have identified all of your dates and done the necessary remapping so that you can still identify the data source yet fit the dates within your TV visit structure, you need to set all of the dates together in one data set so you can check for coherency and synchronicity. Do all of a subject’s Visit One dates roughly coincide with each other? If a visit is supposed to take place on one day, do all the dates agree? Are any different by a year or a month, indicating an entry problem?
You need to build a report at this time, and you are going to have to review it manually for the most part, but any date problems will stand out like an elephant on a putting green.
The report should look something like this:
All that is important is that you can identify the date sources so that they map uniquely into visit intervals. For instance, dates with EVENTID values Baseline and Visit 2 go into the same interval. Also, problem dates will clearly stand out as anomalous entries, so they can be queried. Suppose SUBJID 1026 had a Baseline date of 25JUL2010.
The Baseline EVENTID would sort to be the last entry for the patient, so it would be obvious that there is a date problem. In that case, this type of questionable date should be removed from the data until it is clear that the date is correct and the patient really did have a visit that took 366 days to complete.
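A programmatic pre-screen can narrow down what needs manual review. The Python sketch below groups dates by subject and visit interval and flags any group whose dates span more than a chosen number of days. The rows, the one-day tolerance and the function name flag_wide_visits are assumptions for illustration only; the sample data merely echoes the kind of year-off entry described above.

```python
from collections import defaultdict
from datetime import date


def flag_wide_visits(rows, max_span_days=1):
    """Flag (subject, visit) groups whose dates are spread too widely.

    rows: iterable of (subjid, visitnum, source, date).
    Returns {(subjid, visitnum): span_in_days} for groups whose latest and
    earliest dates differ by more than max_span_days.
    """
    groups = defaultdict(list)
    for subjid, visitnum, _source, d in rows:
        groups[(subjid, visitnum)].append(d)
    flagged = {}
    for key, dates in groups.items():
        span = (max(dates) - min(dates)).days
        if span > max_span_days:
            flagged[key] = span
    return flagged


# Hypothetical date records; both sources map into the same visit interval.
rows = [
    ("1026", 1, "Baseline", date(2010, 7, 25)),
    ("1026", 1, "Visit 2", date(2009, 7, 25)),
    ("1027", 1, "Baseline", date(2009, 7, 20)),
    ("1027", 1, "Visit 2", date(2009, 7, 20)),
]
print(flag_wide_visits(rows))
```

The check only says that a group is suspicious, not which date is wrong, so the flagged records still need to be queried with Data Management.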
We have said nothing about unscheduled visits up to this point, and that is for a good reason. We are only concerned with scheduled event dates at this time; we will return to unscheduled events later. Do not try to include unscheduled visit days in the process yet, as you’ll only be making things hard for yourself.
An observer familiar with the variables in SV will note that the above report has pretty much everything we need to start remapping data into the SV data set. If you have already gone over the data, marked questionable values and reported them to Data Management and excluded them from the data set, then we are ready to map our data into SV.
SUBJID becomes USUBJID, EVENTID is mapped into VISITNUM and VISIT, and the Date is mapped into SVSTDTC and SVENDTC.
A couple of things should be mentioned here. Note that the text in Visit no longer matches what was in EVENTID, nor do the VISITNUM values correspond to the visit numbers as before. This is deliberate and shows what you can do with remapping if you do it in the SV data set. You no longer have to put re-labeling code in every data set program, you’ll be able to put it in one program and use the data in a consistent fashion in the other programs. Also, while the variables SVSTDY and SVENDY do not appear here, that is only for space reasons. Relative days are so useful when reviewing data for context issues that they should always be populated. Just remember to use SVSTDTC where VISITNUM = 1 as the reference date in your calculations, as you don’t have a DM data set, nor should you have one. The DM data set will use SV as input only.
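The mapping and the relative-day calculation can be sketched together. In this Python illustration the function names study_day and build_sv, the input layout and the sample dates are hypothetical, and taking the earliest and latest date per visit as SVSTDTC and SVENDTC is an assumption rather than something the text prescribes. The day arithmetic follows the usual SDTM convention that the reference date is day 1 and there is no day 0.

```python
from datetime import date


def study_day(d, ref):
    """Relative day: the reference date is day 1, the day before is day -1."""
    delta = (d - ref).days
    return delta + 1 if delta >= 0 else delta


def build_sv(rows):
    """Collapse (usubjid, visitnum, visit, date) rows into SV-like records.

    Assumption: SVSTDTC/SVENDTC are the earliest/latest date per visit.
    The reference date is SVSTDTC where VISITNUM = 1, so no DM data set
    is needed as input.
    """
    spans = {}
    for usubjid, visitnum, visit, d in rows:
        key = (usubjid, visitnum, visit)
        lo, hi = spans.get(key, (d, d))
        spans[key] = (min(lo, d), max(hi, d))
    ref = {key[0]: span[0] for key, span in spans.items() if key[1] == 1}
    sv = []
    for (usubjid, visitnum, visit), (start, end) in sorted(spans.items()):
        r = ref.get(usubjid)
        sv.append({
            "USUBJID": usubjid,
            "VISITNUM": visitnum,
            "VISIT": visit,
            "SVSTDTC": start.isoformat(),
            "SVENDTC": end.isoformat(),
            "SVSTDY": study_day(start, r) if r else None,
            "SVENDY": study_day(end, r) if r else None,
        })
    return sv


# Hypothetical remapped date records for one subject.
sv = build_sv([
    ("1026", 1, "Screening", date(2009, 7, 1)),
    ("1026", 1, "Screening", date(2009, 7, 2)),
    ("1026", 2, "Baseline", date(2009, 7, 15)),
])
for record in sv:
    print(record)
```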
A word about SVSTDTC and SVENDTC is appropriate here. Note that we are only using dates, not full datetime variables. This is not accidental, and you should avoid populating these variables with datetime values whenever you can. One, nobody starts tracking a subject when they enter the clinic to start a visit, so any time collected for start and/or end is a bit of a stretch. More importantly, the job of assigning Visits and Epochs to individual data sets later in the process will be greatly simplified. There will be times that you cannot avoid a datetime value, such as when a day is split into several visits for whatever reason by your clinical study designers, but these cases are thankfully quite rare and we are certain that you’ll be able to account for those intraday visits.
UNSCHEDULED VISITS
Now it is time to address the unscheduled visits that may have been done by your subjects. If there are none, so much the better, but it’s a rare trial that doesn’t have at least one. Subjects get ill, have elevated lab tests that need to be replicated; the list of potential reasons goes on and on.
For our purposes, the most complicated thing about unscheduled visits is how to map them between the scheduled trial visits. The Case Report Forms are seldom much help, as most Data Management groups will have a set of generic unscheduled visit Case Report Forms for use with all unscheduled visits, so you can’t use the annotation variables to properly order the visits. But, since you have already done the scheduled visits, you have a structure that you can use in mapping the unscheduled visits.
We always use integers for scheduled VISITNUM values. This allows us to use decimals to properly insert the unscheduled visits between scheduled visits. First, count the number of unscheduled visits per subject and find the largest number that falls between two given scheduled visits. If it’s less than 10, then you can sequence your unscheduled visits by taking the integer value of the preceding scheduled visit and adding .1 or .2, etc., to the VISITNUM value, so that unscheduled visits fit between or after scheduled visits. The text for the corresponding VISIT variable is simply Unscheduled Visit 0.1, Unscheduled Visit 2.2, etc. Combine all of the unscheduled visit records with the previously created scheduled visit records and you have a complete SV data set.
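The decimal numbering scheme can be sketched as follows for a single subject. This Python illustration makes several assumptions that are not in the text: the function name number_unscheduled, the sample dates, treating an unscheduled date equal to a scheduled start date as falling after that visit, and using 0 as the base when no scheduled visit precedes the unscheduled one.

```python
from datetime import date


def number_unscheduled(scheduled, unscheduled_dates):
    """Assign decimal VISITNUM values to one subject's unscheduled visits.

    scheduled: list of (integer VISITNUM, start date) for scheduled visits.
    Returns a list of (VISITNUM, VISIT, date) for the unscheduled visits.
    """
    scheduled = sorted(scheduled, key=lambda item: item[1])
    counts = {}
    result = []
    for d in sorted(unscheduled_dates):
        base = 0  # assumption: nothing scheduled yet means base 0
        for visitnum, start in scheduled:
            if start <= d:
                base = visitnum
        counts[base] = counts.get(base, 0) + 1
        if counts[base] > 9:
            # One decimal place only leaves room for .1 through .9.
            raise ValueError("10 or more unscheduled visits in one gap")
        visitnum = round(base + counts[base] / 10, 1)
        result.append((visitnum, f"Unscheduled Visit {visitnum:.1f}", d))
    return result


# Hypothetical scheduled visit starts and unscheduled visit dates.
scheduled = [(1, date(2009, 7, 1)), (2, date(2009, 7, 15)), (3, date(2009, 8, 12))]
unscheduled = [date(2009, 7, 27), date(2009, 7, 20), date(2009, 8, 20)]
for row in number_unscheduled(scheduled, unscheduled):
    print(row)
```

If a gap ever held ten or more unscheduled visits, one option would be to switch to two decimal places; the sketch raises an error instead of choosing for you.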
VISITDY is not populated for unscheduled visits, and the SVUPDES column is omitted. In the case of USUBJID = 1027, we will note that he had extremely elevated liver function tests on his baseline draw. The decision was made to terminate study drug on the 26th for safety reasons, yet he returned for two additional visits to check on the values before finally coming in on August 1st for his termination visit assessments. For this patient, you would enter something like “Follow-up Safety Lab” in SVUPDES for the unscheduled visits only.
CREATING SE
With a completed SV data set, it’s time to work on creating SE. The secret to creating an SE data set is this: you should be determining almost all of your start and stop points for individual SE records from your visits. If you think about it, visits are usually the boundaries between the study periods. This should be a no-brainer if you have picked your study visit day values properly. For instance, when does the screening period end? It ends the day before the first treatment visit. To populate SE, you should be mostly using SV dates.
The one exception is the end of treatment. Most people will assume that treatment end is marked by the date of the termination visit, but this is not necessarily true. You need to think of the date of termination as a separate entity that floats around and does not necessarily anchor a study period. The end of any treatment period is the last day study drug was taken, so map that value from your raw exposure data, and let the termination date fall where it does.
Some people terminate on the last day of treatment, some terminate the next day, some terminate a week later. It isn’t a problem unless you want to make it into one by insisting that termination day must be a boundary date.
Here is what our SE data looks like for the two sample patients. Note that the STUDYID column was dropped for space reasons, and the TAETORD column does not appear for the same reason. This is not a real problem, as there is always a 1-1 map between TAETORD and EPOCH. As mentioned before, the SESTDTC values mostly come from the SVSTDTC column; the exception is the start of the Post-Treatment Epoch. The start of this Epoch is always defined as the day after the last day of dosing. In the case of USUBJID 1026, that date coincides with the date of the termination visit, but in the case of USUBJID 1027, it clearly does not.
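The derivation of SE from visit start dates and the last dosing date can be sketched as below. This is a hedged Python illustration: the function name build_se, the ETCD value POST and all dates are hypothetical, while PRE, TITUP and TRT are taken from the TA example earlier. It follows the text in ending each element the day before the next one begins and starting the post-treatment element the day after the last dose.

```python
from datetime import date, timedelta


def build_se(usubjid, element_starts, last_dose, study_end):
    """Build SE-like records for one subject.

    element_starts: ordered (ETCD, start date) pairs taken from SV visit
    start dates. A post-treatment element (hypothetical ETCD "POST") is
    added only if the day after last_dose is on or before study_end.
    """
    one_day = timedelta(days=1)
    starts = list(element_starts)
    post_start = last_dose + one_day
    if post_start <= study_end:
        starts.append(("POST", post_start))
    se = []
    for i, (etcd, start) in enumerate(starts):
        if i + 1 < len(starts):
            end = starts[i + 1][1] - one_day  # day before the next element
        else:
            end = study_end
        se.append({
            "USUBJID": usubjid,
            "ETCD": etcd,
            "SESTDTC": start.isoformat(),
            "SEENDTC": end.isoformat(),
        })
    return se


# Hypothetical subject: dosing stops on 26 Jul, termination visit on 1 Aug.
se = build_se(
    "1027",
    [("PRE", date(2009, 7, 1)), ("TITUP", date(2009, 7, 15)), ("TRT", date(2009, 7, 22))],
    last_dose=date(2009, 7, 26),
    study_end=date(2009, 8, 1),
)
for record in se:
    print(record)
```

Note that the treatment element ends on the last dosing date regardless of when the termination visit happens, which is the point made above about termination not being a boundary date.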