Health information exchanges (HIEs) are platforms that enable the sharing of patient health information (PHI) among healthcare providers. HIEs offer many benefits, such as lower cost, faster services, and better health outcomes, to both patients and providers. However, most HIEs have a rigid consent mechanism that grants all participating providers access to the PHI of all consenting patients. This research investigates how granting patients greater control over the sharing of their personal health information affects consent rates and makes them active participants in the HIE system. The research uses a randomized experimental survey design. The study draws on responses from 388 participants and applies confirmatory factor analysis (CFA) and structural equation modeling (SEM) in SAS 9.4 (PROC CALIS and PROC FACTOR) to test the conceptual model. The main findings are that patients' consent rates increase significantly when patients are offered greater control over sharing their PHI, and that greater control reduces the negative impact of privacy concern on the intention to consent.
CDISC Define-XML 2.0.0 was officially released in March 2013 by the CDISC XML Technologies team to describe CDISC SDTM, SEND, and ADaM datasets for the purpose of submissions to the FDA. This version introduces major changes from the previous version 1.0.0, released in February 2005. The new version is built on the latest ODM (Operational Data Model) v1.3.2, a vendor-neutral, platform-independent format for the interchange and archival of clinical study data. Key changes in this version include extended Controlled Terminology (CT), detailed Value Level Metadata (VLM) descriptions that allow subsets of data via a where-clause, explicit linking to external documents, and greater clarity in the specifications via Comments.
The paper focuses on elaborating the different variable data types introduced in v2.0.0 and uses the guidelines to build an approach for creating Define-XML, especially for processes that are Excel-driven in the back end. The variable attributes of data type, length, format, and significant digits are the key pieces discussed in the following sections. The discussion narrows down to a SAS® macro that can generate them according to the new guidelines.
Samuel Berestizhevsky and Tanya Kolosova
Experience and good judgment are essential attributes of underwriters. Systematic analysis of the underwriting decision-making process was expected to increase the efficiency of developing these attributes in underwriters. However, this approach fell short of expectations. The industry still struggles with the pace at which knowledge and experience are delivered to the next generation of underwriters. The solution may lie in the development and deployment of artificial intelligence (AI) methods and algorithms to automate the underwriting decision-making process. This paper outlines the current state of performance measurement of the underwriting decision-making process through underwriters' performance metrics (including a novel one). Further, this paper provides an in-depth description of AI methods and algorithms, and their implementation in SAS, that can be used to automate the underwriting decision-making process. Real data from one of the leading insurance companies was used for analysis and testing of the proposed approaches.
Rachel Perry, Laura Erhart and Shane Brady
Using a lookup table to add new information efficiently to a primary dataset is a powerful technique. This paper explores how a lookup table was used with Base SAS® software to match laboratory reports of influenza in an Arizona infectious disease database, the Medical Electronic Disease Surveillance Intelligence System (MEDSIS), to a list of test and result combinations in order to assign standard categories within the database. Reports received in MEDSIS via electronic laboratory reporting (ELR) include the test name, test result, and possibly additional information in a notes field. For ease of analysis, MEDSIS contains fields for categorizing this information into standard options for type of test performed and result, but these fields traditionally required manual data entry. We created a lookup table using the information received from ELR to classify each lab report into standardized options.
Reported cases within the database are matched to the lookup table routinely, allowing for an efficient method of assigning the test type and result for each report. The categorized information is fed into MEDSIS to automatically populate these variables for each case record. Because reports received may vary over time or between laboratories, we need to be able to update the lookup table dynamically. The program compares the lookup table to the dataset used for analysis, and combinations of test names and results not found in the lookup table are exported for review. Ultimately, this process has saved hours of time by eliminating much of the manual categorization, resulting in an efficient way to update MEDSIS.
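The core of the lookup-table approach can be sketched as a simple match-merge; the dataset and variable names below are illustrative, not the actual MEDSIS code:

```sas
/* Hypothetical sketch: match ELR reports to the lookup table by test name
   and result, and export unmatched combinations for manual review. */
proc sort data=elr_reports; by test_name test_result; run;
proc sort data=lookup;      by test_name test_result; run;

data categorized unmatched;
   merge elr_reports(in=inrpt) lookup(in=inlkp);
   by test_name test_result;
   if inrpt and inlkp then output categorized;   /* standard type/result assigned */
   else if inrpt then output unmatched;          /* new combinations needing review */
run;
```

The `unmatched` dataset supports the dynamic-update step described above: new test/result combinations are reviewed, categorized, and appended to the lookup table.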
Grisell Diaz-Ramirez, Siqi Gan, Sei Lee, Alexander Smith and John Boscardin
Like other regression methods in SAS®, the PHREG procedure has built-in options to perform predictor selection using stepwise methods or best subsets. However, these built-in options for predictor selection do not allow the simultaneous selection of predictors for multiple outcomes. The selection of a common set of predictors for multiple outcomes is important in clinical settings where practitioners are frequently interested in predicting multiple outcomes in the same subject, while at the same time obtaining a parsimonious model with appropriate predictive accuracy. In this paper we describe a SAS Macro for selecting a common set of variables for predicting multiple outcomes. The selection method uses a variant of backward elimination based on the average normalized Bayesian Information Criterion (BIC) across multiple outcomes. The BICs are obtained by fitting multivariable survival models using PHREG in SAS version 9.4, SAS/STAT 14.2. We illustrate the proposed method using the Health and Retirement Study data. We compare the predictive accuracy and parsimony of the final model with the models obtained for each individual outcome. We then test the correct inclusion and correct exclusion of variables in the final model using a simulation study. Our method provides a straightforward approach to obtain a common set of predictors for multiple clinical outcomes without compromising parsimony or predictive accuracy.
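The core of the selection criterion can be sketched as follows; the macro, dataset, and variable names are hypothetical, and the authors' actual macro additionally normalizes the BICs and wraps this step in a full backward-elimination loop:

```sas
/* Hypothetical sketch: fit one Cox model per outcome with the same predictor
   set and average the BIC (reported by PHREG as SBC) across outcomes. */
%macro avgbic(preds=, outcomes=);
   %local i n out;
   %let n=%sysfunc(countw(&outcomes));
   %do i=1 %to &n;
      %let out=%scan(&outcomes,&i);
      ods output FitStatistics=_fs&i;
      proc phreg data=study;
         model time_&out*event_&out(0) = &preds;
      run;
   %end;
   /* Stack the fit statistics and average the SBC rows */
   data _allfs;
      set _fs1-_fs&n;
      where Criterion='SBC';
   run;
   proc means data=_allfs mean;
      var WithCovariates;
   run;
%mend avgbic;
```

Backward elimination would call such a step repeatedly, at each round dropping the predictor whose removal most improves the average criterion.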
Lissa Bayang and Wayne Leonetti
Obtaining the latest data is of utmost importance in healthcare analytics. Providers and management rely on the newest data from medical records to drive their daily goals and track progress. In many healthcare organizations, SAS® programs are run daily to produce the latest data. However, various issues can prevent the upload of the most current data, including delays from the database, high session volumes when many users access the database, and other program failures. It is problematic for SAS programs to run without any check that the current dataset is available to be consumed, because this produces unreliable data for the stakeholders who need it. To mitigate this issue, a dependency macro is put in place to check whether a dataset with the current date's timestamp is present. If the available file does not meet the requirement, SAS processing is delayed until a dataset with the current date's timestamp appears. In addition to checking the current timestamp, the macro includes other options that make it more useful, such as the ability to check intermittently up to a cutoff time and the ability to notify the user via email whether the SAS program ran successfully or failed. The macro's options are flexible and easy to adjust, which allows it to be used in different situations. This paper is intended for intermediate and advanced SAS users, and the program runs in SAS 9.4.
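The dependency check can be sketched roughly as below; the macro name, parameters, and retry policy are illustrative assumptions, not the paper's actual implementation (which also covers email notification):

```sas
/* Hypothetical sketch: poll until the dataset's modification date is today,
   retrying every &seconds seconds up to &maxtries attempts. */
%macro waitfor(ds=, maxtries=12, seconds=300);
   %local i done dsid modte;
   %let done=0;
   %let i=0;
   %do %until(&done or &i ge &maxtries);
      %let i=%eval(&i+1);
      %if %sysfunc(exist(&ds)) %then %do;
         %let dsid=%sysfunc(open(&ds));
         %let modte=%sysfunc(attrn(&dsid,MODTE));   /* modification datetime */
         %let dsid=%sysfunc(close(&dsid));
         %if %sysfunc(datepart(&modte)) = %sysfunc(today()) %then %let done=1;
      %end;
      %if not &done %then %do;
         data _null_;
            call sleep(&seconds,1);   /* wait before the next check */
         run;
      %end;
   %end;
   %if not &done %then %put ERROR: &ds has not been refreshed today.;
   %else %put NOTE: &ds is current - proceeding.;
%mend waitfor;
```

Downstream steps would run only after `%waitfor` exits successfully, so the daily job never consumes a stale dataset.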
Alec Zhixiao Lin and Xiao Hu
Mortgage loan prepayment is of constant interest to both academia and practitioners. A considerable drop in market interest rates will trigger a wave of early payoffs and cause losses to investors in mortgage-related derivatives. This paper suggests the use of a net lift measure to estimate the effects of changes in interest rates on underlying mortgage prepayment. The loan-level modeling method will help investors obtain more accurate valuations of mortgage-related assets. Insights from such a study are also useful to banks owning mortgage portfolios for developing retention strategies in the wake of a mortgage rate drop.
Amy Alabaster and Mary Anne Armstrong
Analysts of electronic medical record data take advantage of structured data fields used for patient tracking and claims purposes. In the absence of codes, analysts can use unstructured text data, such as that found in doctors' notes. Though notes are rich in information, they are also full of inconsistencies (cryptic abbreviations, typos, and immense provider variation) that often make structured analysis with basic string functions inadequate. Alternatively, Perl regular expressions can be used to tackle many text problems in health research. This paper reviews the basics of regular expressions and their implementation in SAS using the functions and call routines PRXPARSE, CALL PRXSUBSTR, PRXMATCH, CALL PRXNEXT, PRXPOSN, and PRXCHANGE. To illustrate how these functions are applied in the health research setting, a post-marketing pharmaceutical study is discussed in which the FDA asked a team of research partners to show how un-coded intrauterine device (IUD) outcomes could be found in unstructured health record data. Even with a powerful tool like regular expressions, a balance must be struck between the risk of false positives and false negatives. For this purpose, a random sample of 100 charts was closely reviewed by a clinical expert for data accuracy. For the primary outcome, uterine perforation following IUD insertion, 77 of 100 events found by text searching were confirmed to be true uterine perforations. While advanced text mining tools are available on other platforms, and now in SAS as well, a basic understanding of regular expressions and the PRX functions is sufficient to achieve efficient and valid results.
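The basic pattern of such a search can be sketched with PRXPARSE and PRXMATCH; the dataset, variable names, and regular expression below are illustrative, not the study's actual code:

```sas
/* Hypothetical sketch: flag notes that may mention a uterine perforation.
   Flagged records would then go to clinical chart review. */
data flagged;
   set notes;
   retain rx;
   if _n_ = 1 then rx = prxparse('/\bperforat(ion|ed|e)\b/i');  /* case-insensitive */
   if prxmatch(rx, note_text) > 0 then perf_flag = 1;
   else perf_flag = 0;
run;
```

Compiling the pattern once at `_n_ = 1` and retaining the pattern ID avoids recompiling the regular expression on every observation.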
Parmodh Sharma and Ankit Bansal
Periodic data review is very important and highly recommended for all ongoing clinical studies to ensure data integrity and quality. Each clinical study requires experts from various functional groups such as SAS programming, biostatistics, data management, and so on. Each of them has different data review requirements, and one cannot expect everyone to be familiar with SAS programming, even though clinical datasets are often available as SAS datasets. Statisticians prefer summary-level data, whereas others might need to look at summary-level as well as granular-level data. These reports are static, so end users have no way to customize or drill down into the reports on their own. Currently, requests to update the reports are always directed to a SAS programmer, which is an overall time-consuming process.
Beginning December 18, 2016, all clinical trial and nonclinical trial studies must use standards (e.g., CDISC) for submission data, and beginning May 5, 2017, NDA, ANDA, and BLA submissions must follow the eCTD format for submission documents.
To enforce these standards mandates, the FDA also released "Technical Rejection Criteria for Study Data" on the FDA eCTD website on October 3, 2016, and implemented a rejection process for submissions that do not conform to the required study data standards.
The paper will discuss how these new FDA mandates impact electronic submission and the preparation required for a CDISC- and eCTD-compliant submission package, including SDTM, ADaM, Define-XML, the SDTM-annotated eCRF, SDRG, ADRG, and SAS® programs. The paper will introduce the current FDA submission process, including the current FDA rejection processes, "Technical Rejection" and "Refuse-to-File", and discuss how the FDA uses them to reject submissions. The paper will show how FDA rejection of CDISC non-compliant data impacts a sponsor's submission process, and how sponsors should respond to FDA rejections as well as to questions throughout the whole submission process. Use cases will demonstrate the key technical rejection criteria that have the greatest impact on a successful submission process.
Alec Zhixiao Lin
As an important variable for understanding mortgage prepayment, the burnout factor has two different usages. In pool level analysis it captures how the heterogeneity in borrowers impacts the slowdown in prepayment rates during a cycle of interest rate drop. The term is also employed in behavioral finance to measure borrowers’ incentive for refinancing. Using national data published by Freddie Mac, our regression analysis shows that dissecting the burnout factor in terms of incentive and eligibility will make it a good predictor for mortgage refinancing. The two usages are equally valuable and can complement each other for modeling mortgage refinancing and for the valuation of mortgage-backed security (MBS).
Venkata Madhira and Prabhakara Burma
Per the CDISC guidelines, clinical trials data must be submitted in SAS Version 5 Transport file format. These transport files should be validated using the Pinnacle 21 (P21) validator to ensure CDISC compliance, and all errors reported by the validator must be addressed by the programming team. One of the most common P21 error messages is FDAC036 (Variable length is too long for actual data). Usually, for most character variables, we assign some default length when creating SDTM and ADaM datasets; if a variable's assigned length is greater than the length of its longest actual value, the P21 validator throws the FDAC036 error message. The best solution is to assign each character variable a length equal to its maximum value length. It is a tedious and cumbersome process for a programmer to look for each character variable whose assigned length differs from its maximum value length and reassign it. To accomplish this task, a macro (ADJLEN) was created. The macro identifies each character variable whose length is not the same as its maximum value length and reassigns the variable's length to that maximum. The advantages of using the ADJLEN macro are that it reduces the size of the dataset and avoids the P21 FDAC036 error message for submission purposes.
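The idea behind such a macro can be sketched as follows; this is a minimal illustration assuming a nonempty WORK dataset, not the authors' actual ADJLEN code:

```sas
/* Hypothetical sketch: for each character variable, find the longest actual
   value and rebuild the dataset with that length. */
%macro adjlen(ds);
   %local cvars n i var len lenstmt;
   /* List the character variables from the dictionary tables */
   proc sql noprint;
      select name into :cvars separated by ' '
      from dictionary.columns
      where libname='WORK' and memname=upcase("&ds") and type='char';
   quit;
   %let n=%sysfunc(countw(&cvars));
   %let lenstmt=;
   %do i=1 %to &n;
      %let var=%scan(&cvars,&i);
      /* Maximum observed value length (at least 1 for all-missing variables) */
      proc sql noprint;
         select max(max(1,lengthn(&var))) into :len trimmed from &ds;
      quit;
      %let lenstmt=&lenstmt &var $&len;
   %end;
   %if &n > 0 %then %do;
      /* Declare the new lengths before SET so they override the stored ones */
      data &ds;
         length &lenstmt;
         set &ds;
      run;
   %end;
%mend adjlen;
```

Because LENGTH precedes SET, the re-declared lengths take effect when the dataset is rewritten, shrinking it and clearing the FDAC036 finding.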
Lingjiao Qi and Bharath Donthi
When presenting descriptive and summary statistics in clinical trials and the healthcare industry, the formatting of summary tables and supporting listings is critical for data review. Well-formatted outputs greatly enhance readability and reduce review time, and they help draw attention to significant information embedded among thousands of pages. However, it is often a challenge for programmers to align different data types and formats when combining parameters. Here, we discuss several techniques that increase output readability by aligning both character and numeric data in summary tables and supporting listings. The first method is the option ASIS=ON, which preserves the leading spaces of the data. The second method is the escape-character function NBSPACE, which inserts, holds, and prints leading and trailing blank spaces in the output. Last but not least, we introduce an in-house macro (%decimal) that evaluates the data and dynamically aligns numeric data by the decimal point. The %decimal macro also includes options for customized exclusions (e.g., selectively aligning decimal points for specific tests, or skipping alignment when the maximum number of decimal digits exceeds a certain number). High or low flags and other indicators can also be aligned separately for better visibility.
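The ASIS=ON idea can be sketched as below; the dataset and variable names are illustrative, and the %decimal macro itself additionally determines the decimal widths dynamically rather than using a fixed format:

```sas
/* Hypothetical sketch: right-align values with a fixed-decimal format so the
   decimal points line up, then print with ASIS=ON to keep the leading spaces. */
data aligned;
   set results;
   length aval_c $12;
   aval_c = put(aval, 12.2);   /* leading spaces pad the integer part */
run;

proc report data=aligned;
   column param aval_c;
   define param  / display 'Parameter';
   define aval_c / display 'Value'
                   style(column)={asis=on fontfamily='Courier New'};
run;
```

A monospaced font is used here because, in proportional fonts, equal numbers of leading spaces do not guarantee visual alignment.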
Bharath Donthi and Lingjiao Qi
In clinical trial studies, statistical programming activities often start while data collection is still ongoing. It is a common situation to have variables or datasets with partially or completely missing data. Below we discuss defensive programming techniques that avoid re-programming when the database is updated. For missing SDTM supplemental qualifiers or variables, we discuss methods to defensively create these variables so that ADaM and table/listing programming can continue. One method is to create a dummy SAS dataset containing all needed variables with null values in a separate data step, and then set it together with the current dataset. The other method is to create the required variables using array techniques before manipulating the SDTM/SUPP datasets. Additionally, we present a robust programming technique to dynamically generate table/listing outputs, handling situations where the current dataset does not contain the required data. Implementing these defensive programming techniques minimizes the risk of errors when processing ongoing clinical trial data, allows programming to start earlier in the life of a study, and reduces the need for rework as the data change.
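The dummy-dataset method can be sketched as follows; the variable list and dataset names are illustrative assumptions:

```sas
/* Hypothetical sketch: guarantee that expected variables exist with the right
   types and lengths, even if the incoming data does not yet contain them. */
data _dummy;
   length USUBJID $20 QNAM $8 QVAL $200;
   call missing(USUBJID, QNAM, QVAL);
   stop;                     /* zero observations, but all variables defined */
run;

data suppae_safe;
   set _dummy suppae;        /* runs even if SUPPAE lacks some variables */
run;
```

Because `_dummy` contributes zero observations, it adds no records; it only ensures downstream code referencing these variables compiles before the database is complete.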
Bhargav Koduru and Balavenkata Pitchuka
Sensitivity analyses determine how different values of an independent variable impact a particular dependent variable under a given set of conditions. Sensitivity analyses are becoming an integral component of Phase III clinical trials, as demonstrated by the latest draft (June 2017) of the addendum on "Estimands and Sensitivity Analysis" to be incorporated into ICH "E9 Statistical Principles for Clinical Trials". These analyses help explore the impact on trial results of missing data and of deviations from the statistical models.
The following is an instance of how potential bias is addressed by sensitivity analysis. When objective progression events are included without documentation of lesion measurements, the sensitivity analysis would consider only documented objective progressions and deaths as progression-free survival events, while backdating objective progression events to the previous complete assessment in the event of missing or incomplete assessments.
The above example and other sensitivity analyses sometimes involve the creation of data points (by LOCF or WOCF) at visits in the ADaM datasets that do not exist in the SDTM datasets (e.g., when the visit was skipped or not performed). In this paper, we present how the needs of each sensitivity analysis are addressed, sometimes by creating records that do not otherwise exist, derived using the TV dataset, in the response (ADRS) and PRO-related ADaM datasets.
Key words: Sensitivity Analysis, Progressive Disease (PD), Imputation, Analysis Data Model (ADaM), Last Observation Carried Forward (LOCF), Worst Observation Carried Forward (WOCF), Trial Visits (TV), Patient-Reported Outcomes (PRO)
Emily Woolley and Amber Randall
Implementation of CDISC standards for CSR deliverables can be complex and costly for Phase I/II studies. The decisions of whether and when to implement standards are based on a number of factors, including the breadth of the development program, potential partnerships, and unknown asset viability. We will discuss the ideal CDISC-compliant process flow, engineered backwards from the final statistical deliverables to the initial planning for data collection. Alternate approaches can be taken to balance time, scope, and effort, depending on where in the process CDISC standards are adopted. Case studies illustrate considerations and strategies based on lessons learned from implementing CDISC compliance at different time points.
Venkata Madhira and Harish Yeluguri
All submission datasets must comply with CDISC guidelines. One of the challenging requirements in following CDISC guidelines is that no variable text value in a submission dataset should exceed 200 characters. When this scenario occurs in general observation class domains, the first 200 characters are stored in the parent domain variable and the rest of the text is stored in the supplemental qualifiers dataset, per CDISC standards. In the case of TSVAL and COVAL, the first 200 characters are stored in the parent domain variable, and the rest of the text is stored in additional variables (e.g., COVAL1, COVAL2, ...), each with a text length of 200 characters or less.
In clinical trials it is very common for data collected from subjects and/or the trial summary parameter value (TSVAL) to exceed 200 characters. It is a tedious and cumbersome process for a programmer to search each dataset in a library for variables whose text values exceed 200 characters and, where found, split them into additional variables without breaking words, in a readable manner that complies with CDISC standards. This complicated task can be done swiftly (by passing just two parameters) using the macro tool FINDSPLIT given in this paper.
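The word-boundary split at the heart of such a tool can be sketched as follows; the dataset and variable names are illustrative, and the sketch assumes a blank occurs within each 201-character window:

```sas
/* Hypothetical sketch: break a long value into 200-character pieces without
   splitting words. */
data split;
   set ts;
   length tsval1-tsval3 $200;
   array seg{3} tsval1-tsval3;
   _rest = strip(tsval_long);
   do i = 1 to dim(seg) while (_rest ne '');
      if length(_rest) <= 200 then do;
         seg{i} = _rest;
         _rest  = '';
      end;
      else do;
         /* Cut just before the last blank at or before position 201 */
         _cut = 201 - index(reverse(substr(_rest, 1, 201)), ' ');
         seg{i} = substr(_rest, 1, _cut);
         _rest  = strip(substr(_rest, _cut + 1));
      end;
   end;
   drop _rest _cut i;
run;
```

A production macro would additionally size the number of split variables from the data and name them per the domain's convention (e.g., COVAL1, COVAL2, ...).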