A high-level overview of managing disk space for SAS® data sets and files created by or for the SAS system. Basic housekeeping is covered: keep files that are in-use and backup or discard files that are not in use. Backup methods are discussed, including the important question whether the operating system that your SAS site runs on might change in the future, necessitating use of the special transport format for backup files. SAS procedures that are commonly used for disk file management are described: PROC DELETE, DATASETS, and CATALOG. SQL DELETE and SAS DATA step functions for file management are also discussed. File compression is a very important tool for saving disk space, and the SAS features for this are described. Logical deletion of rows in a data set can waste disk space; prototype SAS code to detect files with this condition is supplied in an appendix. Multiple SAS programming techniques that promote efficient use of disk space are described, as well as suggestions for managing the SAS WORK library.
This presentation discusses a method for reading SAS® logs saved as text files from previously executed SAS jobs and displaying them in the log window of a current SAS session. This allows SAS to color code error, warning, and note messages in saved logs which lose this formatting when saved as text. This enhances the readability of SAS logs for code review. Other methods for saving logs with color will be briefly discussed, as well as the advantages and disadvantages of those methods. The SAS log is an important tool for ensuring the quality of analytical work in SAS and color coding helps to highlight information provided about SAS code execution.
When confronted with a new data channel, the modern data scientist or analyst will employ sophisticated data visualization tools like Visual Analytics to size up the data. Not all users have access to these tools and must rely on more pedestrian code-based approaches. This paper explores techniques using Base SAS to provide data-driven data analysis to help size up data absent the more modern tools. These techniques leverage the details about data available from the dictionary subsystem. Knowing the names, formats, and data types of the data allows one to derive great insight into the content of the data stream.
Hunter Glanz and Josh Horstman
No matter the industry or field of study, missing values inevitably complicate a great many datasets and analyses. Deciding what to do with them in an analysis often depends on the analysis itself. Regardless of the uses and analyses, the data need to be clean and tidy first. Tidy data include, among other things, properly coded missing values. Unfortunately, domain-specific software, electronic surveys, and other data collection and organization tools do not necessarily code missing values in the ways that SAS wants: a period for numeric values and a blank for character values. SAS accepts a short list of other special characters as missing values also, but these aren't guaranteed any more than a period or blank. To make matters worse, non-numeric flags such as "NA" can cause SAS to read our numeric data into a character variable, rendering the information difficult to analyze. Most SAS users would not complain about re-coding missing values as long as the variable types were not altered by the employed missing value code, but this is not always the case. In this article we provide a robust macro for re-coding a given missing value code, to the period and blank that SAS prefer, while also converting variables back to their intended types.
The SAS® FORMAT procedure “creates user-specified formats and informats for variables.” In other words, FORMAT defines data models that transform (and sometimes bin) prescribed values (or value ranges, in the case of numeric data) into new values. SAS formats facilitate multiple objectives of data governance, including data cleaning, the identification of outliers or new values, entity resolution, and data visualization, and can even be used to query or join lookup tables. SAS formats are often hardcoded into SAS software, but where data models are fluid, formats are best defined within control files outside of software. This modularity—the separation of data models from the programs that utilize them—allows SAS developers to build and maintain SAS software independently while domain subject matter experts (SMEs) separately build and maintain the underlying data models. Independent data models also facilitate master data management (MDM) and software interoperability, allowing a data model to be maintained as a single instance, albeit implemented not only with SAS but also Python, R, or other languages or applications. The CNTLIN option (within the SAS FORMAT procedure) facilitates this modularity by creating SAS formats from data sets. This text introduces the BUILD_FORMAT macro that greatly expands the utility of CNTLIN, allowing it to build formats not only from one-to-one and many-to-one format mappings but also from multitiered, hierarchical data models that are built and maintained externally in XML files. The numerous advantages of BUILD_FORMAT are demonstrated through successive SAS code examples that rely on the taxonomy of the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).
Did you ever wish you could use the power of SAS® to take control of EXCEL and make EXCEL do what you wanted “WHEN YOU WANTED”? Well one letter is the key to doing just that, the letter “X” as in the SAS “X” Command that opens the door to all operating system commands from SAS. The Windows operating system comes with a facility to write a series of commands called scripts. These scripts have the ability to open and reach into the internals of EXCEL. Scripts can load, execute and remove VBA macro code and control EXCEL. This level of control allows you to make EXCEL do what you want, without leaving any traces of a macro behind. This is Power.