Kirk Paul Lafler
Data-driven programming, or data-oriented programming (DOP), is a programming paradigm in which the data, or data structures, themselves control the flow of a program rather than the program logic. Data-driven programming approaches are often applied in organizations with structured data for filtering, aggregating, transforming, and calling other programs. Topics include how SAS® users can access metadata content to capture valuable information: the librefs that are currently assigned, the names of the tables available in a libref, whether a data set is empty, how many observations are in a data set, how many character versus numeric variables are in a data set, a variable’s attributes, and the names of variables associated with simple and composite indexes; how to access the content of the read-only SAS metadata data sets called DICTIONARY tables, or their counterparts, the SASHELP views; and how SAS metadata can be dynamically created using data-driven programming techniques.
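As a minimal sketch of the kind of metadata access the abstract describes, the standard DICTIONARY tables can be queried with PROC SQL, and the equivalent SASHELP views can be read in a DATA step (the output data set name below is hypothetical):

```sas
proc sql;
   /* Librefs currently assigned in the session */
   select distinct libname, path
      from dictionary.libnames;

   /* Tables in WORK with observation and variable counts;
      nobs = 0 identifies an empty data set */
   select memname, nobs, nvar
      from dictionary.tables
      where libname = 'WORK';
quit;

/* The same metadata is available in a DATA step via SASHELP views */
data table_info;
   set sashelp.vtable;
   where libname = 'WORK';
   keep memname nobs nvar;
run;
```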
Lisa Mendez and Andrew Kuligowski
This case study provides a real-world example of how Base SAS was used to read in over 985 Excel workbooks to check the structure of over 90,000 worksheets – and to repeat the process quarterly. It illustrates how components such as the LIBNAME XLSX engine, PROC SQL (to create macro variables), SAS Dictionary Tables, and SAS macros were used together to create exception reports exported to MS Excel workbooks. The structure of the worksheets, such as worksheet names and variable names, was checked against pre-loaded templates. Values within the worksheets were also checked for missing and invalid data and for percent differences in numeric data, and to ensure that specific observations were included in specific worksheets. This case study describes the process from its inception to the ongoing enhancements and modifications. Follow along and see how each challenge of the process was undertaken and how other SAS User Group conference proceedings papers contributed to this quality check process.
This paper explores the creation of complex maps with SAS® software. The presentation covers the wide range of possibilities provided by SAS/GRAPH and by polygon plots in the SG procedures, as well as replays, overlays in both SAS/GRAPH and the SG procedures, and annotations, including ZIP code level processing. The more recent GfK maps now provided by SAS, which underlie newer SAS products such as Visual Analytics as well as traditional SAS products, will be discussed. The pre-production SGMAP procedure released with Version 9.4 Maintenance Release 5 will be discussed in context.
In the banking industry, tremendously large data files, and the numbers of files arriving from different business models, are updated every day; the variables, their values, and the numbers of variables and observations keep changing; business requirements keep changing too. This causes many headaches over how to handle these changes. Using SAS code generation to automatically accommodate these arbitrary changes in SAS engines is an excellent option, because automatic code generation not only covers all the changes but also avoids writing large amounts of code and saves substantial development time, while keeping the SAS engines readable, easy to understand, and easy to maintain and debug.
Its advantages are clear:
1) It can automatically handle arbitrary changes in the number of variables and their values much more efficiently than traditional SAS engines, which require manual intervention to change or modify the code;
2) It can make very complex and time-consuming coding problems simple and easy;
3) Therefore, it dramatically decreases the development and modification time for SAS engines when updated requirements appear.
When dynamically controlling an application or process through the use of SAS® macros, it is often advantageous to utilize a combination of metadata control files and list processing techniques. Typically the control file is read once into a macro variable list. But what if your information source is a large table and instead of accessing it sequentially you need to first look up specific information based on some supplied criteria? It would be very useful to be able to load the table into memory once and then only access those portions needed at a given point in time. It would be even better if you could do this without creating any macro variable lists. Learn how this can be done using a combination of hash tables, FCMP functions, and the macro language to create memory resident lookup tables.
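As a hedged illustration of the memory-resident lookup the abstract describes (the function name, control table, and variable names below are hypothetical), an FCMP function can wrap a hash-object lookup so that the result is retrievable even from macro code via %SYSFUNC, with no macro variable lists involved; a production version would avoid reloading the hash on every call:

```sas
/* Hypothetical control table: site_id -> site_name */
data work.sites;
   length site_id $ 8 site_name $ 40;
   input site_id $ site_name $;
   datalines;
S001 Phoenix
S002 Tucson
;
run;

proc fcmp outlib=work.funcs.lookup;
   function site_name_of(site_id $) $ 40;
      length site_name $ 40;
      /* Load the control table into a hash and look up the key */
      declare hash h(dataset: 'work.sites');
      rc = h.definekey('site_id');
      rc = h.definedata('site_name');
      rc = h.definedone();
      if h.find() ne 0 then site_name = ' ';
      return (site_name);
   endfunc;
run;

options cmplib=work.funcs;
%put Site S002 is %sysfunc(site_name_of(S002));
```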
The SAS® DATA step is one of the best (if not the best) data manipulators in the programming world. One of the areas that gives the DATA step its flexibility and power is the wealth of functions available to it. With over 450 functions in the DATA step, it is difficult to learn and remember them all; however, understanding how to use these functions is key to taking full advantage of the DATA step.
This paper takes a PEEK at some of those functions whose names have more than one MEANing. While the subject matter is very serious, the material will be presented in a way that is guaranteed not to BOR the audience. Syntax will be discussed and examples of how these functions can be used to manipulate data will be demonstrated. With so many functions available and with less than an HOUR to present, we obviously will have to TRIM our list so that the presentation will fit within the allotted TIME.
Macro quoting functions are among the most difficult concepts to grasp when one learns to write a macro program. One reason for their complexity is that they are closely tied to an understanding of macro processing, such as the macro compilation and execution phases. Furthermore, the situations that call for macro quoting functions are not obvious to programmers. In this talk, in addition to reviewing macro processing, we examine the most useful macro quoting functions via various examples.
SAS® software consists of two languages: SAS and its macro language. The purpose of this question-and-answer session is to provide an overview and perspective of how SAS works and how the macro language can work both within and before SAS program statements and steps.
A statement in a natural language contains three parts: subject, verb, and object. A statement in a computer language contains only two parts, a verb and an object; the subject, or actor, is the computer’s operating system. The predecessor of the list-processing meme is the computer language LISP, in which every statement is a function call and the object is a list. The list-processing paradigm of programming contains these steps: 1. identify an object, an item; 2. write a function, process, or procedure for an item; 3. prepare a list of items; 4. use a loop on the list to process each item. This paper reviews the author’s development of the concept of list processing and its implementation in SAS® software.
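The four steps above can be sketched in the SAS macro language (the data set names and the statistics step are hypothetical): a process is written for one item, a list of items is prepared, and a loop applies the process to each item in turn.

```sas
/* Step 2: a process for a single item (here, a WORK data set name) */
%macro process_item(item);
   proc means data=work.&item noprint;
      output out=work.stats_&item;
   run;
%mend process_item;

/* Step 4: a loop over the list, processing each item */
%macro process_list(list);
   %local i item;
   %do i = 1 %to %sysfunc(countw(&list));
      %let item = %scan(&list, &i);
      %process_item(&item)
   %end;
%mend process_list;

/* Step 3: the list of items */
%process_list(jan feb mar)
```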
The purpose of this exposition is to highlight the author’s papers published on these topics and to provide a critique of earlier ideas. This is accomplished by a review of the development of processes for calculating the cardinality ratios and cardinality types of the variables in a data set.
Few things are more frustrating than kicking off a long-running SAS program, only to return hours later and discover that the program encountered an error early in processing. One of those even-more-frustrating things is digging through the logs to determine the state of the system upon failure. Which of the hundreds of calls to %VeryHelpfulMacro() failed? What were the parameters and the state of temporary tables at that time? This paper describes one approach to monitoring the execution of macros, detecting errors, and creating a call stack to determine where and when runtime errors occurred. Adopting these simple techniques can reduce debugging time, promote reuse, and reduce project development time.
Even a rudimentary knowledge of the SAS® macro language highlights one glaring limitation: macros only accept text input parameters and have no output parameters. How can a SAS programmer develop robust and reusable programs with strong encapsulation and abstraction without the ability to return complex data types (e.g., data sets)? Too often the solution involves compromising encapsulation (e.g., referencing global variables), which severely limits reusability by invariably producing runtime errors and the need for modification. But there is a better way. Two programming techniques, data validation and passing parameters by reference, enable the SAS programmer to create robust and reusable macros with strong encapsulation. This study demonstrates how these techniques, with minimal additional code, can dramatically increase the flexibility and usability of macros.
This paper showcases the number-crunching power of SAS® 9.4. It's back to basics, with an updated twist, as we explore the power of the CAMERA interface coupled with SAS® procedures. Building on files read into SAS® 9.4, this discussion seeks to uncover broader technological trends by integrating visuals with the metrics power of SAS® 9.4. Our objective for this paper is to AUTOMATE the process of visualization for cell growth in a way that inspires us.
After teaching at SAS for over 10 years to thousands of learners, this instructor has collected many best practices from helping customers with real-world business problems. Hear all about her confessions on making life easy with mnemonics to recall the order of statements in SQL. Learn about the DATA step diehard who now loves SQL thanks to a little-known secret gem in PROC SQL. Hear about the ways in which ANSI SQL falls short and PROC SQL picks up the slack. In short, there are many confessions and so little time. This session is open to all interested in improving their SQL knowledge and performance.
The purpose of this paper is to determine whether DDE or PROC IMPORT is the better method for importing data from spreadsheets. We presented a paper on DDE at a SAS western-region conference a number of years ago; this paper is somewhat of an update and a comparison between DDE and PROC IMPORT. The Arizona Supreme Court, Administrative Office of the Courts (AOC) receives thousands of reports on spreadsheets every year. The monthly statistical reports are summarized by court and published every year in a fiscal year-end publication called the Data Report. The data are also summarized to enable a broad picture of the judiciary in Arizona and used to answer many “what if” questions about the future of the judiciary, such as proposed legislation. The process of printing these reports and entering the data into our SAS/AF data entry screens can be very tedious and labor intensive. The AOC has saved hundreds of work hours and thousands of dollars by using SAS programs to import data from spreadsheets with DDE and PROC IMPORT.
Varaprasad Ilapogu and Janet Li
PROC COMPARE is widely used in the pharmaceutical world to validate almost every statistical output. The procedure compares and shows the differences between the contents of two SAS data sets in the Results window; however, these differences appear as discrete, variable-by-variable differences. Oftentimes, programmers find themselves spending time opening the base or compare data sets to locate the observations where the values differ. For large data sets, this can be a time-consuming process. This paper discusses the various options in PROC COMPARE that can make the validation process more efficient. Using the OUT= option in PROC COMPARE, along with other options addressed in the paper, allows statistical programmers to produce an output data set that not only provides the differences in the values of the data sets but also provides a visual reference to other matching variable values. This increases the efficiency of finding the source of the differences and hence of resolving them.
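A minimal sketch of the approach: route the differences to an output data set instead of reading them from the Results window. The data set names and the ID variable below are hypothetical; the options shown are standard PROC COMPARE options.

```sas
proc compare base=work.base compare=work.qc
             out=work.diffs      /* OUT= captures results as a data set   */
             outnoequal          /* keep only observations that differ    */
             outbase outcomp     /* write both base and compare versions  */
             noprint;
   id usubjid;                   /* match observations by key, not order  */
run;
```

With OUTBASE and OUTCOMP together, each differing observation appears twice in WORK.DIFFS, once from each source, so the mismatching values can be inspected side by side without reopening either input data set.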
Data quality is a critical component of data governance and describes the accuracy, validity, completeness, and consistency of data. Data accuracy can be difficult to assess, as it requires a comparison of data to the real-world constructs being abstracted. But other characteristics of data quality can be readily assessed when provided a clear expectation of data elements, records, fields, tables, and their respective relationships. Data dictionaries represent a common method to enumerate these expectations and help answer the question What should my data look like? Too often, however, data dictionaries are conceptualized as static artifacts that only describe data. This text introduces dynamic data dictionaries that instead prescribe business rules against which SAS® data sets are automatically assessed, and from which dynamic, data-driven, color-coded exception reports are automatically generated. Dynamic data dictionaries—operationalized within Excel workbooks—allow data stewards to set and modify data standards without having to alter the underlying software that interprets and applies business rules. Moreover, this modularity—the extraction of the data model and business rules from the underlying code—flexibly facilitates reuse of this SAS macro-based solution to support endless data quality objectives.
SAS programming professionals are often asked to receive data files from external sources and analyze them using SAS. Such data files could be received from a different group within one’s own organization, from a client firm, from a Federal agency, or from some other collaborating establishment. Whatever the source, there is likely to be an agreement on the number of variables, variable types, variable lengths, range of values, and variable formats in the incoming data files. Information Technology best practices require that the receiving party perform quality checks (QCs) on the data to verify that they conform to the agreed-upon standards before the data are used in analysis.
This paper presents a rigorous methodology for validating user-submitted data sets using Base SAS. Readers can use this methodology and the SAS code examples to set up their own data QC regimen.
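One structural QC of the kind described above can be sketched by comparing an incoming data set's variable names, types, and lengths against an agreed-upon specification table. The data set names and the layout of the specification table (NAME, TYPE, LENGTH columns) are assumptions for illustration, not the paper's own code.

```sas
proc sql;
   /* Variables that are missing from either side, or whose
      type ('char'/'num') or length disagrees with the spec */
   create table work.qc_attrib_diffs as
   select coalesce(s.name, c.name) as variable,
          s.type   as spec_type,   c.type   as actual_type,
          s.length as spec_length, c.length as actual_length
      from work.spec as s
           full join
           (select name, type, length
               from dictionary.columns
               where libname = 'WORK' and memname = 'INCOMING') as c
           on upcase(s.name) = upcase(c.name)
      where s.name is missing or c.name is missing
            or s.type ne c.type or s.length ne c.length;
quit;
```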
The SORT procedure has been an integral part of SAS® since its creation. The sort-in-place paradigm made the most of the limited resources at the time, and almost every SAS program had at least one PROC SORT in it. The biggest options at the time were to use something other than IBM’s SYNCSORT as the sorting algorithm, or whether you were sorting ASCII data versus EBCDIC data.
These days, PROC SORT has fallen out of favor; after all, PROC SQL allows merging without using PROC SORT first, and the performance advantages of HASH sorting cannot be overstated. This leads to the question: is the SORT procedure still relevant to anyone other than the SAS novice, or the terminally stubborn who refuse to HASH? The answer is a surprisingly clear “yes.” PROC SORT has been enhanced to accommodate twenty-first-century needs, and this paper discusses those enhancements.
SAS Functions provide amazing power to your DATA step programming. Some of these functions are essential—some of them save you writing volumes of unnecessary code. This talk covers some of the most useful SAS functions. Some of these functions may be new to you and they will change the way you program and approach common programming tasks.
The majority of the functions described in this talk work with character data. There are functions that search for strings, others that can find and replace strings or join strings together, and still others that can measure the spelling distance between two strings (useful for "fuzzy" matching). Some of the newest and most amazing functions are not functions at all, but call routines. Did you know that you can sort values within an observation? Did you know that not only can you identify the largest or smallest value in a list of variables, but you can also identify the second or third or nth largest or smallest value? A knowledge of the functions described here will make you a much better SAS programmer.
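Two of the capabilities just mentioned can be sketched in a few lines: the CALL SORTN routine sorts numeric values across an observation, and the LARGEST (or SMALLEST) function returns the nth extreme value in a variable list.

```sas
data _null_;
   x1 = 7; x2 = 2; x3 = 9; x4 = 4;
   second_largest = largest(2, of x1-x4);   /* nth largest: returns 7   */
   call sortn(of x1-x4);                    /* x1-x4 become 2, 4, 7, 9  */
   put second_largest= x1= x2= x3= x4=;
run;
```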
DS2 is a SAS® proprietary programming language appropriate for advanced data manipulation. In this paper, we explore the advantages of using the DS2 procedure over DATA step programming in SAS®.
We explore various programming techniques within PROC DS2 that resolve real-world SAS programming problems, such as the use of hash objects for performance improvement and support for calls to FCMP user-defined functions and subroutines within DS2. We explore executing queries in-database using FedSQL, including embedded FedSQL that generates queries at run time to exchange data interactively between DS2 and a supported database. This allows processing data from multiple tables in different databases within the same query, thereby drastically reducing processing times and improving performance. We also explore the use of DS2 for creating, bulk loading, manipulating, and querying tables in an efficient manner.
We compare and contrast traditional SAS and DS2 programming techniques and methodologies: how certain programming tasks can be accomplished using PROC DS2 at the cost of slightly more complex code but with huge performance benefits, and when it makes the most sense to use DS2. We also benchmark traditional programming techniques against PROC DS2 for statistically intensive calculations in SAS.
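A minimal sketch of one technique mentioned above, embedded FedSQL feeding a DS2 data program (the table names and the threshold are hypothetical):

```sas
proc ds2;
   data work.high_value / overwrite=yes;
      method run();
         /* The SET statement can take an embedded FedSQL query in
            braces; the query runs in-database where supported */
         set {select acct_id, balance
                 from work.accounts
                 where balance > 10000};
      end;
   enddata;
run;
quit;
```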