Sorting Data in SAS with Proc Sort

Posted by bkloss | SAS | Sunday 18 January 2009 7:54 pm

The SAS Sort Procedure

Sorting data in SAS is an everyday task accomplished for a variety of reasons.

    1. Certain types of reports require that data be presorted to present results accurately.
    2. Programs that include a where statement can execute more quickly on a sorted data set.
    3. Unlike SQL, SAS requires data sets to be sorted by a common variable before a join operation can occur.

      These examples are just a few situations where a presorted data set is necessary or essential.  Luckily, PROC SORT is a relatively simple procedure.  The Basic Syntax is below:

      Proc Sort data =<data set>;

      By <variable-1 … variable-n>;

      Run ;

      In the code above, <data set> is both the input and output data set of the procedure.  Essentially, SAS replaces your old data set with the new sorted version.  If data=<data set> is not specified, SAS will sort the last data set in use.

      By Statement

      The By statement specifies the variables to use as sort keys.  You may include as many by variables as are contained in the data set.  If you include more than one by variable, SAS will sort the data by the first by variable, then sort the groups of observations produced from the previous sort by the second variable.  SAS automatically sorts data in ascending order, meaning it stores values from A to Z or in the case of numbers, from lowest to highest.  To sort in descending order (from Z to A), specify the keyword Descending before the variable that you would like to sort by.

      This sample code produces the example below:

      Proc Sort data =example;

      By Cost Descending Class;

      Run ;

      Data set sorted ascending and descending by proc sort

      Notice how within each group of ascending cost values, the class values are listed in descending order.  Also note that the missing value for the last row is considered a low value and is placed after C.  SAS treats missing values for character and numeric data as very small values.

      OUT=

      you wanted to retain the original unsorted data while creating a new sorted data set, you can  include the out =<output data set> as shown below:

      Proc Sort data =<original data set> out =<output data set> ;

      By Cost Descending Class;

      Run ;

      This option saves the sorted output data set in a new location and retain the original unsorted data set.

      NODUPKEY

      If you wanted to include only unique values in your output data set.  The NODUPKEY option will eliminate all duplicate values during the sort. Let us return to our example, this time incorporating all that we have learned.  We will sort the data set ‘Old’ by the variables cost and descending class to produce an output data set ‘New’.  The code below produces the example picture:

      Proc Sort data =<original data set> out =<output data set> NODUPKEY ;

      By Cost Descending Class;

      Run ;

      Proc Sort with NODUPKEY option

      Notice how the duplicate observation (43, D) from the previous picture has been removed.  A message was printed to the SAS log indicating that the duplicate record was removed:

      SAS log message

      That wrap things up for the first entry of the SAS tip series.  Please feel free to leave comments if you have anything to add about the Proc Sort procedure.

      No Comments »

      No comments yet.

      RSS feed for comments on this post. TrackBack URI

      Leave a comment