Using SAS to Read Disk Directories from Windows Systems
It can be useful on occasion to look in some detail at the files that accumulate on a PC's hard disks. For instance:
  • to identify the largest files, the ones most likely to be causing disk space problems
  • to identify true duplicate files that may be scattered across one or more disk volumes
  • to identify which directories occupy the most space

It is not easy to see these facts using Windows Explorer.

Here are details of a SAS program that reads directory listings from Windows 95, 98 and NT (I haven't yet seen 2000 or ME) and produces, at present, very simple reports. Once the data are in a SAS data set, further reports can be written as the need arises. The reports use ODS, but the programs will work with Version 6 of SAS software if the ODS statements are removed.
The program, as supplied, requires a directory called 'Diskstats', with subdirectories 'data', 'data sets', 'outputs' and 'programs'. These can all be changed by modifying the few lines at the start of the code.
The program reads from a text file generated using the DOS command 'DIR' (from the MS-DOS Prompt [Windows 95] or the Command Prompt [Windows NT], under Start Menu > Programs) as follows:
DIR c:\*.* /s > d:\diskstats\data\sample.txt
If you give the output file a different name, change the file name in the code to match. Change the drive letters in the DIR command to suit your own PC setup.
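As a rough sketch of the reading step (not the supplied program), the DATA step below reads the listing line by line and carries the current directory name down onto the lines that follow it. The fileref DIRLIST, the data set WORK.DIRLINES and the variable names are my own illustrative choices, and the test for 'Directory of' assumes an English-language version of Windows.

```sas
/* Hypothetical fileref and data set names; the path matches the DIR command above. */
filename dirlist 'd:\diskstats\data\sample.txt';

data work.dirlines;
   infile dirlist truncover lrecl=512;
   length directory $ 260 line $ 512;
   retain directory;                 /* keep the current directory across rows */
   input line $char512.;

   /* Header line such as " Directory of C:\WINDOWS" (English Windows assumed) */
   if left(line) =: 'Directory of' then do;
      directory = left(substr(left(line), 14));   /* text after "Directory of " */
      delete;
   end;

   if line = ' ' then delete;        /* drop blank lines */
run;
```

Every other non-blank line, including volume and summary lines, is kept as raw text in LINE. Splitting the file lines into name, size and date is left out here because the column layout differs between Windows 95/98 and Windows NT.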
The program has not been designed as an example of good programming style. Rather, it has been designed as an exercise for students, to show a number of very useful capabilities and features of SAS in use.
In reading the input data, the program first creates a very large SAS data set, which needs a correspondingly large amount of free disk space, perhaps 30-40 megabytes. This is done on purpose, to demonstrate how easily one can produce unnecessarily large data sets. That data set is then deleted in favour of a version created from it that is typically only 10 per cent of the size, to show how a data set can be made a more reasonable size while still holding identical data.
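One common way to get this kind of saving, shown here as a minimal sketch rather than as the method the supplied program uses, is to replace generous fixed character lengths with the longest length actually present in the data. The data set names, variable names and the two sample rows are made up for illustration.

```sas
/* Deliberately wasteful: each character variable gets a fixed length of 200. */
data work.big;
   length directory $ 200 filename $ 200;
   infile datalines dsd truncover;
   input directory filename bytes;
   datalines;
C:\WINDOWS,WIN.INI,9432
C:\WINDOWS\SYSTEM,USER.EXE,471216
;
run;

/* Find the longest value actually stored in each character variable. */
proc sql noprint;
   select max(length(directory)), max(length(filename))
      into :dirlen, :fnlen
      from work.big;
quit;

/* Rebuild with just enough room. The LENGTH statement comes before SET   */
/* so that it takes precedence; SAS may log a warning about multiple      */
/* lengths, which is expected here because no value is longer than the    */
/* new length.                                                            */
data work.small;
   length directory $ &dirlen filename $ &fnlen;
   set work.big;
run;

/* Remove the oversized version. */
proc datasets library=work nolist;
   delete big;
quit;
```

WORK.SMALL holds the same values as WORK.BIG but each row is far shorter, so over many thousands of files the saving becomes substantial.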
The program also generates some SAS code, writes it to a file and then %INCLUDEs it.
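The general technique can be sketched as follows. This is an illustration only: the fileref GENCODE, the temporary file, the data set WORK.DEMO and the value 17 are all assumptions of mine, and the supplied program may well write to a named file and generate quite different statements.

```sas
/* Hypothetical fileref pointing at a temporary file. */
filename gencode temp;

/* Step 1: write SAS statements to the file with PUT. */
data _null_;
   file gencode;
   maxlen = 17;                      /* stand-in for a value computed from the data */
   put 'data work.demo;';
   put '   length filename $ ' maxlen ';';
   put "   filename = 'WIN.INI';";
   put 'run;';
run;

/* Step 2: run the generated statements; SOURCE2 echoes them to the log. */
%include gencode / source2;
```

The point of the approach is that the generated statements can depend on values only known once the data have been read, such as a variable length.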
The final data set contains details of each file and directory for the disk volume being processed.
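With one row per file, the three questions listed at the top can each be answered in a single step. The sketch below uses a made-up data set WORK.FILES with hypothetical variables DIRECTORY, FILENAME and BYTES; the names in the real final data set may differ.

```sas
/* Made-up rows standing in for the final data set. */
data work.files;
   length directory $ 40 filename $ 20;
   infile datalines dsd truncover;
   input directory filename bytes;
   datalines;
C:\WINDOWS,WIN.INI,9432
C:\BACKUP,WIN.INI,9432
C:\WINDOWS\SYSTEM,USER.EXE,471216
;
run;

/* Largest files first. */
proc sort data=work.files out=work.bysize;
   by descending bytes;
run;

/* Total space used by the files in each directory. */
proc means data=work.files noprint nway;
   class directory;
   var bytes;
   output out=work.dirsize sum=total_bytes;
run;

/* Candidate duplicates: the same name and size in more than one place. */
proc sql;
   create table work.dupes as
   select *
   from work.files
   group by filename, bytes
   having count(*) > 1
   order by filename, directory;
quit;
```

Matching on name and size only finds candidates; confirming true duplicates would need a comparison of the file contents, which a directory listing cannot provide.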
The program that reads the directory data
Macros to report on the data
Sample data
Comments?

Last updated on 31 January 2001. Copyright © 1997-2001 VIEWS-UK Webmaster@views-uk.org