StatPac for Windows User's Guide

Overview

System Requirements and Installation

System Requirements

Installation

Unregistering & Removing the Software from a PC

Network Operation

Updating to a More Recent Version

Backing-Up a Study

Processing Time

Server Demands and Security

Technical Support

Notice of Liability

Paper & Pencil and CATI Survey Process

Internet Survey Process

Basic File Types

Codebooks (.cod)

Data Manager Forms (.frm)

Data Files (.dat)

Internet Response Files (.asc or .txt)

Email Address Lists (.lst or .txt)

Email Logs (.log)

Rich Text Files (.rtf)

HTML Files (.htm)

Perl Script (.pl)

Password Files (.text)

Exported Data Files (.txt and .csv and .mdb)

Email Body Files (.txt or .htm)

Sample File Naming Scheme for a Survey

Customizing the Package

Problem Recognition and Definition

Creating the Research Design

Methods of Research

Sampling

Data Collection

Reporting the Results

Validity

Reliability

Systematic and Random Error

Formulating Hypotheses from Research Questions

Type I and Type II Errors

Types of Data

Significance

One-Tailed and Two-Tailed Tests

Procedure for Significance Testing

Bonferroni's Theorem

Central Tendency

Variability

Standard Error of the Mean

Inferences with Small Sample Sizes

Degrees of Freedom

Components of a Study Design

Elements of a Variable

Variable Format

Variable Name

Variable Label

Value Labels

Valid Codes

Skip Codes for Branching

Data Entry Control Parameters

Missing OK

Auto Advance

Caps Only

Codebook Tools

The Grid

Codebook Libraries

Duplicating Variables

Insert & Delete Variables

Move Variables

Starting Columns

Print a Codebook

Variable Detail Window

Codebook Creation Process

Method 1 - Create a Codebook from Scratch

Method 2 – Create a Codebook from a Word-Processed Document

Spell Check a Codebook

Multiple Response Variables

Missing Data

Changing Information in a Codebook

Overview

Data Input Fields

Form Naming Conventions

Form Creation Process

Using the Codebook to Create a Form

Using a Word-Processed Document to Create a Form

Variable Text Formatting

Field Placement

Value Labels

Variable Separation

Variable Label Indent

Value Labels Indent

Space between Columns

Valid Codes

Skip Codes

Variable Numbers

Variable List and Detail Windows

Data Input Settings

Select a Specific Variable

Finding Text in the Form

Replacing Text in the Form

Saving the Codebook or Workspace

Overview

Keyboard And Mouse Functions

Create A New Data File

Edit Or Add To An Existing Data File

Select A Different Data File

Change Fields

Change Records

Enter A New Data Record

View Data For A Specified Record Number

Find Records That Contain Specified Data

Duplicate A Field From The Previous Record

Delete A Record

Data Input Settings

Compact Data File

Double Entry Verification

Print A Data Record

Variable List & Detail Windows

Data File Format

Overview

HTML Email Surveys

Plain Text Email Surveys

Brackets

Item Numbering

Codebook Design for a Plain Text Email Survey

Capturing a Respondent's Email Address

Filtering Email to a Mailbox

General Considerations for Plain Text Email

Overview

Internet Survey Process

Server Setup

Create the HTML Survey Pages

Upload the Files to the Web server

Test the survey

Download and import the test data

Delete the test data from the server

Conduct the survey

Download and import the data

Display a survey closed message

Server Setup

FTP Login Information

Paths & Folder Information

Design Considerations for Internet Surveys

Special Variables for Internet Surveys

Script to Create the HTML

Command Syntax & Help

Saving and Loading Styles

Survey Generation Procedure

Script Editor

Imbedded HTML Tags

Primary Settings

HTML Name (HTMLName=)

Banner Image(s)  (BannerImage=)

Heading  (Heading=)

Finish Text & Finish URL (FinishText= and FinishURL=)

Cookie (Cookie=)

IP Control (IPControl=)

Allow Cross Site (AllowCrossSite=)

URL to Survey Folder  (WebFolderURL=)

Advanced Settings - Header & Footer

RepeatBannerImage

RepeatHeading

PageNumbers

ContinueButtonText

SubmitButtonText

ProgressBar

FootnoteText & FootnoteURL

Advanced Settings - Finish & Popups

Thanks

Closed

HelpWindowWidth & HelpWindowHeight

HelpLinkText

LinkText

PopupBannerImage

PopupFullScreen

Advanced Settings - Control

Method

Email

RestartSeconds

MaximizeWindow

BreakFrame

AutoAdvance

BranchDelay

Cache

Index

ForceLoaderSubmit

ExtraTallBlankLine

RadioTextPosition

TextBoxTextPosition

LargeTextBoxPosition

LargeTextBoxProgressBar

Advanced Settings - Fonts & Colors

Global Attributes

Heading, Title, Text, & Footnote Attributes

Instructions, Question, and Response Attributes

Advanced Settings - Passwords - Color & Banner Image

LoginBannerImage

LoginBGColor

LoginWallpaper

LoginWindowColor

Advanced Settings - Passwords - Text & Control

PasswordType

LoginText

PasswordText

LoginButtonText

FailText

FailButtonText

ShowLink

EmailMe

KeepLog

Advanced Settings - Passwords - Single vs. Multiple

Password (single password method)

PasswordFile (multiple passwords method)

PasswordField & ID Field (multiple passwords method)

PasswordControl

Advanced Settings - Passwords - Technical Notes

Advanced Settings - Server Overrides

ActionTag

StorageFolder

ScriptFolder

Perl

MailProgram

Branching and Piping

Randomization (Rotations)

Survey Creation Script - Overview

Using Commands More than Once in a Script

Survey Creation - Specify Text

Heading

Title

Text

FootnoteText

Instructions

Question

Survey Creation - Spacing and pagination

BlankLine

NewPage

Survey Creation - Images and Links

Image

Link

Survey Creation - Help Windows

Survey Creation - Popup Windows

Survey Creation - Objects

Radio Buttons for a Single Variable

Radio Buttons for Grouped Variables (matrix style)

DropDown Menu

TextBox for a Single Variable

Adding a TextBox to a Radio Button,
    CheckBox, or Radio Button Matrix

TextBoxes for Grouped Variables

Sliders for Single or Grouped Variables

CheckBox for Multiple Response Variables

ListBox

Uploading and Downloading Files from the Server

Auto Transfer

FTP

Summary of the Most Common Script Commands

Overview

Format of an Email Address File

Extract Email Addresses

List Statistics

Join Two or More Lists

Split a List

Clean, Sort, and Eliminate Duplicates

Add ID Numbers to a List

Create a List of Nonresponders

Subtract One List From Another List

Merge an Email List into a StatPac Data File

Send Email Invitations

Using an ID Number to Track Responses

Email Address File

Body Text File

Sending Email

Overview

Mouse and Keyboard Functions

Designing Analyses

Continuation Lines

Comment Lines

V Numbers

Keywords

Analyses

Variable List

Variable Detail

Find Text

Replace Text

Options

Load, Save, and Merge Procedure Files

Print a Procedure File

Run a Procedure File

Results Editor

Graphics

Table of Contents

Automatically Generate Topline Procedures

Keyword Index

Keywords Overview

Categories of Keywords

Keyword Help

Ordering Keywords

Global and Temporary Keywords

Permanently Change a Codebook and Data File

Backup a Study

STUDY Command

DATA Command

SAVE Command

WRITE Command

MERGE Command

HEADING Command

TITLE Command

FOOTNOTE Command

LABELS Command

OPTIONS Command

SELECT and REJECT Commands

NEW Command

LET Command

STACK Command

RECODE Command

COMPUTE Command

AVERAGE, COUNT and SUM Commands

IF-THEN … ELSE Command

SORT Command

WEIGHT Command

NORMALIZE Command

LAG Command

DIFFERENCE Command

DUMMY Command

RUN Command

REM Command

Reserved Words

Reserved Word RECORD

Reserved Word TOTAL

Reserved Word MEAN

Reserved Word TIME

Analyses Index

Analyses Overview

LIST Command

FREQUENCIES Command

CROSSTABS Command

BANNERS Command

DESCRIPTIVE Command

BREAKDOWN Command

TTEST Command

CORRELATE Command

Advanced Analyses Index

REGRESS Command

STEPWISE Command

LOGIT and PROBIT Commands

PCA Command

FACTOR Command

CLUSTER Command

DISCRIMINANT Command

ANOVA Command

CANONICAL Command

MAP Command

Advanced Analyses Bibliography

Utility Programs

Import and Export

StatPac and Prior Versions of StatPac Gold

Access and Excel

Comma Delimited and Tab Delimited Files

Files Containing Multiple Data Records per Case

Internet Files

Email Surveys

Merging Data Files

Concatenate Data Files

Merge Variables and Data

Aggregate

Codebook

Quick Codebook Creation

Check Codebook and Data

Sampling

Random Number Table

Random Digit Dialing Table

Select Random Records from Data File

Compare Data Files

Conversions

Date Conversions

Currency Conversion

Dichotomous Multiple Response
   Conversion

Statistics Calculator Menu

Distributions Menu

Normal distribution

T distribution

F distribution

Chi-square distribution

Counts Menu

Chi-square test

Fisher's Exact Test

Binomial Test

Poisson Distribution Events Test

Percents Menu

Choosing the Proper Test

One Sample t-Test between Percents

Two Sample t-Test between Percents

Confidence Intervals around a Percent

Means Menu

Mean and Standard Deviation of a Sample

Matched Pairs t-Test between Means

Independent Groups t-Test between Means

Confidence Interval around a Mean

Compare a Sample Mean to a Population Mean

Compare Two Standard Deviations

Compare Three or more Means

Correlation Menu

Sampling Menu

Sample Size for Percents

Sample Size for Means

Keywords

Keyword Index

These keywords may be used in a procedure to control labeling and perform transformations.

 

Average

Compute

Count

Data

Difference

Dummy

Footnote

Heading

If...Then...Else

Labels

Lag

Let

Merge

New

Normalize

Options

Recode

Rem

Run

Save

Select/Reject

Sort

Stack

Study

Sum

Title

Weight

Write

 

Keywords Overview

Keywords are used in a procedure for everything except selecting the analysis type. These words (commands) are recognized by StatPac when used at the beginning of a line. They are used for study and data specifications, labeling output, and data file transformations.

Some keywords will be used often (e.g., STUDY, HEADING, TITLE and OPTIONS). Other keywords will be used only rarely (e.g., LAG, DIFFERENCE, SUM). Most analyses can be designed using only a few keywords.

All procedure files must use the STUDY keyword once in the first procedure. The use of all other keywords is optional and depends upon the situation.

The following is the list of keywords supported by StatPac:

STUDY, DATA, SAVE, WRITE, MERGE, HEADING, TITLE, FOOTNOTE, LABELS, OPTIONS, SELECT, REJECT, NEW, LET, STACK, RECODE, COMPUTE, COUNT, SUM, AVERAGE, IF-THEN-ELSE, SORT, WEIGHT, NORMALIZE, LAG, DIFFERENCE, DUMMY, and RUN.

 

Categories of Keywords

 

Keywords can be logically divided into four categories. The categories are:

 

1. Commands for selecting, loading and saving files.

STUDY

DATA

SAVE

WRITE

MERGE

2. Commands for labeling.

HEADING

TITLE

FOOTNOTE

LABELS

3. Commands for setting analysis options.

OPTIONS

4. Commands for creating new variables and transforming data.

SELECT

REJECT

NEW

LET

STACK

RECODE

COMPUTE

COUNT

SUM

AVERAGE

IF..THEN..ELSE

SORT

WEIGHT

NORMALIZE

LAG

DIFFERENCE

DUMMY

RUN

Keyword Help

Most keywords require one or more parameters. These parameters are governed by the syntax requirements of the keyword. Each keyword has its own syntax. To quickly display help for a keyword, select Help, Keywords, and click on the desired keyword to display the help information.

While a procedure can contain only one analysis specification command, it can have many keyword commands. Keywords are often used in combination with each other to form more complex procedures. In the following procedure, the first four lines are keyword commands and the fifth line is an analysis specification command.

 

STUDY ATTITUDES

HEADING StatPac Analysis of the Attitude Survey

TITLE Overall Attitude Broken Down By Sex

OPTIONS PF=NRCT GR=Y

CROSSTABS SUMMARY BY SEX

..

Ordering Keywords

Keywords can be used in combination with each other to perform virtually any selection or transformation. The order of the keywords in a procedure may or may not be important depending on the individual procedure.

Generally, it is important to consider the order of keywords whenever a procedure involves more than one transformation. If the results of one transformation are dependent upon another transformation, then proper order is imperative.

For example, in the following procedure three COMPUTE keywords are used to calculate values for three different variables. Since all of the computations are independent from each other (one doesn't depend on the results of another one), the order of the keyword commands doesn't make any difference. The following COMPUTE commands could be specified in any order.

 

STUDY SCIENCE

COMPUTE VAR_1 = 0

COMPUTE VAR_2 = 1

COMPUTE VAR_3 = 2

SAVE

..

When one keyword command depends upon the result of a different keyword command, the order of the commands is important. In the following example, each COMPUTE command uses the result of a previous command to perform its computation. Therefore, the order of the commands must be correctly specified.

 

STUDY SCIENCE

COMPUTE VAR_1 = 0

COMPUTE VAR_2 = VAR_1 + 1

COMPUTE VAR_3 = VAR_2 + 1

SAVE

..

 

Global and Temporary Keywords

Only three keywords are global (STUDY, DATA and HEADING). Once used, subsequent procedures will use the same study, data file and/or page heading. These parameters can be changed in any procedure and subsequent procedures will use the new parameters.

All other keywords are temporary and apply only to the procedure in which they appear. In the following example, the first procedure will use only the first 1000 records for the descriptive statistics analysis. The second procedure, however, will use all the records because the SELECT keyword only applies to the procedure in which it appears. The STUDY and HEADING (global keywords) will be the same for both procedures.

 

STUDY CROPS

HEADING Summary Statistics

SELECT 1-1000

DESCRIPTIVE YIELD

..

DESCRIPTIVE COST

..
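The carry-over rule can be sketched outside StatPac. The following Python fragment is only an illustration, not part of StatPac: it assumes a procedure file is plain text with each procedure ended by a line containing two periods, and the function name resolve_procedures is hypothetical. It also assumes that a new STUDY command drops a previously carried DATA command, based on the statement that a data file stays in effect until changed by another DATA or STUDY command.

```python
# Illustrative sketch only; not StatPac code.
GLOBAL_KEYWORDS = {"STUDY", "DATA", "HEADING"}  # the three global keywords


def resolve_procedures(text):
    """Return, for each procedure, the command lines that are in effect."""
    carried = {}      # global keyword -> most recent line using it
    temporary = []    # lines that apply to the current procedure only
    resolved = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "..":
            # End of procedure: globals carry forward, temporary lines do not.
            resolved.append(list(carried.values()) + temporary)
            temporary = []
            continue
        keyword = line.split()[0].upper()
        if keyword in GLOBAL_KEYWORDS:
            if keyword == "STUDY":
                carried.pop("DATA", None)  # assumption: a new study resets the data file
            carried[keyword] = line
        else:
            temporary.append(line)
    return resolved


example = """
STUDY CROPS
HEADING Summary Statistics
SELECT 1-1000
DESCRIPTIVE YIELD
..
DESCRIPTIVE COST
..
"""

for number, lines in enumerate(resolve_procedures(example), start=1):
    print(number, lines)
```

For the second procedure the sketch reports STUDY CROPS, HEADING Summary Statistics and DESCRIPTIVE COST, but not SELECT 1-1000, which matches the behavior described above.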

 

Permanently Change a Codebook and Data File

Two keywords (SAVE and WRITE) allow you to make permanent changes to studies and data files. Any transformations in a procedure will become permanent if the SAVE command is included anywhere in that procedure. Permanent means that the study information and data file are irretrievably changed to reflect the transformations requested by the keywords. If you had a study of 10,000 records, the following procedure file might result in a huge loss of data and be devastating!

 

STUDY MARKET

SELECT 1001-2000

SAVE

..

Note that the inclusion of the SAVE keyword makes the selection permanent. That is, after executing this procedure, the data file will contain only 1000 records. All other records will have been eliminated by the SELECT command. The implications of this are obvious. Don't use the SAVE command unless you have a backup of the study information and data file.

We recommend that all procedure files begin by saving duplicates of the codebook and data, and subsequent procedures use the duplicates (leaving the original codebook and data intact). If the previous example were changed to the following, the MARKET codebook and data file would not be affected by the use of the SAVE command in the second procedure.

 

STUDY MARKET

WRITE MARKET1

..

STUDY MARKET1

SELECT 1001-2000

SAVE

..

 

Backup a Study

We strongly recommend making a backup of all study information and data files before beginning any analyses. If you have a backup, you will always be OK and you won't have to worry about saving erroneous transformations. If you don't have a backup, and you accidentally save an unwanted transformation, nothing can be done to recover the data.

Making a backup of a study is easy... you just have to remember to do it. Please, if you plan to use the SAVE command, make a backup first.

There are three basic ways to back up StatPac information:

1. Use Windows Explorer to copy the desired files to another folder or drive.

2. Select Data, Backup to create a backup folder and copy all files from the current data folder to the new folder.

3. Run a procedure using the WRITE command to save a copy of the codebook and data file.

To make a backup of a codebook and data file, you need to run a two-line procedure. The first line specifies the codebook name (and its associated data file) and the second line uses the WRITE command to specify the study name for the backup files. For example, if we have a study called SURVEY, the following procedure will create a backup called B-SURVEY. The backup will contain both the study information and the data file.

 

STUDY SURVEY

WRITE B-SURVEY

..

The WRITE command is used to specify the name of the backup files. The name may include a drive or path. For example, if you wanted to store a backup copy on a diskette in the A drive, you would run the following procedure.

 

STUDY SURVEY

WRITE A:\B-SURVEY

..

The previous examples would not create a backup of the data entry form. The form itself is only used for data entry and editing, which is normally finished by the time you begin running analyses. Therefore, the form is not altered by any transformation or analysis. You can, however, manually make a backup of a form.
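If you prefer to script the manual copy (the first backup method above) so that the form is included, something like the following Python sketch could be used. It is not a StatPac feature. The function name backup_study and the folder names are hypothetical, and the sketch assumes the codebook, data file and form share the study name with the .cod, .dat and .frm extensions listed under Basic File Types.

```python
# Illustrative sketch only; not part of StatPac.
import shutil
from pathlib import Path


def backup_study(data_folder, study, backup_folder,
                 extensions=(".cod", ".dat", ".frm")):
    """Copy a study's codebook, data file and form (if present) to a backup folder."""
    source = Path(data_folder)
    target = Path(backup_folder)
    target.mkdir(parents=True, exist_ok=True)  # create the backup folder if needed
    copied = []
    for extension in extensions:
        source_file = source / f"{study}{extension}"
        if source_file.exists():               # a study may not have a form
            shutil.copy2(source_file, target / source_file.name)
            copied.append(source_file.name)
    return copied


# Hypothetical usage (folder names are assumptions):
# backup_study(r"C:\StatPac\Data", "SURVEY", r"D:\Backups\SURVEY")
```

The function returns the names of the files it copied, so you can confirm that the codebook and data file were both found before running a procedure that uses SAVE.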

We suggest that you start all procedure files with a two-line procedure that creates a duplicate codebook and data file, and in the second procedure, begin using the new files, thus leaving your original file unchanged. In this example, the first procedure reads the SURVEY codebook and data file, and creates a new codebook and data file called SURVEY2. The second procedure says to use the SURVEY2 codebook and data for the second and all subsequent procedures. Thus, if any serious mistakes are made in transforming data, you could easily revert back to the original codebook and data by rerunning the first procedure.

 

STUDY SURVEY

WRITE SURVEY2

..

STUDY SURVEY2

(begin your procedures here)

..

 

As you become more experienced with StatPac, you may include other lines in the first procedure. You might want to create a few new variables that you intend to use later on in the procedure file. For example, you might want to always begin your procedure files by creating an absolute record number variable (useful for identifying bad data) and a net variable that can be used in banner tables. The following might be the beginning of the procedure file:

 

STUDY MYSTUDY

COMPUTE (N5) REC = RECORD

NEW (N1) "NET" Totals

LABELS NET (=)

WRITE TEMP-MYSTUDY

..

STUDY TEMP-MYSTUDY

(begin your procedures here)

..

 

STUDY Command

The STUDY command is used to specify the name of the codebook and data file being analyzed. It must be specified in the first procedure. Subsequent procedures will use the same study name unless another STUDY command is used to change it. The STUDY command may only be used once in any given procedure.

The syntax of the command is as follows:

 

STUDY <File name>

The file name (or study name) may include a path specification. If no path is specified, StatPac will assume that the study resides in the default data subdirectory. No extension should be used when specifying a study name.

All the following are examples of the proper use of the STUDY command:

 

STUDY MARKET

STUDY C:\DATA\BOOK

STUDY A:SURVEY

 

DATA Command

The DATA command may be used to specify the name of the data file to be analyzed. It is only used when the data file name is not the same as the study name. The DATA command may be used only once in a given procedure.

The syntax of the command is as follows:

 

DATA <File name>

 

The file name may include a path specification. If no path is specified, StatPac will assume that the data file resides in the default data subdirectory. It is not necessary to use the extension .DAT when specifying the data file name; StatPac will automatically use this extension for all data files.

Using the DATA command changes the data file for the current and all subsequent procedures. The data file will continue to be analyzed until changed by another DATA or STUDY command.

All the following are examples of the proper use of the DATA command:

 

DATA RAWDATA

DATA A:RESULTS

DATA C:\STATPAC\HEALTH\SURVEY

 

SAVE Command

The SAVE command is used to make all transformations in a given procedure permanent.

There are two forms of syntax for the SAVE command. In the first form, the SAVE command is specified on a line by itself. When used this way, all transformations will be saved to the current codebook and data file. As a safety precaution, do not use the SAVE command in this way unless you have a backup of the study information and data file.

 

SAVE

The other form of the SAVE command lets you save the codebook and data to a different file. This form of the command is functionally identical to the WRITE command except that a variable list may not be specified. If a path is not specified as part of the output file name, the default data subdirectory will be assumed. If the file name contains spaces, you must enclose the file name in quotes or parentheses.

 

SAVE <File name>

 

For example, the following procedure creates a new five-character variable called AVG (the average of eight test scores), and stores this as a permanent part of the codebook and data file.

 

STUDY SCORES

AVERAGE (N5.1) AVG = SCORE1 - SCORE8

SAVE

..

After running this procedure, the codebook SCORES.COD and data file SCORES.DAT will have one more variable than they had before the procedure was run. The SAVE command makes the new variable permanent.

As another example, consider the following two procedures. The first procedure uses the SORT command to sort the data file in ascending order by last name. Because the SAVE command is used, the data file will be saved in sorted order. Also notice that the SAVE command specifies a file name, so the sorted data will be saved to the new file. The second procedure begins with the STUDY command to tell StatPac to use the new file and then prints a listing of three variables. The listing will be in sorted order because this is now the way the new data file is stored.

 

STUDY NAMES

SORT (A) LAST_NAME

SAVE "My New File Name"

..

STUDY My New File Name

LIST FIRST_NAME LAST_NAME PHONE

..

If you specify a file name with the SAVE command, your original codebook and data remain untouched, and you never have to worry about making a mistake. Even when you do not specify a file name, a mistake doesn't necessarily mean you should panic. The only transformations that are irreversible are those that change the original variables. As long as your raw data is intact, you should be able to recover from any mistake.

 

WRITE Command

The WRITE command is used to save transformed data in a file or write only selected variables to a file. Unlike the SAVE command (that saves the transformations in the original file), the WRITE command is usually used to save the transformed data in a different file.

The syntax for the WRITE command is:

 

WRITE <File name> <Variable list>

 

The WRITE command actually creates a new codebook and data file called a subfile. The subfile is just like any other study. The difference between the original study and the subfile is that the subfile reflects all transformations performed in the procedure.

While the SAVE command saves all transformations in the original file (unless a filename is specified), the WRITE command always creates a new codebook and data file and saves all transformations in the new files. The <Variable list> parameter is used to control what variables should be written to the subfile. When the <Variable list> is not specified, all variables will be included in the subfile.

The following procedure would use a codebook called NAMES (and an implied data file called NAMES). The procedure will sort the file by last name and save the sorted study as a subfile using a new study name SORTNAME. All variables from the original file will be included in the subfile.

 

STUDY NAMES

SORT (A) LAST_NAME

WRITE SORTNAME

..

In a similar example, this procedure file selects every tenth name from the original file (NAMES) and creates a new study (subfile) called MINIFILE. The subfile will contain the same variables as the original study; however, there will be only one-tenth as many records. The second procedure uses the STUDY command to access the subfile and list three selected variables.

 

STUDY NAMES

COMPUTE  (N7.1)  REC = RECORD

COMPUTE  (N7.1)  IREC = REC/10

COMPUTE IREC = INT(IREC)*10

IF REC = IREC  THEN SELECT

WRITE MINIFILE

..

STUDY MINIFILE

LIST FIRST_NAME LAST_NAME PHONE

..

The original data file (NAMES.DAT) is called the input file. It is the input data for the transformation procedure. The transformed data file (MINIFILE.DAT) is called the output file. It contains the output from the transformation procedure. The second procedure uses the STUDY command to access the output file from the first procedure. MINIFILE is now, in effect, its own study. The procedure created a MINIFILE.COD codebook and MINIFILE.DAT data file.
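The arithmetic behind the every-tenth-record selection in the first procedure can be traced with a short Python sketch. It is only an illustration and not StatPac code. It assumes that the N7.1 format keeps one decimal place and that INT drops the fractional part; the function name every_tenth is hypothetical.

```python
# Illustrative sketch only; mirrors the three COMPUTE lines and the IF line above.
def every_tenth(record_numbers):
    """Return the record numbers that the selection logic would keep."""
    selected = []
    for rec in record_numbers:           # COMPUTE (N7.1) REC = RECORD
        irec = round(rec / 10, 1)        # COMPUTE (N7.1) IREC = REC/10
        irec = int(irec) * 10            # COMPUTE IREC = INT(IREC)*10
        if rec == irec:                  # IF REC = IREC THEN SELECT
            selected.append(rec)
    return selected


print(every_tenth(range(1, 36)))  # [10, 20, 30]
```

Record 27, for example, gives 2.7, which truncates to 2 and becomes 20. Because 20 does not equal 27, the record is not selected. Only record numbers that are exact multiples of ten survive the comparison.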

When the output file name is different from the input file name, the input file will remain intact, and the new output file will contain the transformed data. When the input file name and the output file name are the same, the WRITE command functions identically to the SAVE command. In the following example, the output file will write over the input file and the original (pre-transformed) data will be lost. Unless you have an up-to-date backup, we recommend specifying a unique file name when using the WRITE command to eliminate the possibility of losing data.

 

STUDY FOOD

COMPUTE RISK = LOG( RISK )

WRITE FOOD     (causes a loss of the original RISK data)

..

The output file name parameter must be specified whenever the WRITE command is used. If a path is not specified as part of the output file name, the default data subdirectory will be assumed. If the output file name contains spaces, then you must enclose the name in quotes or parentheses. For example:

 

WRITE ANOTHER FILE       Wrong

WRITE "ANOTHER FILE"    Correct

WRITE (ANOTHER FILE)     Correct

 

When a procedure does not create any new variables or change any labels, you can use the WRITE command to save a new data subfile without saving the codebook. It is not necessary to resave the information if none of the study information has been changed by transformations. To save a data file without saving the study information, add a .DAT extension to the file name parameter. Subsequent procedures could use the same (original) study information, but would require a DATA command to access the new data subfile. In the following example, the first procedure selects males from the file and writes a data subfile consisting of just males. The second and third procedures access this subfile with the DATA command and perform analyses on it. The analyses will only include those records selected in the first procedure. Note that the DATA command is required only in the second procedure because it remains in effect until changed by a STUDY command or another DATA command.

 

STUDY STATUS

IF SEX = "M" THEN SELECT

WRITE MALES.DAT

..

TITLE Income Statistics for Male Respondents

DATA MALES

DESCRIPTIVE INCOME

..

TITLE Housing Information for Male Respondents

FREQUENCIES HOUSING

..

The WRITE command may be used to create a subfile of selected variables simply by specifying which variables are to be contained in the subfile. The <Variable list> parameter allows you to control which variables (and in which order) will be written to the subfile. None of the previous examples specified a variable list, so StatPac wrote all the existing and newly created variables to the subfile. In the following example, two variables (OVERALL and LASTYEAR) will be written to the new subfile SUMMARY.

 

STUDY ATTITUDES

WRITE SUMMARY OVERALL LASTYEAR

..

The variable list may be specified as individual variables, a range of variables, or a combination of the two. It may include both variable names and V numbers. Commas and/or spaces may be used to separate variables. Continuation lines may be used for long variable lists. All of the following WRITE commands contain valid variable lists:

 

WRITE SUMMARY V1-V3 V10 V11 V15

WRITE SUMMARY V1, V2 ,V3, V10, V11, V15

WRITE SUMMARY AGE INCOME V9-V14 V97

WRITE SUMMARY TEST1-TEST9 V4-V7 V1-V3
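To make the variable list rules concrete, the following Python sketch expands a list into variable numbers in the order written. It is not how StatPac is implemented. The function name expand_variable_list and the sample codebook are hypothetical, and the sketch assumes that a range of variable names (such as TEST1-TEST3) means every variable between the two names in codebook order, that no variable name looks like a V number, and that names do not contain hyphens.

```python
# Illustrative sketch only; not StatPac code.
import re


def expand_variable_list(spec, codebook_names):
    """Expand a variable list into 1-based variable numbers, in the order given."""
    lookup = {name.upper(): i for i, name in enumerate(codebook_names, start=1)}

    def number_of(token):
        token = token.upper()
        match = re.fullmatch(r"V(\d+)", token)
        if match and 1 <= int(match.group(1)) <= len(codebook_names):
            return int(match.group(1))          # V number
        if token in lookup:
            return lookup[token]                # variable name
        raise ValueError(f"unknown variable: {token}")

    numbers = []
    for item in re.split(r"[,\s]+", spec.strip()):  # commas and/or spaces separate items
        if not item:
            continue
        first, _, last = item.partition("-")        # "V1-V3" is a range
        start = number_of(first)
        end = number_of(last) if last else start
        if end < start:
            raise ValueError(f"range runs backwards: {item}")
        numbers.extend(range(start, end + 1))
    return numbers


codebook = ["ID", "AGE", "INCOME", "TEST1", "TEST2", "TEST3"]  # hypothetical study
print(expand_variable_list("AGE INCOME TEST1-TEST3 V1", codebook))  # [2, 3, 4, 5, 6, 1]
```

Because the numbers come back in the order the list was written, the same expansion also shows how a variable list can reorder the variables in the subfile.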

 

The WRITE command may also be used to change the order of the variables. When the new output file is created, variables will be written in the same order as specified in the variable list. Thus, it is easy to restructure the order of variables in a codebook and data file.

For example, suppose you have finished creating a codebook and data entry form. As you begin to enter data, you realize that you forgot to include a variable in the study design. If you have only entered a small amount of data, the easiest way to correct the problem is to:

1) delete the data file

2) add the new variable to the codebook and form using Study Design

3) begin entering data again.

However, if you have already entered a large amount of data, you may not want to delete the data you have already entered. In this case, the WRITE command can be used to correct the problem.

In this example, you might want to add a new open-ended response variable immediately following variable 55 in a study called MYSTUDY. The codebook currently contains 88 variables. The following procedure creates a new variable called OTHER, and then resaves the codebook and data file inserting the new variable into the middle of the study. After running the procedure, the study and data file will contain 89 variables. The new variable (OTHER) will be blank for all existing records in the data file. The data entry form is not automatically updated, so you will also need to add the new variable to the form before using it to edit or enter data.

 

STUDY MYSTUDY

NEW (A50) "OTHER" Other Brand Specified

WRITE MYSTUDY V1-V55 OTHER V56-V88

..

The WRITE command can also be used to increase the length of an open-ended response variable. Suppose that variable 12 in the study is called CITY, and it is 15 characters in length (i.e., the format is A15). After entering a large number of records into the data file, you come across a city name that is 18 characters in length, and you want to increase the length of the CITY field from 15 to 20 characters. First, run the following procedure to create a new 5-character dummy variable and write it as the 13th variable in the study (immediately following the CITY variable). This will insert 5 blank spaces after the current 15-character CITY field. Then, go to the study design, delete the dummy variable and change the format for the CITY field to A20. Again, the data entry form is not automatically modified by this procedure, and therefore, it needs to be changed to reflect the new field length for the CITY variable.

 

STUDY MYSTUDY

NEW (A5) "DUMMY"

WRITE MYSTUDY V1-V12 DUMMY V13-V88

..
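The effect of the dummy variable on each data record can be pictured with a small Python sketch. It is an illustration only and assumes that each record is a fixed-width line of text in which every variable occupies a fixed range of columns. The three-variable layout and the function name insert_blank_field are hypothetical.

```python
# Illustrative sketch only; not StatPac code.
def insert_blank_field(record, after_column, width):
    """Insert `width` blanks after the given 1-based column of a fixed-width record."""
    return record[:after_column] + " " * width + record[after_column:]


# Hypothetical layout: ID in columns 1-3, CITY (A15) in columns 4-18, STATE in columns 19-20.
old_record = "001" + "San Francisco".ljust(15) + "CA"

# The 5-character dummy variable lands immediately after CITY (after column 18).
new_record = insert_blank_field(old_record, after_column=18, width=5)

city = new_record[3:23]    # CITY read as A20 (columns 4-23)
state = new_record[23:25]  # STATE now starts in column 24

print(repr(city), repr(state))
```

Once the dummy variable is deleted from the codebook and CITY is changed to A20, the five blank columns become the tail of the CITY field and every later variable keeps its shifted position.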

 

MERGE Command

The MERGE command is used to merge a study and data file into the current study and data file. It usually appears as the only line in a procedure. The syntax of the MERGE command is:

 

MERGE <filename>

 

The <filename> is the name of the study and data file that you want to merge into the current study. After running the procedure, the variables (and data) from the merge file will be added to the end of your original variables. That is, after running the procedure, the number of variables in your original codebook will increase. The new variables will appear in the Variable List window.

Both the current study and the file to be merged must have the same number of records and they must be in the same order. In other words, record one in the current data file is the same respondent as record one in the data to be merged.

The MERGE command does not automatically delete the files after the merge is completed. The study and data will be merged into your current study, but they will also remain on disk in their original form. If you want to delete the study and data after the merge is completed, add a /K to the end of the <filename>. For example, the following MERGE command would add the variables and data from a file called DEMOGRAPHICS to the current study (RESEARCH), and then delete the DEMOGRAPHICS codebook and data after the merge has been successfully completed.

 

STUDY RESEARCH

MERGE DEMOGRAPHICS /K

..

 

HEADING Command

 

The HEADING command is the method used to place a page heading on the printouts. It will appear in the top left corner of all printouts. The syntax for the HEADING command is:

 

HEADING <Page heading>

 

The use of the HEADING command is optional. If it is not included in a procedure file, the default page heading will be used.

Like the STUDY command, the HEADING command is usually specified in the first procedure. Subsequent procedures will use the same heading as the previous procedures unless the HEADING command is used to change the page heading.

Examples of the HEADING command might be:

 

HEADING Family Planning Attitudes Study - 1999

HEADING StatPac Inc. - Computer Software Division

HEADING CLASSIFIED INFORMATION - SECURITY CLEARANCE REQUIRED

 

There are no restrictions on the content of the heading. Both upper and lower case characters may be used. The heading may be printed as a blank line if the keyword HEADING is used by itself with no other characters on the line.

The HEADING command may be used only once in any given procedure. All tasks in that procedure will use the same heading. By default, subsequent procedures will use the last heading specified. To change it, use the HEADING command again to assign a new page heading to the output.

 

TITLE Command

The TITLE command is used to place a procedure or task title on the printouts. It will appear in the top left corner below the page heading.

The syntax for the TITLE command is:

 

TITLE <Procedure title>

 

The use of the TITLE command is optional. Unlike the HEADING command, it applies to only the procedure in which it appears. If a title is not specified for a procedure, no title will appear on the output.

There are no restrictions on the content of the title. Both upper and lower case characters may be used. The title may be printed as a blank line if the keyword TITLE is followed by two or more spaces, with no other characters on the line.

Examples of the TITLE command might be:

 

TITLE First Run of the Data - Descriptive Statistics

TITLE Frequency Analysis of Product Acceptance Questions

TITLE Comparisons of Males & Females

 

There is a special feature that may be used in the TITLE command. The three characters (#) may be placed in the title as a placeholder for the variable being analyzed. The title on the printout will contain the variable label instead of the (#) symbols. For example, the following procedure would produce three different titles. Each title will substitute the correct variable label in place of the (#) symbols.

 

STUDY SURVEY

TITLE Frequency Analysis of (#)

FREQUENCIES AGE RACE SEX

..

This is especially useful when creating banner tables. When running the same banners on a series of stub variables, it is often desirable to place the variable label for the stub variable on the top of the page. The following commands would run 25 different banner tables (one per page) and the title for each table would be the variable label of the stub variable. The CO=N option turns off compression so that only one stub variable gets printed per page.

 

STUDY SURVEY

TITLE  (#)

BANNERS V1-V25 BY AGE RACE SEX

OPTIONS CO=N

..

 

FOOTNOTE Command

The FOOTNOTE command may be specified in any procedure to place a footnote at the bottom of each page of output. Only one FOOTNOTE command may appear in a procedure. The syntax for the FOOTNOTE command is:

 

FOOTNOTE <Page footnote>

 

The use of the FOOTNOTE command is optional. If it is not included in a procedure file, no footnote will be printed. There are no restrictions on the content of the footnote. Both upper and lower case characters may be used.

Examples of the FOOTNOTE command might be:

 

FOOTNOTE This Analysis Was Produced By StatPac Inc.

FOOTNOTE Note: Includes all data collected through 1991.

 

The functionality of the FOOTNOTE command is like the HEADING command, where once specified, the footnote will apply to all subsequent procedures. To cancel a previously specified footnote, use the FOOTNOTE command without any footnote text (a blank footnote).

 

LABELS Command

The LABELS command may be used to assign value labels to a newly created variable or to change the value labels for an existing variable. It may also be used to change a variable label. The syntax of the command to change one or more value labels is:

 

LABELS <Variable list> (<Code>=<Label>)(<Code>=<Label>)...

 

The syntax of the command to change one or more variable labels is:

 

LABELS <Variable list> = <New Variable Label>

 

Generally, when the LABELS command is used to change a variable label, only a single variable is specified (although a variable list could be used to change the variable labels for a series of multiple response variables). This example changes the variable label for variable five to "How do you feel about the program?". Note that the variable label text is not enclosed in parentheses or quotes.

 

LABELS V5 = How do you feel about the program?

 

When used to change value labels, the LABELS command is often used in conjunction with the RECODE command. For example, let's say a survey asked the respondent's age (AGE). For our purposes, it is sufficient to report the age as either "under 21" or "21 and over". The RECODE command would be used to recode the data, and the LABELS command would be used to assign value labels to the new categories:

 

STUDY VOTING

RECODE AGE (LO-20=1)(21-HI=2)

LABELS AGE (1=Under 21)(2=21 and Over)

FREQUENCIES AGE

..

Continuation lines are allowed in the LABELS command. For example, if AGE were to be divided into five groups, the LABELS command could be entered as:

 

RECODE AGE (LO-20=1) (21-30=2) (31-40=3) (41-50=4) (51-HI=5)

LABELS AGE  (1=Under 21) (2=21-30 Years) (3=31-40 Years) (4=41-50 Years) (5=Over 50 Years)

..

When used to change value labels, two restrictions apply to the use of the LABELS command. The first is that some discretion should be used in the length of the value labels. Excessive value labels will make printouts difficult to read. A good guideline is to limit value labels to about 30 characters. The second restriction is that the code on the left of the equals sign must not contain more characters than the field width.

Value labels being assigned to a new or existing variable will be temporary and apply only to the current procedure. The value labels can be made permanent by using the SAVE or WRITE commands.

The LABELS command may be used more than once in a procedure and it may be used to assign new labels to more than one variable. For example, the following procedure file uses the LABELS command to assign new value labels to ten consecutive items on a questionnaire, and to another variable called OPINION. Because the SAVE command is also specified in the procedure, the new labels will become a permanent part of the study information, replacing any previous labels.

 

STUDY SURVEY

LABELS ITEM1-ITEM10 (A=Very much) (B=Somewhat) (C=Not at all)

LABELS OPINION (1=Positive) (2=Undecided) (3=Negative)

SAVE

..

When using the LABELS command to change a single value label or add new value labels, it is not necessary to retype all the value labels. Adding an exclamation mark to the end of the LABELS command instructs StatPac to update the current labels rather than completely replace them.

For example, if you want to show the no-responses on a printout, even though there is not a value label for the no-responses, just add an exclamation point to the end of the line and StatPac will update the existing value labels. The following command would add a "No response" value label to variable one. The value labels that already exist for variable one would remain intact.

 

LABELS V1 ( =No response)!

 

In another example, suppose two of the value labels for an income variable are: "1=Less than $10,000 per year" and "5=More than $35,000 per year". After reviewing a banners table, you decide that it would look better if the value labels were abbreviated. The following line could be used to change the value labels for codes 1 and 5 without affecting the value labels for codes 2, 3, and 4.

 

LABELS V1 (1=<10,000) (5=>35,000)!
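The difference between replacing and updating value labels can be sketched in Python. This is a rough model under stated assumptions, not StatPac's implementation: the function apply_labels is hypothetical, and the labels shown for codes 2, 3 and 4 are placeholders because the example above does not list them.

```python
def apply_labels(existing, new, update=False):
    """Return the value labels that result from a LABELS command.

    update=True models a trailing exclamation mark (merge into existing labels).
    update=False models the plain command (the new labels replace the old set).
    """
    if update:
        merged = dict(existing)
        merged.update(new)
        return merged
    return dict(new)


income = {
    "1": "Less than $10,000 per year",
    "2": "Label for code 2",   # placeholder, not from the manual
    "3": "Label for code 3",   # placeholder, not from the manual
    "4": "Label for code 4",   # placeholder, not from the manual
    "5": "More than $35,000 per year",
}
change = {"1": "<10,000", "5": ">35,000"}

updated = apply_labels(income, change, update=True)   # with the exclamation mark
replaced = apply_labels(income, change)               # without it
```

With the exclamation mark, all five codes keep a label and only codes 1 and 5 change. Without it, only the two labels typed on the command line remain.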

 

Any value label in the LABELS command may include a vertical bar to force a new line in the printout. (The vertical bar appears as two stacked vertical bars on the keyboard).  This is especially useful in banner tables when you want to force the location of a break in the banner heading. For example, the following value label will cause the word "Responsibility" to print on two lines with a hyphen after the letter "n":

 

LABELS V1 (1=Respon-|sibility)

 

OPTIONS Command

The OPTIONS command is used in conjunction with any of the analysis commands. Its purpose is to specify the computational parameters used to control the analysis, and to select the kinds of printouts desired.

The default parameters for each analysis are stored in the StatPac.Ini file. When you run an analysis without specifying any options, the analysis will be run using the default parameters. The OPTIONS command is simply a way to override or change these settings. If the OPTIONS command is excluded, the default values will be used for the analysis.

The analysis editor allows you to display and modify the options for the current procedure by selecting Options. The options will appear, and you will be able to modify them. You can also type options directly on the OPTIONS line in a procedure.

Each analysis has its own options. After selecting Options, the options and their default values will be displayed. You can temporarily change any options by simply entering the desired value. The change will be temporary in that it will only apply to the current procedure. To permanently change an option, add an exclamation point suffix (!) to the desired value. The default value for the option will be permanently changed so that the new value becomes the default.

The following information is only necessary if you choose to manually enter the options.

All of the analysis options are designated by two-letter codes. Use one or more spaces to separate the options. The format for the OPTIONS command is:

 

OPTIONS  <Code>=<Value> <Code>=<Value> <Code>=<Value>...

 

To set an option, type the two-letter code followed by an equals symbol and the value you want to give the option. For example, to set one decimal place on the report (option DP), you could type:

 

OP DP=1    (Note: OPTIONS can be abbreviated as OP)

 

If there are too many option codes to fit on a single line, continue typing and let the automatic word-wrap take care of indenting the continuation line. If you use a hard return at the end of a line, make sure that the break between lines occurs between two options so that no option specification is divided by the break. For example:

 

OPTIONS DS=N RS=Y RC=N MS=Y ST=Y FO=2 PS=Y

     DP=1 AC=Y PR=Y

 

Alternately, multiple options statements may be specified in the same procedure. They will be interpreted as if they were one option line. The previous options line could have been specified as:

 

OPTIONS DS=N RS=Y RC=N MS=Y ST=Y FO=2 PS=Y

OPTIONS DP=1 AC=Y PR=Y

 

Generally, options only apply to the procedure in which they appear. If an exclamation point is added as a suffix to the option, it will become the default for all future analyses unless changed by another option. In the following example, the OPTIONS line in the first procedure sets the default for the percentage base to the number of cases. Because the PB option ends with an exclamation point, the second procedure (and all subsequent procedures) will continue to use the same percentage base (i.e., PB=N).

 

STUDY MYSTUDY

FREQ V1-V50

OPTIONS PB=N!

..

FREQ V60-V70

..

 

SELECT and REJECT Commands

The SELECT and REJECT commands are used to create a subset of the data that consists of just some of the records from the original file. The commands are temporary and apply only to the procedure in which they are specified.

You can use the SELECT or REJECT commands to select or reject by record number range or by other criteria. The syntax for the SELECT and REJECT commands is identical. Records will be selected or excluded from the procedure based upon the selection or rejection criteria.

The format for the SELECT and REJECT commands to select by record number range is:

 

SELECT <Low record # - High record #>

 

REJECT <Low record # - High record #>

 

Type the keyword SELECT followed by the record number range. For example, to select just the first 50 records from your data file, you would type:

 

SELECT 1-50

 

This would cause the first 50 records to be selected for further processing.

 

You could exclude a single record from an analysis (say record 25) with the following command:

 

REJECT 25

 

You may exclude the high record number if you want the entire last part of the file to be used in the procedure. For example, the following SELECT command would skip the first 74 records from the data file. The procedure will use all the records from 75 on.

 

SELECT 75-

 

Similarly, the following command would select only the first 50 records for an analysis (records 51 to the last record would be rejected):

 

REJECT 51-
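The record-range forms above (a closed range, a single record, and an open-ended range) can be summarized with a small Python sketch. It only illustrates which record numbers each form covers. The helper names are hypothetical, and the total number of records is passed in explicitly as an assumption of the sketch.

```python
def records_in_range(spec, total):
    """Return the 1-based record numbers covered by '1-50', '25' or '75-'."""
    low, dash, high = spec.partition("-")
    first = int(low)
    if not dash:                 # single record, e.g. '25'
        last = first
    elif high.strip() == "":     # open-ended range, e.g. '75-'
        last = total
    else:                        # closed range, e.g. '1-50'
        last = int(high)
    return list(range(first, min(last, total) + 1))


def select(spec, total):
    """Records kept by SELECT <spec>."""
    return records_in_range(spec, total)


def reject(spec, total):
    """Records kept by REJECT <spec> (everything outside the range)."""
    excluded = set(records_in_range(spec, total))
    return [rec for rec in range(1, total + 1) if rec not in excluded]
```

For a file of 100 records, select("1-50", 100) and reject("51-", 100) both keep records 1 through 50, which matches the equivalence described above.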

 

When the SELECT or REJECT command is specified, subsequent procedures will use the full data set (not just the selected records). In the following example, the first procedure lists 50 records, while the second procedure lists the entire data file.

 

STUDY QUESTION

SELECT 1-50

LIST ID OPINION GROUP

..

LIST ID OPINION GROUP

..

The WRITE command is used to create a subfile of the selected data so that subsequent procedures could access the subfile. When the WRITE command is not specified, the selected records will be used for all the tasks in the procedure, but will not be written to a permanent subfile.

Never use the SAVE command in the same procedure as the SELECT command unless you have a backup of the study information and data. Doing so will eliminate records from your data file, and they will not be recoverable without a backup.

The other form of the SELECT and REJECT commands is used in conjunction with the IF-THEN command. For example, to select just males for an analysis, you would enter the command:

 

IF SEX="M" THEN SELECT

 

A record will be selected if the criterion is met (i.e., if SEX is equal to M). The following would also select just the males by eliminating the females:

 

IF SEX="F" THEN REJECT

 

The general form of this command is:

 

IF <Statement> THEN SELECT

 

The spaces in the syntax are mandatory. There must be at least one space after the IF and at least one space on each side of the THEN. Spacing within the <Statement> portion of the command doesn't matter.

If the <Statement> portion of the command is true for a given record, that record will be selected and written to the output file and/or included in the analysis. If the <Statement> portion is false for a record, it will be skipped (omitted from the procedure).

The quotation marks around the code to be selected are mandatory for alpha-type data only. For numeric-type variables, the quotation marks are unnecessary. The following procedure would first select students that had a grade point average of 3.5 or higher, and then perform two frequency analyses using the selected records.

 

STUDY STUDENTS

IF GPA >= 3.5 THEN SELECT

FREQUENCIES SEX RACE

..

 

Similarly, the following analyses would only be performed on students who had at least a 2.0 grade point average:

 

STUDY STUDENTS

IF GPA < 2.0 THEN REJECT

FREQUENCIES SEX RACE

..

 

The SELECT command can perform an Nth record selection when used in combination with COMPUTE commands. For example, the following procedure would list every tenth record. Note that the integer function is used to check for the record numbers that are evenly divisible by ten. Also note that the REC variable was computed so that the listing would show the record number from the original data file instead of its sequence number in the selected subset.

 

STUDY MARKET

COMPUTE (N5) REC = RECORD

COMPUTE (N5) INTEGER = INT(REC/10)

COMPUTE (N5) REAL = REC/10

IF INTEGER = REAL THEN SELECT

LIST REC NAME PHONE

..
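The divisibility test used in this procedure can be checked with a few lines of Python. This is a sketch of the arithmetic only. The function name every_nth is hypothetical, and the variable names mirror INTEGER and REAL from the procedure.

```python
def every_nth(total, n=10):
    """Return the record numbers that pass the INT(REC/n) = REC/n test."""
    selected = []
    for rec in range(1, total + 1):
        integer = int(rec / n)   # corresponds to INT(REC/10)
        real = rec / n           # corresponds to REC/10
        if integer == real:      # equal only when rec divides evenly by n
            selected.append(rec)
    return selected


tenth_records = every_nth(35)
```

For a file of 35 records, the two values agree only for records 10, 20 and 30, so those are the records that would be listed.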

The SELECT or REJECT command may be used to exclude missing data from an analysis by selecting only non-blank records. For example, either of these two commands could be used to select non-blank records for an analysis of an attitude question:

 

IF ATTITUDE <> " " THEN SELECT

IF ATTITUDE = " " THEN REJECT

 

Notice that a space is used to indicate missing data. The SELECT command says to select all records where ATTITUDE is unequal to a blank. The REJECT command says to exclude all records where ATTITUDE is equal to a blank. The result of either command will be the same. Records with non-missing data will be selected. When selecting missing data, quotation marks are required regardless of whether the variable is alpha or numeric. There does not have to actually be a space between the quotation marks (i.e., two quotation marks together would accomplish the same thing).

The SELECT and REJECT commands are often used in conjunction with the LIST command to list open-ended comments. The purpose is to eliminate records where the respondent made no comment. Both of these procedures would produce identical printouts.

 

IF COMMENT <> " " THEN SELECT

LIST COMMENT

..

IF COMMENT = " " THEN REJECT

LIST COMMENT

..

 

The SELECT and REJECT commands may also be used in combination with AND and OR relational operators to select or reject records based on multiple criteria. Using AND and OR relational operators can sometimes be confusing. Refer to the IF..THEN statement for a full discussion of relational operators.

As a simple example, suppose we wanted to perform an analysis of people over 60 who rate their health as good (1=Good 2=Fair 3=Poor). Both criteria must be true before we'll select the record (a person must be over 60 AND they must be in good health). The command would be:

 

IF AGE > 60 AND HEALTH = 1 THEN SELECT

 

When the AND operator is used, both statements must be true before the record will be selected. The OR operator works differently. When OR is used, the record will be selected if either statement is true. For example, the following command will select people that are Democrats OR people that have no political affiliation (D=Democrat R=Republican N=None):

 

IF AFFILIATION = "D" OR AFFILIATION = "N" THEN SELECT

 

A statement can contain as many AND and OR operators as needed. When there are no parentheses, AND and OR statements will be evaluated from left to right. Parentheses may be used to control the order in which the statement will be evaluated. For example, the following command would select all males over 18 and all females over 21.

 

IF (SEX="M" AND AGE>18) OR (SEX="F" AND AGE>21) THEN SELECT
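Left-to-right evaluation can give a different answer than the operator precedence used in many programming languages, which is one reason parentheses are worth typing. The Python sketch below assumes the left-to-right rule is applied literally, with AND given no priority over OR. The helper evaluate_left_to_right is hypothetical and only models that reading.

```python
def evaluate_left_to_right(values, operators):
    """Combine truth values strictly from left to right.

    values: list of booleans, one per condition.
    operators: list of 'AND' / 'OR' strings sitting between the conditions.
    """
    result = values[0]
    for op, value in zip(operators, values[1:]):
        if op == "AND":
            result = result and value
        elif op == "OR":
            result = result or value
        else:
            raise ValueError("unknown operator: " + op)
    return result


# Statement of the form: A OR B AND C
a, b, c = True, False, False
left_to_right = evaluate_left_to_right([a, b, c], ["OR", "AND"])  # (A OR B) AND C
python_precedence = a or b and c                                  # A OR (B AND C)
```

With these values the left-to-right reading is false while Python's own precedence gives true. Adding parentheses to the statement removes the ambiguity.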

 

StatPac contains a special feature that simplifies the syntax of complex OR statements. For example, suppose you want to select respondents from groups 1, 3, 7 and 9. Using OR operators, you would type:

 

IF GROUP=1 OR GROUP=3 OR GROUP=7 OR GROUP=9 THEN SELECT

 

The same command using the simplified OR syntax would be:

 

IF GROUP = "1/3/7/9" THEN SELECT

 

In the first statement, the record will be selected if any part of the statement is true. In the second statement, the OR relational operator is replaced by the slash. When using this method to perform a multiple selection, the codes ("1/3/7/9") must be enclosed in quotes, regardless of the variable type (alpha or numeric). There is no limit on the number of slashes that may be used in the statement to replace OR relational operators. Identical syntax may be used with the REJECT command.

As another example, if some data entry operators entered upper case codes, while other data entry operators entered lower case codes, we would want to select if the code was either upper OR lower case. The following two statements would produce identical results:

 

IF SEX = "M" OR SEX = "m" THEN SELECT

 

IF SEX = "M/m" THEN SELECT
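The slash shorthand amounts to testing whether a value equals any one of the listed codes. A minimal Python sketch of that idea follows. The function name matches is hypothetical, and the sketch simplifies by comparing codes as text, so it says nothing about how StatPac itself compares numeric values.

```python
def matches(value, codes):
    """True if value equals any of the slash-separated codes, e.g. '1/3/7/9'."""
    return str(value).strip() in codes.split("/")


groups = [1, 2, 3, 7, 8, 9]
selected_groups = [g for g in groups if matches(g, "1/3/7/9")]
```

Here the selected groups are 1, 3, 7 and 9, the same result as the four-part OR statement. The test matches("m", "M/m") is true in the same way for the mixed-case example.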

 

NEW Command

The NEW command is one of several keywords that allow you to create new variables. Its primary application is for creating new alpha-type variables. The advantage of the NEW command over other ways of creating new variables is that it allows you to specify a variable label.

The syntax for the command is as follows:

 

NEW <(Format)> <"Variable name"> <Variable label>

 

The format defines the type, field width and decimal formatting (if numeric) for the new variable. It is specified using the same conventions as in the study design except that it is enclosed in parentheses.

The <Format> for creating new alpha variables is:

 

(Ax)     where x is the field width

 

The following command would create a new one-column alpha variable named GROUP. The variable label for GROUP would be Group Identification Code.

 

NEW (A1) "GROUP" Group Identification Code

 

Quotation marks must be included around the variable name. The variable name itself should be brief (e.g., less than 20 characters in length). The variable label may use a continuation line if the entire label will not fit on the first line. The new variable will be initialized to blanks (i.e., the variable exists, but its value is missing).

Numeric variables may also be created with the NEW command.

The <Format> portion of the syntax for creating new numeric variables is:

 

(Nx)       where x is the total field width for the variable

or

(Nx.y)    where x is the total field width and y is the

                number of decimal characters

 

When working with integer data, the first format is preferable. For example, the following command would create a two-column numeric variable. The variable name is GRAND-TOTAL and its variable label is "The sum of the individual scores". The variable does not have any specified decimal formatting.

 

NEW (N2) "GRAND-TOTAL" The sum of the individual scores

 

When creating variables that will contain decimal formatting, the precision for the formatting should be specified. While this is not mandatory, it is highly recommended. For example, let's create a new numeric variable called PROFIT-LOSS. Your study now contains two variables, EXPENDITURES (V1) and INCOME (V2). The first step is to give the new variable a name and/or label. Type:

 

NEW (N10.2) "PROFIT-LOSS" The bottom line

 

This would create a new numeric variable called PROFIT-LOSS.  It will become the third variable in the study. The variable will have a total field width of ten characters (seven to the left of the decimal point, the decimal point itself, and two to the right of the decimal point). The field width refers to the total number of characters reserved for the variable. This includes the space necessary for a decimal point and/or minus sign. In other words, you must have an idea of the magnitude of the new variable before creating it.
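The field-width arithmetic can be worked through with a short Python sketch. It is only a way of checking the character count for a numeric format. The function numeric_layout is hypothetical, and it assumes that a format without decimals reserves no character for a decimal point.

```python
import re


def numeric_layout(fmt):
    """Split a numeric format such as 'N10.2' into (left, point, right) widths.

    left  = characters to the left of the decimal point (including any minus sign)
    point = 1 if a decimal point is stored, otherwise 0
    right = number of decimal characters
    """
    match = re.fullmatch(r"N(\d+)(?:\.(\d+))?", fmt)
    if match is None:
        raise ValueError("not a numeric format: " + fmt)
    width = int(match.group(1))
    right = int(match.group(2) or 0)
    point = 1 if right else 0
    left = width - point - right
    if left < 0:
        raise ValueError("field width too small for the decimals: " + fmt)
    return left, point, right


profit_loss = numeric_layout("N10.2")
grand_total = numeric_layout("N2")
```

For N10.2 this gives seven characters to the left of the decimal point, one for the point and two to the right. Because a minus sign also uses one of the seven, a negative value has room for six digits before the decimal point.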

After entering the NEW command, the new variable can be referenced by the new variable name (PROFIT-LOSS) or by the new V number (V3). The COMPUTE statement could then be used to calculate a value for the new variable PROFIT-LOSS.

To create several new variables in the same run, simply type several NEW statements on successive lines. Each NEW command will create one new variable.

When running a procedure with the NEW and SAVE commands, the new variable will be created and saved at the end of the codebook and data files. If you attempt to run the same procedure again, StatPac will tell you that you are attempting to create a new variable with the same name as an existing variable (i.e., it was created and saved the first time you ran the procedure). The StatPac.ini file can be edited so StatPac ignores this situation. Set IgnoreDuplicateNewCommand = 1 in the StatPac.ini file to tell StatPac to ignore the NEW command if a variable already exists with that variable name.

When using the NEW command, the variable name is subject to the same restrictions as during the study design.

 

1. Shorter variable names are preferable.

2. A variable name must be unique from all other variable names and may not be the same as any keyword.

3. The first character of a variable name may not be a number or a space.

4. A variable name may not be the same as a V number. For example, you cannot name a variable "V12".

5. A variable name may not contain a comma or period. The variable name may include a space; however, for the purpose of clarity, we recommend using a dash or underline character instead of a space.

6. A variable name may not be D, E, RECORD, TIME, LO, HI, WITH, BY, THEN, TOTAL or MEAN. These words have special meaning to StatPac.

 

LET Command

The LET command is used to create a new variable from an existing variable or to assign a new value to an existing variable. The syntax of the command is:

 

LET <New or existing variable name> = <Existing variable name>

 

The LET command is often used as a way of making transformations to a data file without destroying the raw data. For example, suppose AGE had been entered in the data file as the respondent's actual age. This would provide excellent descriptive statistics (mean, median, etc.), but it is not conducive to crosstab and banner tables. For these analyses, we want categorical data. If we recode the AGE variable into groups, the original data will be destroyed, since the AGE variable would then contain recoded data (not the raw data). The following procedure overcomes this problem by first using the LET command to create a duplicate copy of a variable, and then recoding the new variable rather than the original variable.

 

STUDY MARKET

LET AGE_GROUP = AGE

RECODE AGE_GROUP (LO-20=1) (21-30=2) (31-40=3) (41-HI=4)

LABELS AGE_GROUP (1=Under 21) (2=21-30) (3=31-40) (4=Over 40)

SAVE

..

The new variable AGE_GROUP is created with the LET command. Everything is duplicated except the variable name. AGE_GROUP will have the same format, variable label, value labels, and data as the original variable AGE. The only thing changed is the variable name. The RECODE command recodes AGE_GROUP (leaving AGE intact), and the LABELS command assigns new value labels to the recoded data. The SAVE command makes the transformations permanent so subsequent procedures will have access to the new variable AGE_GROUP as well as the original raw data AGE.

When using the LET command to create a new variable, the new variable name is subject to the same restrictions as during the study design.

The LET command can also be used to increase the field width of a variable. Suppose you had created a 20 character alpha variable called CITY. After entering several records of data, you decide that you really need 30 characters for the CITY field. You cannot simply change the codebook because you have already entered data, and the existing data needs to be changed in addition to the codebook. The LET command provides an easy solution.

This procedure creates a new variable called NEWCITY and assigns the contents of the existing CITY variable to the new variable for all existing records. After running this procedure, you would add the NEWCITY variable to the data entry form.

 

STUDY CONTACTS

NEW (A30) "NEWCITY" City

LET NEWCITY=CITY

SAVE

..

When you create a new variable, it is added to the end of the codebook. In the previous example, the NEWCITY variable would become the last variable in the study.  The CITY variable would still be in the codebook and on the data entry form. A better solution is to use the WRITE command to eliminate the original CITY variable and replace it with the NEWCITY variable. In this example, CITY was originally variable 25 in the codebook. The WRITE command is used to set NEWCITY as variable 25 and the original CITY variable is omitted.

 

STUDY CONTACTS

NEW (A30) "NEWCITY" City

LET NEWCITY=CITY

WRITE CONTACTS V1-V24 NEWCITY V26-V100

..

 

The LET command can also be used to convert an alpha (A1) variable to a numeric (N2) variable. This is useful when you have data coded as A, B, C… and you would like to have it coded as 1, 2, 3.  You could recode the data (A=1)(B=2)(C=3), save the recoded data, and then manually change the codebook format from A1 to N1, and change the value labels. Alternatively, you can use the NEW and LET commands to do the recode. Suppose APPLE is an (A1) variable coded as A=1 apple, B=2 apples, and C=3 apples.  The following two commands would create a new variable (and data) called NUM_APPLE that was coded as 1=1 apple, 2=2 apples, and 3=3 apples. The new numeric variable must be specified with an N2 format.

 

NEW (N2) "NUM_APPLE" Number of Apples

LET NUM_APPLE=APPLE

 

 

STACK Command

A STACK command is used to create new variables that consist of all the possible combinations of categories from two to four other variables. It is especially useful for creating special variables in banner tables.

The syntax for the STACK command is:

 

STACK <New variable name> = <Variable list>

 

The STACK command may be used only to create a new variable (i.e., it cannot be used to calculate a new value for an existing variable). The format of the new variable will always be alpha, and the field width will be the sum of the field widths of the individual variables in the variable list. The value labels for the new variable will be automatically created from the combinations of the value labels of the variables in the variable list. The STACK command is temporary and the new variable will only exist in the procedure where the command appears unless the SAVE or WRITE command is used.

As an example, suppose your study contains a variable called SEX and another variable called AGE. SEX is coded: M=Male and F=Female. AGE is coded: 1=Young, 2=Middle and 3=Old. The following command would create a new variable called SEXAGE that contained six value labels:

 

STACK  SEXAGE = SEX AGE

 

The new SEXAGE variable would be a two-column alpha variable, and its value labels would be:

 

M1 = Male Young

M2 = Male Middle

M3 = Male Old

F1 = Female Young

F2 = Female Middle

F3 = Female Old

The STACK command is a method of adding multiple dimensions to an analysis. If an analysis is performed using the new SEXAGE variable, the results would be based on both the SEX and AGE dimensions.

The STACK command <variable list> may stack up to four variables. For example, a third dimension based on a variable called RACE, could be added with the following command:

 

STACK SEXAGERACE = SEX AGE RACE

 

The RACE variable (coded: W=White, B=Black and O=Other) would add a third dimension to the stacked variable. The new value labels for SEXAGERACE would be:

 

M1W = Male Young White

M1B = Male Young Black

M1O = Male Young Other

M2W = Male Middle White

M2B = Male Middle Black

M2O = Male Middle Other

M3W = Male Old White

M3B = Male Old Black

M3O = Male Old Other

F1W = Female Young White

F1B = Female Young Black

F1O = Female Young Other

F2W = Female Middle White

F2B = Female Middle Black

F2O = Female Middle Other

F3W = Female Old White

F3B = Female Old Black

F3O = Female Old Other

Note that the number of value labels in the new stacked variable is the product of the number of value labels in each of the individual variables in the variable list. Stacking more than two variables in a single command can potentially result in a huge number of value labels.

Also note that the new value labels are created by joining the value labels that already exist in the study. The value label creation feature of the STACK command works best when each of the value labels for the stacked variables is short.
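The way stacked codes and labels multiply can be sketched outside StatPac. The following Python fragment is only an illustration of the logic, not StatPac syntax; the function name stack_labels and the dictionaries are hypothetical, and the codes are taken from the SEX, AGE and RACE examples above.

```python
from itertools import product

def stack_labels(*label_maps):
    """Combine code->label maps the way the STACK examples combine them."""
    stacked = {}
    for combo in product(*(m.items() for m in label_maps)):
        code = "".join(c for c, _ in combo)       # codes are joined: M + 1 + W
        label = " ".join(l for _, l in combo)     # labels are joined with spaces
        stacked[code] = label
    return stacked

sex = {"M": "Male", "F": "Female"}
age = {"1": "Young", "2": "Middle", "3": "Old"}
race = {"W": "White", "B": "Black", "O": "Other"}

sexage = stack_labels(sex, age)
sexagerace = stack_labels(sex, age, race)

# The label count is the product of the individual label counts.
print(len(sexage), len(sexagerace))
```

Adding a fourth variable with, say, five value labels would multiply the count again, which is why long variable lists grow quickly.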

 

RECODE Command

The RECODE command is used to recode a variable into groups. It is a data reduction technique used for summarizing data. Both alpha and numeric data may be recoded. The simplest form of the command is:

 

RECODE <Variable or variable list> (<Old value> = <New value>)

 

For example, assume we have a variable called INDICATOR, and we want to change all values of 0 to a value of 5. We would type:

 

RECODE INDICATOR (0=5)

 

The space after RECODE and after the variable name is mandatory. Several variables can be recoded with the same command by specifying a variable list instead of a single variable. The following command would perform the same recode on ten consecutive INDICATOR variables.

 

RECODE INDICATOR1 - INDICATOR10 (0=5)

 

There are several other formats for the RECODE command. One of them allows you to string several recode statements together. For example, let's say you want to change all values of 1 and 2 to a value of 1, all values of 3 and 4 to a value of 2, and all values of 5 to a value of 3. The recode statement would be:

 

RECODE RATING (2=1)(3=2)(4=2)(5=3)

 

Note that it is not necessary to specify (1=1) as part of the recode command. As in the previous example, a variable list could be specified instead of a single variable.

If you prefer, you could reference the variable RATING by its variable number rather than its variable name. Just prefix the variable number with the letter V. For example, if RATING was the third variable in our data file, we could have typed the previous command as:

 

RECODE V3 (2=1)(3=2)(4=2)(5=3)

 

The final format for the RECODE command is used to specify a value range to be recoded instead of an absolute value. The syntax for this type of statement is:

 

RECODE <Var. or var. list> (<Low value> - <High value> = <New value>)

 

For example, let's say we want to recode all values from 1 to 20 to a new value of 1, all values from 21 to 40 to a new value of 2, and all values over 40 to a new value of 3. Our RECODE command would be:

 

RECODE AGE (1-20=1)(21-40=2)(41-99=3)

 

The keywords LO and HI may be included in a RECODE command. LO refers to the lowest value in a data file while HI refers to the highest value in the file. For example, let's say you have a variable called INVENTORY. To change all values from the lowest through 49 to a new value of 0 and also change all values from 50 through the highest to a new value of 1, you would use the following RECODE command:

 

RECODE INVENTORY (LO-49=0)(50-HI=1)

 

Missing information is stored in StatPac data files as spaces. Spaces may be used in the RECODE command to indicate missing data. For example, let's take the following survey question:

 

     How well do you like our spaghetti?

              1. A lot

              2. Somewhat

              3. Not at all

              4. No opinion

 

To recode all the "no opinion" responses to missing data, type:

 

RECODE OPINION (4= )

 

Notice that a blank is used as part of the RECODE statement to indicate missing data.

As a similar example, after downloading a twenty-variable data file (DATAFILE) from a mainframe, you discover that all missing data was coded as 99 instead of blanks. Since StatPac recognizes only blanks as missing data, you decide to recode all the variables and save the recoded data. The following commands are used to recode the data file and write a new data file:

 

STUDY DATAFILE

RECODE V1-V20 (99= )

SAVE

..

You may use any of the above formats or combination of formats to create RECODE commands. Many different RECODE commands can be specified in a single procedure. All recodes will be temporary in nature and will be applied to all the tasks in the procedure. A recode can be made permanent by using the SAVE or WRITE commands in the same procedure.
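The recode formats above can be summarized in a short sketch. This is Python used only to illustrate the logic, not StatPac syntax. The function recode, the rule tuples, and the use of None for missing data are all assumptions made for the illustration; so is the choice to stop at the first matching rule and to leave unmatched values unchanged.

```python
LO, HI = float("-inf"), float("inf")   # stand-ins for the LO and HI keywords

def recode(value, rules):
    """Apply the first matching (low, high, new) rule; None stands for missing data."""
    if value is None:
        return None
    for low, high, new in rules:
        if low <= value <= high:
            return new
    return value  # values not covered by any rule are left unchanged

rating_rules = [(2, 2, 1), (3, 3, 2), (4, 4, 2), (5, 5, 3)]   # (2=1)(3=2)(4=2)(5=3)
age_rules = [(1, 20, 1), (21, 40, 2), (41, 99, 3)]            # (1-20=1)(21-40=2)(41-99=3)
inventory_rules = [(LO, 49, 0), (50, HI, 1)]                  # (LO-49=0)(50-HI=1)
opinion_rules = [(4, 4, None)]                                # (4= )

print([recode(v, rating_rules) for v in range(1, 6)])
```

A single absolute value is written here as a range whose low and high ends are equal, so one rule shape covers every format.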

 

COMPUTE Command

 

The COMPUTE command is one of the most versatile keywords. It is used to perform algebraic and/or trigonometric functions on numeric variables. The COMPUTE command may be used whenever you want to use arithmetic to transform an existing variable or calculate a new variable. All operations are performed using double precision arithmetic.

The syntax for a compute statement is almost identical to the syntax that the BASIC interpreter uses to evaluate your programs. The format for the compute statement is:

 

COMPUTE <Variable> = <Equation>

 

For example, the following equation will add three variables together, calculate the mean average, and replace the contents of variable 9 with the result:

 

COMPUTE V9 = (V3 + V4 + V5) / 3

 

Notice that the letter V is used to distinguish a variable number from a constant. You may also use the actual variable names in the equation rather than a V number. For example, the same equation could be written:

 

COMPUTE DEPT_AVG = (DEPT_ONE + DEPT_TWO + DEPT_THREE) / 3

 

Compute statements may use five numeric operators and twelve intrinsic functions. They are:

 

+           addition

-            subtraction

*           multiplication

/            division

^           exponentiation

SQR     square root

LOG     natural log

SIN      sine

COS     cosine

TAN     tangent

ASN     arcsine

ATN     arctangent

ABS      absolute value

EXP      exponent

RND     random (random integer between 1 and the argument)

INT       integer (rounds argument up or down)

FIX       integer (drops decimal portion of the argument)

 

Equations may also use parentheses to specify the order of computations. If no parentheses are included, computations will be performed in the standard hierarchical order (intrinsic functions / exponentiation / multiplication & division / addition & subtraction). If no hierarchy exists, the equation will be evaluated from left to right. Spaces in an equation will be ignored.

 

The following are examples of valid equations:

 

COMPUTE V11 = (V22 * 1.3) / (V21 + V16)

COMPUTE V9 = ((V6-V7)*1.42)/9.31

COMPUTE V5 = 0

COMPUTE CIRCUMFERENCE = 3.14159 * DIAMETER

COMPUTE V3 = 3.14159 * V2 ^ 2

COMPUTE V12 = LOG(DOLLARS)

COMPUTE TOTAL-SALES = SQR(V12 - 16.2)

COMPUTE MODIFIED = SIN((ORIGINAL-4.12)+ORIGINAL)

COMPUTE ROUNDED-NUMBER = INT(NUMBER)

COMPUTE TRUNCATED-NUMBER = FIX(NUMBER)

 

The following are invalid equations:

 

COMPUTE V17 = ((V4/V5)     (Mismatched parentheses)

COMPUTE V9 = V6/0              (Division by zero is illegal)

COMPUTE V9 + V3 = V2        (Only one variable allowed to the left of the equals sign)

 

All computations (including the intrinsic functions) are performed using double precision. The result will be rounded to the precision specified by the variable being computed (i.e., the decimal formatting of the variable).

For example, let's say we want to compute an average of three variables. The variable being computed has a format of N5.2. We would write a compute statement to add the three variables and divide by three. If the sum of the three variables is 100, the mean average will be calculated as 33.33333333333333 and the result would be rounded to 33.33.

When no decimal formatting exists for the variable being computed, the result will be expressed with the maximum decimal precision possible. For example, if the format of the computed variable were N7, the previous example would be rounded to 33.3333 (a total field width of seven characters including the decimal point). If the format were N2, the result would be rounded to 33.

If a computed variable becomes too large for the field width of the variable, the precision of the result may be diminished. In the extreme case, where the integer portion of the result would be changed, the result will be set to blanks (missing data). In the previous example, if the format of the computed variable were N1, the result would be stored as missing data because the result (33.333...) could not be expressed using an N1 format. It is, therefore, very important that you have an idea of the magnitude of the number you will be computing.
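The rounding behavior described above can be sketched as follows. This is Python used only for illustration, not StatPac syntax, and it is one reading of the rules rather than StatPac's actual routine. The function name fit_to_field is hypothetical, and None stands for missing data.

```python
def fit_to_field(value, width, decimals=None):
    """Return the text stored in an N<width>[.<decimals>] field, or None for missing."""
    # With no decimal format, start from the most decimals the width could hold.
    start = decimals if decimals is not None else width
    for places in range(start, -1, -1):
        text = f"{value:.{places}f}"
        if len(text) <= width:
            return text          # first (most precise) form that fits the field
    return None                  # even the integer portion does not fit

mean = 100 / 3
print(fit_to_field(mean, 5, 2))  # N5.2
print(fit_to_field(mean, 7))     # N7
print(fit_to_field(mean, 2))     # N2
print(fit_to_field(mean, 1))     # N1
```

The loop drops one decimal place at a time, which mirrors the statement that precision is diminished before the result is finally set to missing.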

As another example, suppose we want to add three variables (SCORE1, SCORE2 and SCORE3). The scores are between 0.0 and 99.9 so the variables were originally defined using N4.1 formats. The sum of the variables could exceed 99.9 (four characters), so the new variable that holds the sum (TOTAL_SCORE) should be created with an N5.1 format.

 

NEW (N5.1) "TOTAL_SCORE" Sum of the three test scores

COMPUTE TOTAL_SCORE = SCORE1 + SCORE2 + SCORE3

 

The COMPUTE statement can also be used to create a new variable directly without first using the NEW command. The syntax to create a new variable is:

 

COMPUTE (<Format>) <New variable name> = <Equation>

 

The only disadvantage of using the COMPUTE command to create a new variable is that a variable label cannot be specified.

The command for the above example is:

 

COMPUTE (N5.1) TOTAL_SCORE = SCORE1 + SCORE2 + SCORE3

 

A new five-character numeric variable would be created and called TOTAL_SCORE. It will have three digits to the left of the decimal point and one digit to the right. Notice that the format for the new variable is enclosed in parentheses.

Often, equations will become very complex and require many levels of parentheses. While StatPac can handle virtually any level of complexity, it is sometimes easier to break an equation into several smaller equations and store the intermediate results in the variable being computed.

For example, the following complex equation could be broken down into smaller equations:

 

COMPUTE (N6.2) NEWVAR = (V7-V6) + 14.82

 

This could be expanded to:

 

NEW (N6.2) "NEWVAR"

COMPUTE NEWVAR = V7 - V6

COMPUTE NEWVAR = NEWVAR + 14.82

 

Notice that a new variable (NEWVAR) was first created with the NEW command, and then computed as the difference between variables 7 and 6. Finally, it was recomputed as its current value plus 14.82. Because NEWVAR is not already a variable name in the file, it will become the next available variable. In this example, if there were 18 variables already in the file, NEWVAR would become the variable name for variable 19.

Sometimes you will need several COMPUTE statements to accomplish a task. Surprisingly, one of the most difficult formulas you might use is to find the number of days between two dates. Several COMPUTE statements are required. Each date requires three variables in your study (month, day and year). Month and day each require two columns, and year requires four columns. In the following example, the variable names for each date are: MONTH1 DAY1 YEAR1 and MONTH2 DAY2 YEAR2. If the names in your study are different, you must modify this procedure. The DIFF variable contains the number of days between the two dates.

This subroutine can be merged into your procedure file. Note that this example assumes that four digits were used in your study to store the years.

 

COMPUTE (N4) YR1 = YEAR1

COMPUTE (N7) TIME1 = 365 * YR1 + DAY1

IF MONTH1 > 2 THEN COMPUTE TIME1 = TIME1 - INT(MONTH1*.4+2.3)

IF MONTH1 > 2 THEN COMPUTE YR1 = YR1 + 1

COMPUTE TIME1 = INT(TIME1 + (YR1-1) / 4 + MONTH1 * 31)

COMPUTE (N4) YR2 = YEAR2

COMPUTE (N7) TIME2 = 365 * YR2 + DAY2

IF MONTH2 > 2 THEN COMPUTE TIME2 = TIME2 - INT(MONTH2*.4+2.3)

IF MONTH2 > 2 THEN COMPUTE YR2 = YR2 + 1

COMPUTE TIME2 = INT(TIME2 + (YR2-1) / 4 + MONTH2 * 31)

COMPUTE (N5) DIFF = ABS(TIME2 - TIME1)
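The same day-number arithmetic can be checked outside StatPac. The Python sketch below is an illustration only. It assumes that INT drops the fractional part of a positive number, which is what this formula requires, and it is valid only for years 1901 through 2099 because the formula treats every fourth year as a leap year. The function names are hypothetical.

```python
from math import floor

def day_number(year, month, day):
    """Mirror the COMPUTE steps above for one date (month, day, four-digit year)."""
    yr = year
    t = 365 * year + day
    if month > 2:
        t -= floor(month * 0.4 + 2.3)   # correction for the shorter months
        yr += 1                         # count this year's leap day after February
    return t + (yr - 1) // 4 + month * 31

def days_between(first, second):
    """Number of days between two (year, month, day) tuples."""
    return abs(day_number(*second) - day_number(*first))

print(days_between((2000, 1, 1), (2000, 3, 1)))
```

Only the difference between two day numbers is meaningful; each individual day number carries a constant offset that cancels in the subtraction.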

 

The year 2000 is not a problem with the previous procedure because the year was stored using four digits. If you have data that uses only two digits to store the year, then the procedure can be modified to correct the millennium change.

First, determine the oldest year in the data. Suppose in the following example, the oldest value for YEAR1 or YEAR2 is 1921 (stored in the data as 21). By checking each date and comparing it to 21, you can determine if it is a 19xx or 20xx year. Again, the DIFF variable contains the number of days between the two dates.

 

COMPUTE (N2) OFFSET=21

NEW (N4) "YR1"

IF YEAR1 < OFFSET THEN COMPUTE YR1 = YEAR1 + 2000

    ELSE YR1 = YEAR1 + 1900

COMPUTE (N7) TIME1 = 365 * YR1 + DAY1

IF MONTH1 > 2 THEN COMPUTE TIME1 = TIME1 - INT(MONTH1*.4+2.3)

IF MONTH1 > 2 THEN COMPUTE YR1 = YR1 + 1

COMPUTE TIME1 = INT(TIME1 + (YR1-1) / 4 + MONTH1 * 31)

NEW (N4) "YR2"

IF YEAR2 < OFFSET THEN COMPUTE YR2 = YEAR2 + 2000

     ELSE YR2 = YEAR2 + 1900

COMPUTE (N7) TIME2 = 365 * YR2 + DAY2

IF MONTH2 > 2 THEN COMPUTE TIME2 = TIME2 - INT(MONTH2*.4+2.3)

IF MONTH2 > 2 THEN COMPUTE YR2 = YR2 + 1

COMPUTE TIME2 = INT(TIME2 + (YR2-1) / 4 + MONTH2 * 31)

COMPUTE (N5) DIFF = ABS(TIME2 - TIME1)
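The century test used in this procedure can be sketched on its own. The Python fragment is illustrative only; the function name expand_year is hypothetical, and the default offset of 21 comes from the example above, where the oldest year in the data is 1921.

```python
def expand_year(two_digit_year, offset=21):
    """Years below the offset are taken as 20xx, all others as 19xx."""
    if two_digit_year < offset:
        return two_digit_year + 2000
    return two_digit_year + 1900

for yy in (21, 99, 0, 20):
    print(yy, expand_year(yy))
```

With an offset of 21, the data can span 1921 through 2020 without ambiguity; a different oldest year would call for a different offset.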

 

There are several types of errors that may occur while using COMPUTE statements (e.g., division by zero or the square root of a negative number). The result of any invalid computation will be set to blanks (missing data).

 

AVERAGE, COUNT and SUM Commands

The AVERAGE, COUNT and SUM commands are provided to perform calculations in situations in which the COMPUTE command would fail because of missing data. The syntax for all three commands is identical.

The AVERAGE command calculates the mean average of all non-missing values in a list of variables. The syntax of the AVERAGE command is:

 

AVERAGE <Variable> = <Variable list>

 

The COUNT command counts the number of non-missing values in a list of variables. The syntax of the COUNT command is:

 

COUNT <Variable> = <Variable list>

 

The SUM command adds the non-missing values in a list of variables. The syntax of the SUM command is:

 

SUM <Variable> = <Variable list>

 

These commands exist because of the way the COMPUTE command handles missing data. If any of the variables in a COMPUTE statement is missing, the result will be missing. A value calculated by the AVERAGE command is missing only when all values in the list of variables are missing. For example, the following COMPUTE command would fail if DEPT_ONE, DEPT_TWO, or DEPT_THREE contained a missing value.

 

COMPUTE DEPT_AVG = (DEPT_ONE + DEPT_TWO + DEPT_THREE) / 3

 

This COMPUTE command would store a blank result in DEPT_AVG for any record in which DEPT_ONE, DEPT_TWO, or DEPT_THREE is missing. Instead, the AVERAGE command could be used to calculate the mean of the non-missing values of the three variables in each record. The commands to accomplish this task would be:

 

STUDY INCOME

NEW (N9.2) "DEPT_AVG" Average Income of Departments 1-3

AVERAGE DEPT_AVG = DEPT_ONE DEPT_TWO DEPT_THREE

SAVE

..

In this example, the DEPT_AVG variable was created by a NEW command. The format of DEPT_AVG (N9.2) specifies two places to the right of the decimal point. Therefore, the result of the AVERAGE command will be expressed to two decimal places.

The AVERAGE command itself may also be used to create a new variable (making the use of the NEW command unnecessary). The syntax of the command is changed only by the inclusion of the new variable format.

 

AVERAGE (<Format>) <New variable name> = <Variable list>

 

The previous procedure could have been:

 

STUDY INCOME

AVERAGE (N9.2) DEPT_AVG = DEPT_ONE DEPT_TWO DEPT_THREE

SAVE

..

Note that the format must be enclosed in parentheses. The only disadvantage of using the AVERAGE command to create the new variable is that the new variable will not contain a variable label.

The COUNT command counts the number of non-missing values from a variable list. Its use is identical to the AVERAGE command, except that the result is the number of non-missing values instead of the average of the non-missing values. Like the AVERAGE command, it may also be used to create a new variable. A value calculated by the COUNT command is always an integer between zero and the number of variables specified in the list.

The following procedure creates a new variable DEPT_COUNT and counts the number of non-missing values in each record. Note that the new variable will always be an integer between 0 and 3, and therefore uses an (N1) format.

 

STUDY INCOME

COUNT (N1) DEPT_COUNT = DEPT_ONE DEPT_TWO DEPT_THREE

SAVE

..

The SUM command adds all the non-missing values from a variable list. The result is the sum of these values. Like the AVERAGE and COUNT commands, a new variable may be created. The following procedure would create a new variable called DEPT_TOTAL that contains the sum of the three departments:

 

STUDY INCOME

SUM (N10) DEPT_TOTAL = DEPT_ONE DEPT_TWO DEPT_THREE

SAVE

..

The final example shows all three commands in one procedure:

 

STUDY INCOME

AVERAGE (N9.2) DEPT_AVG = DEPT_ONE DEPT_TWO DEPT_THREE

COUNT (N1) DEPT_COUNT = DEPT_ONE DEPT_TWO DEPT_THREE

SUM (N10) DEPT_TOTAL = DEPT_ONE DEPT_TWO DEPT_THREE

SAVE

..

The result of saving these transformations in a data file could be:

 

DEPT_ONE | DEPT_TWO | DEPT_THREE | DEPT_AVG | DEPT_COUNT | DEPT_TOTAL
      36 |       43 |         42 |    40.33 |          3 |        121
      49 |       54 |            |    51.50 |          2 |        103
      27 |       60 |         48 |    45.00 |          3 |        135
      31 |          |            |    31.00 |          1 |         31
         |          |            |          |          0 |
      33 |       44 |         51 |    42.67 |          3 |        128
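The missing-data handling shown in this table can be reproduced with a short sketch. This is Python used only for illustration, not StatPac syntax. None stands for a missing (blank) value, and the function names are hypothetical.

```python
def count(values):
    """Number of non-missing values."""
    return sum(v is not None for v in values)

def total(values):
    """Sum of the non-missing values; missing when every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None

def average(values, places=2):
    """Mean of the non-missing values, rounded like an N9.2 field."""
    present = [v for v in values if v is not None]
    return round(sum(present) / len(present), places) if present else None

rows = [
    (36, 43, 42),
    (49, 54, None),
    (27, 60, 48),
    (31, None, None),
    (None, None, None),
    (33, 44, 51),
]
for row in rows:
    print(row, average(row), count(row), total(row))
```

Note that only the count is defined for a record in which every department is missing; the average and the sum stay missing.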

 

IF-THEN … ELSE Command

The RECODE, COMPUTE and SELECT commands may be modified so that they become conditional. That is, the recode, compute or select will be performed or not performed on a given record depending on whether something else is true or false.

The syntax for the IF-THEN modifier is:

 

IF <Statement> THEN RECODE <Variable> <Recode statement>

IF <Statement> THEN COMPUTE <Variable> = <Equation>

IF <Statement> THEN SELECT

 

Note that the portions to the right of the RECODE and COMPUTE commands have syntax identical to the command when there is no IF-THEN modifier. If the <Statement> portion of the command is true for a given record, THEN the record will be recoded, computed or selected. If the <Statement> portion is false, THEN the record will be skipped.

The following example uses three IF-THEN commands to compute a weighted score based on a group number. Because the SAVE command is used, the weighted score could be referenced in subsequent procedures.

 

STUDY SEGMENT

NEW (N10.4) "WS" Weighted Score

IF GROUP = 1 THEN COMPUTE WS = SCORE * 0.4172

IF GROUP = 2 THEN COMPUTE WS = SCORE * 0.8735

IF GROUP = 3 THEN COMPUTE WS = SCORE * 1.0963

SAVE

..

In the example above, GROUP was a numeric variable and it was not necessary to enclose the values in quotes. If GROUP had been an alpha variable, the procedure would have required quotation marks around the group codes.

 

STUDY SEGMENT

NEW (N10.4) "WS" Weighted Score

IF GROUP = "1" THEN COMPUTE WS = SCORE * .4172

IF GROUP = "2" THEN COMPUTE WS = SCORE * .8735

IF GROUP = "3" THEN COMPUTE WS = SCORE * 1.0963

SAVE

..
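The effect of the three IF-THEN commands on each record can be sketched as a lookup. The Python fragment below is an illustration only. The record layout is hypothetical, and it assumes that WS stays missing (None) when GROUP matches none of the conditions or when SCORE is missing.

```python
WEIGHTS = {1: 0.4172, 2: 0.8735, 3: 1.0963}   # multipliers from the procedure above

def weighted_score(record):
    """Return WS for one record, or None when no IF condition applies."""
    factor = WEIGHTS.get(record["GROUP"])
    if factor is None or record["SCORE"] is None:
        return None
    return round(record["SCORE"] * factor, 4)   # WS is an N10.4 field

records = [
    {"GROUP": 1, "SCORE": 10},
    {"GROUP": 2, "SCORE": 10},
    {"GROUP": 9, "SCORE": 10},   # no matching IF statement
]
for rec in records:
    print(rec["GROUP"], weighted_score(rec))
```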

When performing an IF-THEN-COMPUTE command, the ELSE keyword may be used to specify an alternate computation if the <Statement> portion is false. The following example uses different formulas for males and females to calculate a variable called ADJUSTED_SCORE. The spacing for the continuation lines is for readability only.

 

STUDY SCORES

NEW (N4.2) "ADJUSTED_SCORE"

IF SEX = "M" THEN COMPUTE ADJUSTED_SCORE = SCORE * .59

                         ELSE

                         ADJUSTED_SCORE = SCORE * 1.37

SAVE

..

The NEW statement in this example could be eliminated by specifying the variable format as part of the COMPUTE statement.

 

STUDY SCORES

IF SEX = "M" THEN COMPUTE (N4.2) ADJUSTED_SCORE =

       SCORE * .59  ELSE  (N4.2) ADJUSTED_SCORE = SCORE * 1.37

SAVE

..

The following is another example of how quotes are used for referencing alpha-type variables. If the format of SEX were A1 (coded M or F), this procedure would select just the males for a descriptive statistics analysis of AGE:

 

STUDY SURVEY

IF SEX = "M" THEN SELECT

DESCRIPTIVE AGE

..

When you want to reference missing data, use two quote marks (with or without a space between them). For example, if you want to select all non-missing data from the COMMUNITY variable, either of the following commands would produce the desired results regardless of whether COMMUNITY is alpha or numeric format.

 

IF COMMUNITY <> " " THEN SELECT

IF COMMUNITY = "" THEN REJECT

 

The IF-THEN modifier is often used in conjunction with the COMPUTE command to eliminate the possibility of computational errors. For example, let's say we want to compute the square root of PROFITLOSS. Since we cannot take the square root of a negative number, we only want to perform the transformation if the PROFITLOSS variable is positive. In other words, if PROFITLOSS is greater than zero, take the square root; otherwise, skip it.

In our example, the statement would be:

 

IF PROFITLOSS > 0 THEN COMPUTE PROFITLOSS = SQR(PROFITLOSS)

 

Once again, notice that the last part of the statement is identical to the COMPUTE command without the IF-THEN modifier. The only difference is in the IF <Statement> THEN part of the command. The valid relationships supported by StatPac are:

 

=      Equal to

>      Greater than

>=   Greater than or equal to

<      Less than

<=   Less than or equal to

<>   Unequal to

==   Is found within the text string

 

With the exception of the == symbol, all the relational operators are standard algebraic notation. The purpose of the == relational operator is to locate a target string within another string. Its primary use is to search verbatim open-ended responses for selected words or phrases. For example, suppose V1, V2, and V3 were open-ended multiple response variables, and respondents' verbatim answers were entered into these three fields. You could use the == relational operator to list comments that mentioned the word "hours". The IF-THEN SELECT line says to search variables one through three for the specified text string (HOURS), and select any record that contains the string. Upper and lower case differences will be ignored when using the == operator.

 

COMPUTE (N5) REC = RECORD

IF V1 - V3 == "HOURS" THEN SELECT

LIST REC V1 - V3

OPTIONS MR=(V1 - V3)

..

Often, it may be desirable to search for more than one key word or phrase. The following procedure tells StatPac to search variables one through three for several key words: hours, time, longer, shorter, duration, and length. The LIST command will display the comments that contain any of the search strings.

 

COMPUTE (N5) REC = RECORD

IF V1 - V3 == "HOURS/TIME/LONGER/SHORTER/DURATION/LENGTH/" THEN SELECT

LIST REC V1 - V3

OPTIONS MR=(V1 - V3)

..

It is important to note that the == relational operator works on the sound of the word, and not the exact spelling.  For example, the following procedure might be used to list respondents' positive comments to a multiple response open-ended question.

 

IF ATTITUDE=="Happy/Glad/Satisfied/Pleased" THEN SELECT

LIST ATTITUDE

..

The report might look like this. Notice that StatPac found the word "satsfied" even though it was misspelled.

 

Comments

I am very happy with the current program and I cannot think any changes to make.

I am completely satsfied with the new procedures.

I am glad you finally added an evaluation component.

Finally, a program that holds my interest. I am very pleased.

 

When used in conjunction with other commands, the == relational operator can be used to create a new coded categorical variable from the verbatim text. In a simple example, suppose you asked respondents, "What do you feel is the number one problem in society?"  You are especially interested in responses relating to crime, drugs, violence, and the economy. The following procedure would create a new variable and perform a frequency analysis on it.

 

NEW (N1) "PROBLEM" WHAT IS THE NUMBER ONE PROBLEM IN SOCIETY?

LABELS PROBLEM (1=CRIME)(2=DRUGS)(3=VIOLENCE)(4=ECONOMY)(5=OTHER)

IF V1 <> " " THEN COMPUTE PROBLEM = 5

IF V1 == "CRIME/CRIMINALS/" THEN COMPUTE PROBLEM = 1

IF V1 == "DRUGS/ALCOHOL/COCAINE/NARCOTICS" THEN COMPUTE PROBLEM = 2

IF V1 == "VIOLENCE/VIOLENT/" THEN COMPUTE PROBLEM = 3

IF V1 == "ECONOMY/ECONOMIC/BUDGET/MONEY/" THEN COMPUTE PROBLEM = 4

FREQ PROBLEM

..
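Because the IF statements run in sequence, a comment that matches more than one search list ends up with the code assigned by the last matching line. The Python sketch below illustrates that ordering only. It uses a plain case-insensitive substring search and does not model the sound-based matching of the == operator; the function names are hypothetical.

```python
def contains_any(text, targets):
    """True if any slash-separated target appears in the text, ignoring case."""
    words = [t for t in targets.split("/") if t]   # skip the empty piece after a trailing slash
    return any(w.lower() in text.lower() for w in words)

RULES = [
    ("CRIME/CRIMINALS/", 1),
    ("DRUGS/ALCOHOL/COCAINE/NARCOTICS", 2),
    ("VIOLENCE/VIOLENT/", 3),
    ("ECONOMY/ECONOMIC/BUDGET/MONEY/", 4),
]

def code_problem(comment):
    """Return the PROBLEM code for one verbatim comment, or None if it is blank."""
    if not comment.strip():
        return None
    code = 5                                # any non-blank comment starts as OTHER
    for targets, value in RULES:
        if contains_any(comment, targets):
            code = value                    # later matches overwrite earlier ones
    return code

print(code_problem("Too much crime and drugs"))
```

If a different priority is wanted, the order of the IF lines (or of the rules in the sketch) is what needs to change.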

StatPac also supports relational operators AND and OR. They may be used in conjunction with the COMPUTE, RECODE and SELECT keywords (with or without parentheses) to make complex IF-THEN statements. The general syntax is:

 

IF <Statement> AND <Statement> THEN RECODE <Variable> <Recode statement>

IF <Statement> OR <Statement> THEN RECODE <Variable> <Recode statement>

 

IF <Statement> AND <Statement> THEN COMPUTE <Variable> = <Equation>

IF <Statement> OR <Statement> THEN COMPUTE <Variable> = <Equation>

 

IF <Statement> AND <Statement> THEN SELECT

IF <Statement> OR <Statement> THEN SELECT

 

For example, let's say we want to compute the following equation:

 

NEWVAR = SQR(INDEX1) + SQR(INDEX2)

 

For this computation to be successful, both INDEX1 and INDEX2 must be greater than zero. (It is not possible to take the square root of a negative number.)

In this example, you could eliminate the possibility of error with the following statement:

 

IF INDEX1 > 0 AND INDEX2 > 0 THEN COMPUTE

        NEWVAR = SQR(INDEX1) + SQR(INDEX2)

 

Because the second line is indented, it is interpreted as a continuation of the previous line. The computation will be performed only if both INDEX1 and INDEX2 are greater than 0.

When using AND relational operators, both statements must be true before the operation (compute, select or recode) will be performed. When using the OR relational operator, if either statement is true, the operation will be performed.

Parentheses may be used in conjunction with AND and OR relational operators to create complex statements. There is no limit on the number of parentheses that may be used in an IF-THEN statement.

Complex weighting schemes can be developed by using combinations of COMPUTE commands. The following procedure creates a weighted subfile where the file is weighted by both age (N2) and sex (A1).

 

STUDY DEMO

NEW (N6.4) "CASE_WT"

IF AGE<21 AND SEX="M" THEN COMPUTE CASE_WT = .9014

IF AGE<21 AND SEX="F" THEN COMPUTE CASE_WT = 1.2037

IF (AGE>=21 AND AGE<41) AND SEX="M" THEN COMPUTE CASE_WT = .4182

IF (AGE>=21 AND AGE<41) AND SEX="F" THEN COMPUTE CASE_WT = .8109

IF AGE>=41 AND SEX="M" THEN COMPUTE CASE_WT = .7892

IF AGE>=41 AND SEX="F" THEN COMPUTE CASE_WT = .8810

WEIGHT CASE_WT

WRITE WT_DEMO

..

StatPac contains a special provision that allows the condensation of several OR relational operators. The condensation may pertain to values or variables.

For example, the following command will select a record if the GROUP variable is 1, 4, 5 or 7. If the GROUP variable is any other value, the record will not be selected.

 

IF GROUP=1 OR GROUP=4 OR GROUP=5 OR GROUP=7 THEN SELECT

 

This same command could be condensed as:

 

IF GROUP="1/4/5/7" THEN SELECT

 

When using this form of the OR relational operator, the values are separated from each other by slashes. Note that whether you are checking alpha or numeric values, you must enclose the list of values in quotation marks to indicate that there is more than one value to be checked.

Similarly, the following command will select a record if any of the three variables is equal to one:

 

IF RESPONSE_1=1 OR RESPONSE_2=1 OR RESPONSE_3=1 THEN SELECT

 

This same command could be condensed as:

 

IF RESPONSE_1-RESPONSE_3=1 THEN SELECT

 

The variables in the variable list portion of the statement can be listed separately, or as a variable range (or combination of the two). The previous command could also be:

 

IF RESPONSE_1 RESPONSE_2 RESPONSE_3 = 1 THEN SELECT
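Both condensed forms are shorthand for a chain of OR tests. The Python sketch below shows the equivalent logic; it is illustrative only, and the function names and record layout are hypothetical.

```python
def value_in_list(value, condensed="1/4/5/7"):
    """Equivalent of IF GROUP="1/4/5/7": true if the value matches any listed code."""
    return str(value) in condensed.split("/")

def any_variable_equals(record, names, target):
    """Equivalent of IF RESPONSE_1-RESPONSE_3=1: true if any listed variable matches."""
    return any(record[name] == target for name in names)

record = {"GROUP": 5, "RESPONSE_1": 3, "RESPONSE_2": 1, "RESPONSE_3": 2}
print(value_in_list(record["GROUP"]))
print(any_variable_equals(record, ["RESPONSE_1", "RESPONSE_2", "RESPONSE_3"], 1))
```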

 

SORT Command

 

It is sometimes desirable to sort a data file. This is especially true when you will be listing the file and you want the listing to be in some meaningful order. For example, you may want to sort by ZIP code before printing a name and address file.

There are two other situations where a data file must be sorted: 1) before merging two files by a common variable, both files must be sorted by the common variable, and 2) before creating an aggregate file, the file must first be sorted by the aggregate variable.

The syntax for the SORT command is straightforward:

 

SORT (<Order>) <Variable or variable list>

 

The sort order refers to either ascending or descending order and may be specified as (A) or (D). The sort variable(s) may be alpha or numeric format.

An example of the SORT command would be:

 

STUDY NEIGHBOR

SORT (A) ZIP

LIST NAME ADDRESS CITY STATE ZIP PHONE

..

The result of this procedure is that the data file will be sorted so the lowest zip code is first and the highest zip code is last. Selected variables from the data will then be listed in sorted order. The SORT command only applies to the procedure in which it appears. Subsequent procedures will use the unsorted data unless the SAVE or WRITE commands were also used in the procedure.

The SORT command is always the last keyword that will be executed in a procedure regardless of where it appears in the procedure (i.e., a file is sorted only after all other transformations have been completed). Therefore, if in the same procedure a variable is both sorted and assigned a new value in a COMPUTE statement, the file will be sorted according to the newly computed values, regardless of the order of the SORT and COMPUTE lines in the procedure. If you wish to sort by a variable before it is computed, you must sort the variable in a separate procedure before the procedure that computes the variable.

You can use the SORT command to perform a multidimensional sort by specifying a list of variables to be sorted. The variables should be listed in decreasing order of significance; that is, the first variable in the list is the primary sorting variable, the second variable is used only when two values of the first variable are identical, and so on. For example, if you had a file which contained information about people including their last name (LAST_NAME), first name (FIRST_NAME), and year of birth (BIRTH_YEAR), and you wanted to sort the file according to these three variables, you could use the command:

 

SORT (A) LAST_NAME, FIRST_NAME, BIRTH_YEAR

 

The records would be ordered alphabetically according to last name. If two or more people in the file had the same last name, their first names would determine who was placed first. If more than one person had the same first and last names, the year of birth would be used to put the records in order. Note that alpha and numeric variables can be combined in the variable list for the SORT command. All variables in the list are sorted in the same order, either ascending or descending.
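A multidimensional sort of this kind corresponds to sorting on a compound key. The Python sketch below illustrates the ordering with a few made-up records; the names and years are hypothetical sample data.

```python
people = [
    {"LAST_NAME": "Smith", "FIRST_NAME": "John", "BIRTH_YEAR": 1972},
    {"LAST_NAME": "Jones", "FIRST_NAME": "Mary", "BIRTH_YEAR": 1980},
    {"LAST_NAME": "Smith", "FIRST_NAME": "Anne", "BIRTH_YEAR": 1990},
    {"LAST_NAME": "Smith", "FIRST_NAME": "John", "BIRTH_YEAR": 1955},
]

def sort_key(person):
    # Primary key first, then the tie-breakers in decreasing order of significance.
    return (person["LAST_NAME"], person["FIRST_NAME"], person["BIRTH_YEAR"])

ascending = sorted(people, key=sort_key)                  # like SORT (A)
descending = sorted(people, key=sort_key, reverse=True)   # like SORT (D)

for p in ascending:
    print(p["LAST_NAME"], p["FIRST_NAME"], p["BIRTH_YEAR"])
```

The single reverse flag applies to the whole key, which matches the rule that all variables in the list share one sort order.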

 

WEIGHT Command

Often, there are known biases in the sample, and the researcher may want to adjust the sample by weighting cases. This will create a data file that compensates for the bias.

Integer case weighting always produces a fixed output. If a case has a weight of two, it will appear twice in the weighted file. If a case has a weight of three, it will appear three times in the output file.

Non-integer case weighting is based on a probability function and will, therefore, produce different results with each run. If a case has a weight of 2.3, it will appear twice in the weighted file, and there is a 30% chance that it will appear a third time. If a case has a weight of .841, there is an 84.1% chance that it will appear in the output file.
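The probability rule for non-integer weights can be sketched as follows. This is an illustrative Python fragment based on the description above, not StatPac's actual routine. The function name copies_for_weight is hypothetical, and the rng parameter exists only so the random draw can be replaced in a test.

```python
import random
from math import floor

def copies_for_weight(weight, rng=random.random):
    """Number of times a case appears in the weighted file."""
    whole = floor(weight)            # guaranteed copies
    fraction = weight - whole        # chance of one additional copy
    return whole + (1 if rng() < fraction else 0)

# Repeated runs differ for non-integer weights but stay fixed for integer weights.
print([copies_for_weight(2.3) for _ in range(10)])
print([copies_for_weight(2) for _ in range(10)])
```

Over many cases, the average number of copies approaches the weight itself, which is why the weighted file size is approximately, not exactly, predictable.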

This command will allow you to use weights that are already contained in the file, or weights may be assigned to each case depending on the value of another variable in the file.

There are two forms of the command syntax. The first is used when there is a numeric variable in the file that already contains the case weight. The syntax for this form of the WEIGHT command is:

 

WEIGHT <Variable>

 

For example, if there were a variable called CASEWEIGHT, the syntax would be:

 

WEIGHT CASEWEIGHT

 

This variable must be numeric and contain the weight of the case. If CASEWEIGHT is missing in any record, the record will be interpreted as if the weight were zero.

If the file does not contain a case weight variable, the other form of the command can be used to assign the weights. The syntax for this form of the command is:

 

WEIGHT <Variable> (<Code>=<Weight>)(<Code>=<Weight>)...

 

For example, to weight the file based on the respondent's sex (SEX), you would weight each case depending on whether it is coded as M or F. In this example, you want to assign a weight of 1.2 to males and 2.4 to females. The syntax to perform this is:

 

WEIGHT SEX (M=1.2)(F=2.4)

 

You should enter a weight for each of the codes that exists in the file. The code (on the left of the equals sign) may be alpha or numeric data, while the weight (on the right of the equals sign) must be numeric. If a code exists in the file that is not reflected in the WEIGHT command, it will be assigned a weight of zero.

Notice that the WEIGHT command can produce a file that contains many more records than the original file. You can control the size of the weighted file by adjusting the values of the weights.

If you want the weighted file to contain approximately the same number of records as the input data file, determine the weight for each code by dividing the desired percentage of records containing the code by the observed percentage of records containing the code.

For instance, suppose you have a survey consisting of 150 respondents (100 males and 50 females), and you want to create a weighted data file with 150 records. Also, you want the new file to contain about the same number of males and females (75 males and 75 females). The weight for the male code would be calculated as .5/.67 or .75. The weight for the female code would be calculated as .5/.33 or 1.5. The command to produce the weighted file would be:

 

WEIGHT SEX (M=.75) (F=1.5)

 

Note that you can also calculate the weights by dividing the desired number of records by the observed number of records for each code. The weight for males (.75) is equivalent to 75/100, and the weight for females (1.5) is equivalent to 75/50.
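The weight calculation above (desired number of records divided by observed number of records) can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function name balancing_weights is hypothetical.

```python
def balancing_weights(observed_counts, desired_counts):
    """Weight for each code = desired count / observed count."""
    return {
        code: desired_counts[code] / observed_counts[code]
        for code in observed_counts
    }

observed = {"M": 100, "F": 50}   # counts in the original file
desired = {"M": 75, "F": 75}     # counts wanted in the weighted file

weights = balancing_weights(observed, desired)

# Expected size of the weighted file: each observed count times its weight.
expected_size = sum(observed[code] * weights[code] for code in observed)
print(weights, expected_size)
```

The resulting weights match the values used in the WEIGHT SEX (M=.75) (F=1.5) example, and the expected size of the weighted file is 150 records.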

Because the non-integer portion of the weight is based on a probability function, the output file will usually not contain exactly the same number of records as the input file.

Complex weighting schemes can be developed by using combinations of COMPUTE commands. The following procedure creates a weighted subfile where the file is weighted by both race and sex.

 

STUDY DEMO

NEW (N6.4) "CASE_WT"

IF RACE="W" AND SEX="M" THEN COMPUTE CASE_WT = .9014

IF RACE="B" AND SEX="M" THEN COMPUTE CASE_WT = 1.2037

IF RACE="O" AND SEX="M" THEN COMPUTE CASE_WT = .4182

IF RACE="W" AND SEX="F" THEN COMPUTE CASE_WT = .8109

IF RACE="B" AND SEX="F" THEN COMPUTE CASE_WT = .9392

IF RACE="O" AND SEX="F" THEN COMPUTE CASE_WT = .8810

WEIGHT CASE_WT

WRITE WT_DEMO

..

It is especially important to include the WRITE command when you use the WEIGHT command. Since StatPac's weighting is based on a probability function, a different set of weighted data will be created each time you run the procedure. Thus, if your intent is to weight the data and use the weighted data in a series of subsequent analyses, you should use the WRITE command to create a weighted data file that can be used in those analyses.

 

STUDY MyStudy

WEIGHT CaseWeight

WRITE MyStudy2

..

STUDY MyStudy2

(the rest of the procedures)

 

NORMALIZE Command

A normalized variable is one in which all the values are expressed in terms of standard deviations (Z scores) rather than as the raw data itself. You can normalize any variable or list of variables with the NORMALIZE command.

The formula for a normalized variable is:

 

Z = (X - XBar) / SD

where:

X is the raw data value

XBar is the mean average

SD is the standard deviation

 

A normalized variable will take on positive and negative values. A positive value of Z indicates that the data is above the mean by Z standard deviations, while a negative value indicates that the data is below the mean by Z standard deviations. The format for the NORMALIZE command is straightforward:

 

NORMALIZE <Variable list>

 

For example, to normalize a SALES variable, we would type:

 

NORMALIZE SALES
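The Z score formula can be illustrated with a small Python sketch. It is not StatPac code. It assumes the sample standard deviation (n - 1 in the denominator), which may differ from what StatPac uses, and the decimals parameter is a hypothetical stand-in for the decimal places allowed by the variable's format.

```python
import statistics

def normalize(values, decimals=2):
    """Convert raw values to Z scores: Z = (X - XBar) / SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1)
    return [round((x - mean) / sd, decimals) for x in values]

sales = [10, 20, 30, 40, 50]     # hypothetical SALES values
z_scores = normalize(sales)
print(z_scores)
```

Values below the mean produce negative Z scores and values above the mean produce positive ones; the value equal to the mean becomes zero.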

 

It is also possible to normalize several variables using the same command. Using a similar example, let's say we want to normalize three variables (SALES, ADVERTISING and DIRECT_MAIL), which are variables three, four and five in the study. There are several different ways we could write the NORMALIZE command to normalize all three variables. Each one simply specifies all three variables in the variable list:

 

NORMALIZE V3, V4, V5

NORMALIZE V3-V5

NORMALIZE SALES, ADVERTISING, DIRECT_MAIL

NORMALIZE SALES - DIRECT_MAIL

 

Notice that the only difference between the above commands is the way in which the variable list is specified. The results from each of these would be identical. The variable list may list the variables individually (separated by commas), or by variable range (Low variable - High variable) or by any combination of the two. Either variable numbers or variable names may be used.

Normalized data are non-integer values and contain decimal portions. The number of decimal places is determined by the format of the variable(s) being normalized. Generally, you will not want to normalize the raw data. Instead, create a new variable with the COMPUTE command, save it, and then normalize it in the next procedure. The COMPUTE command will allow you to control the decimal precision of the normalized data.

 

STUDY SALES

COMPUTE (N10.2) "NORM_SALES" = SALES

SAVE

..

NORMALIZE NORM_SALES

SAVE

..

The NORMALIZE command may only be used to normalize an existing variable; it may not be used to normalize a new, computed, recoded, or selected variable. Therefore, if you want to preserve the original data, you will need to run two procedures as illustrated in the previous example.

 

LAG Command

Lagging a variable is often used in simple and multiple regression. When one variable has an effect on another variable, but the effect occurs at a future time, the variable is said to have a lagged effect. A simplified example might be the relationship between our advertising budget and sales. If we double our advertising budget this month, sales will probably increase next month. In other words, advertising budget has a lagged effect on sales. The two variables are related, but one lags behind the other by a specific time period.

The LAG command may be used to lag one or more variables a specified number of time periods. The syntax for the LAG command is:

 

LAG (<Number of lags>) <Variable list>

 

For example, let's say we wanted to lag variable two by three time periods. The LAG command would be:

 

LAG (3) V2

 

In essence, when you lag a variable x times, you are pushing the data down x records for that variable (x refers to the number of lags you specify). The consequence of this action is that the data set becomes longer. The following data set illustrates lagging:

 

Raw data     Lag of one     Lag of two

    4         Missing        Missing

    9            4           Missing

   12            9              4

    6           12              9

    2            6             12

                 2              6

                                2

 

Using our example where ADVERTISING has a lagged effect on SALES, we could look at the two variables before and after ADVERTISING is lagged:

 

                BEFORE LAG                AFTER LAG

Record #     Sales   Advertising      Sales     Advertising

1              25        30             25        Missing

2              62        40             62          30

3              80        50             80          40

4              98        63             98          50

5                                     Missing       63

 

When you lag a variable in a multiple variable file, the new file will be longer than the original file by the number of lags you specified. The most recent values for the variables that were not lagged will be missing in the new file.
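The effect of lagging on a file can be sketched in Python. This is only an illustration of the behavior described above, not StatPac code; the names MISSING, lag and lag_in_file are hypothetical, and None stands in for missing data.

```python
MISSING = None  # stand-in for StatPac missing data

def lag(values, periods):
    """Push the data down by `periods` records, padding the top with missing."""
    return [MISSING] * periods + list(values)

def lag_in_file(columns, name, periods):
    """Lag one variable in a multiple variable file.

    `columns` maps variable names to equal-length lists. The lagged
    variable is padded at the top; every other variable is padded at
    the bottom, so the file grows by `periods` records.
    """
    result = {}
    for col, values in columns.items():
        if col == name:
            result[col] = lag(values, periods)
        else:
            result[col] = list(values) + [MISSING] * periods
    return result

data = {"SALES": [25, 62, 80, 98], "ADVERTISING": [30, 40, 50, 63]}
lagged = lag_in_file(data, "ADVERTISING", 1)
print(lagged)
```

The output reproduces the AFTER LAG columns of the table: ADVERTISING is missing in the first record and SALES is missing in the added fifth record.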

 

DIFFERENCE Command

 

Differencing data is a method for removing trend and/or seasonality. Basically, differencing involves subtracting successive observations from each other. The DIFFERENCE command is easy to use and can take differences from data values one or more time periods apart. To illustrate the concept of differencing, let's look at the following data set. Note that the original data has a well-defined trend (with no irregular values), while the result of the differencing produces a stationary series with no trend.

 

Raw data     Numbers used to compute a difference     Differenced data

    3                       6 - 3                             3

    6                       9 - 6                             3

    9                      12 - 9                             3

   12                      15 - 12                            3

   15                      18 - 15                            3

   18

Also note that differencing with a periodicity of one has the effect of reducing the number of records by one. Each time you difference your data, the number of records is reduced by the periodicity.

The format for the DIFFERENCE command is:

 

DIFFERENCE (<Periodicity>) <Variable list>

 

In the previous example, if the variable were CASH-ON-HAND, the command would be:

 

DIFFERENCE (1) CASH-ON-HAND

 

If there is more than one variable to difference, simply specify a variable list rather than a single variable name. Use commas or spaces to separate the variable names from one another. The periodicity parameter refers to how many time lags are to be used to calculate the difference. In this example, we are subtracting adjacent values, so the periodicity is one. This is often referred to as a regular or "short" difference because we subtract adjacent values. It has the effect of eliminating trend.

To eliminate seasonality from a data set, do not subtract successive (adjacent) values. Instead, subtract values from the next seasonal period. For example, let's take the following series that has seasonality with a cycle (periodicity) of six periods. That is, the seasonal pattern repeats itself every six periods. In this case, differencing consists of subtracting a value from the corresponding value in the next season. It is known as a seasonal or "long" difference.

 

Example of seasonal differencing (Periodicity = 6)

 

Rec. #     Raw      Numbers used to compute         Differenced

(Time)     data                 a difference                            data                       

 1               3                3 - 3   (T7 - T1)                            0        

 2               4                4 - 4   (T8 - T2)                            0        

 3               5                5 - 5   (T9 - T3)                            0        

 4               4                4 - 4   (T10 - T4)                          0        

 5               3                3 - 3   (T11 - T5)                          0        

 6               2                2 - 2   (T12 - T6)                          0        

 7               3                3 - 3   (T13 - T7)                          0        

 8               4                4 - 4   (T14 - T8)                          0        

 9               5                5 - 5   (T15 - T9)                          0        

10              4                4 - 4   (T16 - T10)                        0        

11              3                3 - 3   (T17 - T11)                        0        

12              2                2 - 2   (T18 - T12)                        0        

13              3                3 - 3   (T19 - T13)                        0        

14              4                4 - 4   (T20 - T14)                        0        

15              5                5 - 5   (T21 - T15)                        0        

16              4                4 - 4   (T22 - T16)                        0        

17              3                3 - 3   (T23 - T17)                        0        

18              2                2 - 2   (T24 - T18)                        0        

19              3

20              4

21              5

22              4

23              3

24              2

 

Note that the result of seasonal differencing on a series that contains no trend or irregular values produces a perfectly stationary series. Differencing will always result in the loss of data. When you difference for seasonality, the amount of data lost will be equal to one seasonal period. In our example, we lost six data points because the seasonal period was six. The command syntax is identical to a "short difference" except that the periodicity parameter is greater than one (i.e., equal to the seasonal period).

 

DIFFERENCE (6) CASH-ON-HAND

 

Differencing can, therefore, be used to reduce or eliminate both trend and seasonality, depending on the time lag used for differencing. When the lag is one, the effect will be to eliminate trend. When the time lag is equal to the seasonal period, the effect is to eliminate seasonality.
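Both kinds of differencing can be expressed with one small Python function. This is a sketch of the arithmetic described above, not StatPac code; the function name difference is hypothetical.

```python
def difference(values, periodicity=1):
    """Subtract each value from the value `periodicity` records later.

    The result is shorter than the input by `periodicity` records.
    """
    return [
        values[i + periodicity] - values[i]
        for i in range(len(values) - periodicity)
    ]

# Regular ("short") difference removes the trend.
trend = [3, 6, 9, 12, 15, 18]
short_diff = difference(trend, 1)

# Seasonal ("long") difference removes a pattern that repeats every six periods.
seasonal = [3, 4, 5, 4, 3, 2] * 4
long_diff = difference(seasonal, 6)

print(short_diff)
print(long_diff)
```

The trend series loses one record and becomes a constant series of threes, while the 24-record seasonal series loses six records and becomes a series of zeros.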

 

DUMMY Command

Dummy variables are used in multiple regression to include nominal or ordinal-type data in the regression equation. Normally, only interval or ratio-type data may be used in multiple regression. Dummy variables may only take on values of one or zero. They may be used as independent variables in multiple regression.

Let's say we have a variable that indicates the presence or absence of a credit history. This variable could be coded as a one (meaning there is a credit history), or zero (meaning there is no credit history). This variable could then be included in a multiple regression problem. It is known as a dummy variable because it represents nominal data.

The situation becomes somewhat more complex when there is more than a simple dichotomy (yes/no). Let's take an example where there are several nominal type categories. Extending the previous example, the information for the variable "CREDIT-HISTORY" might have been coded: 1=Excellent history, 2=Good history, 3=Fair history, 4=Poor history. This could be expressed with dummy variables by creating four new variables. The first new variable would be "Excellent history" and would be coded as one (yes) or zero (no). The second new variable would be "Good history" and would also be coded as one or zero, and so on. The coding scheme can be illustrated by the following table:

 

Raw     New var 1     New var 2     New var 3    New var 4

Data      Excellent            Good             Fair               Poor       

1                      1                     0                  0                  0

2                      0                     1                  0                  0

3                      0                     0                  1                  0

4                      0                     0                  0                  1

Notice that each of the new dummy variables is assigned a value of one or zero depending on the original data value. Obviously, it would be quite time consuming to re-enter four new values (the four new dummy variables) for each record. The DUMMY command will perform this task for you. The syntax for the command is:

 

DUMMY <Variable>

 

For example, to create dummy variables for the previous example, the command would be:

 

DUMMY CREDIT-HISTORY

 

This would automatically create four new dummy variables and fill in the values with ones and zeros, depending on the data set. A dummy variable will be created for each unique value that exists in the original data file for that variable. Each dummy variable will be a one-digit numeric value. The result of using the DUMMY command will be the creation of new variables that may then be included in a multiple regression.

The new dummy variable names will be the value labels from the study. For instance, if a variable were coded A=English, B=French and C=Spanish, the DUMMY command would create three new variables named ENGLISH, FRENCH and SPANISH. If the variable does not contain value labels, the variable names for the new dummy variables will be "DUMMY-Vx-y", where x is the variable that was used to create the dummy variables and y is the code from the data that defines the group it came from. For example, if variable 32 were used to create dummy variables, and the data file contained codes A, B and C, the variable names will be "DUMMY-V32-A", "DUMMY-V32-B" and "DUMMY-V32-C".

The DUMMY command will only work if the total number of variables (including the new dummy variables) is not greater than fifty. Do not convert interval or ratio data to dummy variables. This is unnecessary and will usually result in exceeding the fifty variable limit.

Note that a dummy variable will be created for each unique value of the original variable. In order to use these dummy variables in multiple regression, it is necessary to call one of them the "standard" and not include it in the regression problem. Using all the new dummy variables in the regression equation will result in a singular matrix, and it will not be possible to perform the matrix inversion necessary for regression. Choose one of the dummy variables, call it the "standard", and exclude it from the multiple regression analysis.
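The dummy coding scheme and the need to exclude one "standard" variable can be illustrated in Python. This is a sketch only, not StatPac code; the function names dummy and drop_standard and the sample data are hypothetical.

```python
def dummy(values):
    """Create one 0/1 variable for each unique value in the data."""
    codes = sorted(set(values))
    return {code: [1 if v == code else 0 for v in values] for code in codes}

def drop_standard(dummies, standard):
    """Exclude the dummy variable chosen as the "standard"."""
    return {code: col for code, col in dummies.items() if code != standard}

# Hypothetical CREDIT-HISTORY data: 1=Excellent 2=Good 3=Fair 4=Poor
credit_history = [1, 2, 3, 4, 2]
dummies = dummy(credit_history)

# In every record exactly one dummy variable equals one, so the full set
# always sums to one. That redundancy is why using all of them in a
# regression that includes a constant gives a singular matrix.
row_sums = [sum(col[i] for col in dummies.values())
            for i in range(len(credit_history))]

for_regression = drop_standard(dummies, standard=4)
print(sorted(for_regression))
```

Here "Poor history" (code 4) is treated as the standard, leaving three dummy variables to enter the regression.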

The DUMMY command may only be used on a variable that already exists in the data file. A variable that is dummied cannot be computed, recoded, or selected in the same procedure. If you want to apply the DUMMY command to a transformed variable, you must perform the transformation in a separate procedure before the procedure in which the variable is dummied.

 

RUN Command

The RUN command may be added to the end of any procedure file to enable batch processing of multiple procedure files. It is useful for unattended processing of large jobs, where the size of a procedure file would become excessive. It allows the user to process a series of procedure files in a single batch.

The RUN command may be added as a single line procedure at the end of any procedure file.

The command syntax is:

 

RUN <Procedure file name> <Batch output file name>

 

The <Procedure file name> is the name of the next procedure file to be run. For example, the following procedure file would first run a frequency analysis and a crosstab, and then a new procedure file (PROC2) would be loaded and run.

 

STUDY RESEARCH

FREQ V1

..

CROSSTAB V1 BY V2

..

RUN PROC2

..

 

Multiple procedure files can be run in a batch by adding the RUN command to the end of each procedure file so that at the completion of each procedure file, a new procedure file would be loaded and run. Pre-analysis syntax checking will be performed only on the initial procedure file.

The <Batch output file name> may be optionally specified in the RUN command line to change the batch output file name for the new procedure file. For example, the following command would load the PROC2 procedure file and run it, writing the output to FILE2.rtf.

 

RUN PROC2 FILE2

 

If a <Batch output file name> is specified in the RUN command, the new procedure file will write the results to the <Batch output file name>. If page numbering is being used, the new output file will begin with page one, and the table of contents will be appended so that the final table of contents will contain the page numbers for all the procedure files that were run.

 

REM Command

 

The purpose of the REM command is to allow you to embed notes within a procedure file. Comments are especially helpful when reviewing a procedure file that you have not used for a long time. Comment lines will be ignored when performing an analysis. A comment line begins with an apostrophe or the word REM. There are no restrictions on the text that may be included in a comment line. Comment lines may also use continuation lines. For example, the following procedure contains two comment lines. The second comment also has a continuation line:

 

REM This procedure has two comment lines

STUDY SURVEY12

' This procedure will only use the first 50 records

  for the analysis because the SELECT command is used

SELECT 1-50

FREQUENCIES ATTITUDE

..

Comment lines can be useful when debugging a procedure that contains an unknown error. By selectively making each line a comment (adding an apostrophe to the beginning of the line), you can essentially eliminate that line as a possible cause of the error.

 

Reserved Words

In addition to keywords, StatPac recognizes several reserved words. Where keywords are always used at the beginning of a line, reserved words are always used somewhere in the middle of a line. Reserved words are specified as a parameter following a keyword or analysis specification command. The reserved words are: RECORD, TIME, TOTAL, MEAN, LO, HI, WITH, BY, THEN, and ELSE. These words have special meaning and should not be used as variable names. They may be embedded in command lines as parameters of keywords, but may not be used as keywords themselves.

 

Reserved Word RECORD

The word RECORD is a reserved variable name. It is an implicit name built into the command processor and should not be used as one of your variable names. The RECORD variable will always contain the value of the current record number (sequence) that is being read from the data file.

The word RECORD may be used to create a time-series regression. For example, the following command would perform a multiple regression analysis using time (RECORD) and BUDGET as the independent variables and SALES as the dependent variable:

 

REGRESS (2) SALES, RECORD, BUDGET

 

If we had collected yearly data but had not entered the year as a variable, we could use the reserved word RECORD to create the year. The following procedure will create a new subfile (NEWFILE) that contains variables one through nine of the INCOME study, and the new variable YEAR as the tenth variable.

 

STUDY INCOME

NEW (N4) "YEAR"

COMPUTE YEAR.0 = RECORD + 1931     (Our data begins in 1932)

WRITE NEWFILE  V1-V9 YEAR

..

Important User Tip

There is one caution that should be observed when using RECORD. If the SELECT or SORT commands are used, the record numbers will change so that the record numbers reflect the selected or sorted records. For example, suppose we wanted to list the record numbers of all the cases where AGE is greater than sixty. The following procedure is wrong.

 

STUDY RETIRE

IF AGE > 60 THEN SELECT

LIST RECORD

..

Instead, you must first compute the record number, and then reference the computed variable to display the record number. The following procedure is correct.

 

STUDY RETIRE

COMPUTE (N5) REC_NUM = RECORD

IF AGE > 60 THEN SELECT

LIST REC_NUM

..
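The difference between the two procedures can be illustrated with a short Python sketch. It is not StatPac code; the AGE values are hypothetical, and the renumbering shown is an assumption based on the caution described above.

```python
ages = [45, 72, 61, 38, 66]  # hypothetical AGE values, records 1 to 5

# Wrong approach: numbering the records after selection. The selected
# records are renumbered 1, 2, 3, so the original positions are lost.
selected = [age for age in ages if age > 60]
renumbered = list(range(1, len(selected) + 1))

# Correct approach: capture the record number first, then select.
original_numbers = [rec for rec, age in enumerate(ages, start=1) if age > 60]

print(renumbered)
print(original_numbers)
```

Capturing the record number before selecting plays the same role as the COMPUTE line in the correct procedure.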

 

Reserved Word TOTAL

The reserved word TOTAL is used in banners to specify row and column totals in the table. Its use is described in detail under the "Row Totals and Column Totals" section of the BANNERS command documentation.

 

Reserved Word MEAN

The reserved word MEAN is used in banners to specify row and column mean averages in the table. Its use is described in detail under the "Means and Standard Deviations" section of the BANNERS command documentation.

 

Reserved Word TIME

The reserved word TIME will be used in a future version of StatPac for Windows.