StatPac for Windows User's Guide
 

Overview

System Requirements and Installation

System Requirements

Installation

Unregistering & Removing the Software from a PC

Network Operation

Updating to a More Recent Version

Backing-Up a Study

Processing Time

Server Demands and Security

Technical Support

Notice of Liability

Paper & Pencil and CATI Survey Process

Internet Survey Process

Basic File Types

Codebooks (.cod)

Data Manager Forms (.frm)

Data Files (.dat)

Internet Response Files (.asc or .txt)

Email Address Lists (.lst or .txt)

Email Logs (.log)

Rich Text Files (.rtf)

HTML Files (.htm)

Perl Script (.pl)

Password Files (.text)

Exported Data Files (.txt and .csv and .mdb)

Email Body Files (.txt or .htm)

Sample File Naming Scheme for a Survey

Customizing the Package

Problem Recognition and Definition

Creating the Research Design

Methods of Research

Sampling

Data Collection

Reporting the Results

Validity

Reliability

Systematic and Random Error

Formulating Hypotheses from Research Questions

Type I and Type II Errors

Types of Data

Significance

One-Tailed and Two-Tailed Tests

Procedure for Significance Testing

Bonferroni's Theorem

Central Tendency

Variability

Standard Error of the Mean

Inferences with Small Sample Sizes

Degrees of Freedom

Components of a Study Design

Elements of a Variable

Variable Format

Variable Name

Variable Label

Value Labels

Valid Codes

Skip Codes for Branching

Data Entry Control Parameters

Missing OK

Auto Advance

Caps Only

Codebook Tools

The Grid

Codebook Libraries

Duplicating Variables

Insert & Delete Variables

Move Variables

Starting Columns

Print a Codebook

Variable Detail Window

Codebook Creation Process

Method 1 - Create a Codebook from Scratch

Method 2 - Create a Codebook from a Word-Processed Document

Spell Check a Codebook

Multiple Response Variables

Missing Data

Changing Information in a Codebook

Overview

Data Input Fields

Form Naming Conventions

Form Creation Process

Using the Codebook to Create a Form

Using a Word-Processed Document to Create a Form

Variable Text Formatting

Field Placement

Value Labels

Variable Separation

Variable Label Indent

Value Labels Indent

Space between Columns

Valid Codes

Skip Codes

Variable Numbers

Variable List and Detail Windows

Data Input Settings

Select a Specific Variable

Finding Text in the Form

Replacing Text in the Form

Saving the Codebook or Workspace

Overview

Keyboard And Mouse Functions

Create A New Data File

Edit Or Add To An Existing Data File

Select A Different Data File

Change Fields

Change Records

Enter A New Data Record

View Data For A Specified Record Number

Find Records That Contain Specified Data

Duplicate A Field From The Previous Record

Delete A Record

Data Input Settings

Compact Data File

Double Entry Verification

Print A Data Record

Variable List & Detail Windows

Data File Format

Overview

HTML Email Surveys

Plain Text Email Surveys

Brackets

Item Numbering

Codebook Design for a Plain Text Email Survey

Capturing a Respondent's Email Address

Filtering Email to a Mailbox

General Considerations for Plain Text Email

Overview

Internet Survey Process

Server Setup

Create the HTML Survey Pages

Upload the Files to the Web server

Test the survey

Download and import the test data

Delete the test data from the server

Conduct the survey

Download and import the data

Display a survey closed message

Server Setup

FTP Login Information

Paths & Folder Information

Design Considerations for Internet Surveys

Special Variables for Internet Surveys

Script to Create the HTML

Command Syntax & Help

Saving and Loading Styles

Survey Generation Procedure

Script Editor

Imbedded HTML Tags

Primary Settings

HTML Name (HTMLName=)

Banner Image(s) (BannerImage=)

Heading (Heading=)

Finish Text & Finish URL (FinishText= and FinishURL=)

Cookie (Cookie=)

IP Control (IPControl=)

Allow Cross Site (AllowCrossSite=)

URL to Survey Folder (WebFolderURL=)

Advanced Settings - Header & Footer

RepeatBannerImage

RepeatHeading

PageNumbers

ContinueButtonText

SubmitButtonText

ProgressBar

FootnoteText & FootnoteURL

Advanced Settings - Finish & Popups

Thanks

Closed

HelpWindowWidth & HelpWindowHeight

HelpLinkText

LinkText

PopupBannerImage

PopupFullScreen

Advanced Settings - Control

Method

Email

RestartSeconds

MaximizeWindow

BreakFrame

AutoAdvance

BranchDelay

Cache

Index

ForceLoaderSubmit

ExtraTallBlankLine

RadioTextPosition

TextBoxTextPosition

LargeTextBoxPosition

LargeTextBoxProgressBar

Advanced Settings - Fonts & Colors

Global Attributes

Heading, Title, Text, & Footnote Attributes

Instructions, Question, and Response Attributes

Advanced Settings - Passwords - Color & Banner Image

LoginBannerImage

LoginBGColor

LoginWallpaper

LoginWindowColor

Advanced Settings - Passwords - Text & Control

PasswordType

LoginText

PasswordText

LoginButtonText

FailText

FailButtonText

ShowLink

EmailMe

KeepLog

Advanced Settings - Passwords - Single vs. Multiple

Password (single password method)

PasswordFile (multiple passwords method)

PasswordField & ID Field (multiple passwords method)

PasswordControl

Advanced Settings - Passwords - Technical Notes

Advanced Settings - Server Overrides

ActionTag

StorageFolder

ScriptFolder

Perl

MailProgram

Branching and Piping

Randomization (Rotations)

Survey Creation Script - Overview

Using Commands More than Once in a Script

Survey Creation - Specify Text

Heading

Title

Text

FootnoteText

Instructions

Question

Survey Creation - Spacing and pagination

BlankLine

NewPage

Survey Creation - Images and Links

Image

Link

Survey Creation - Help Windows

Survey Creation - Popup Windows

Survey Creation - Objects

Radio Buttons for a Single Variable

Radio Buttons for Grouped Variables (matrix style)

DropDown Menu

TextBox for a Single Variable

Adding a TextBox to a Radio Button, CheckBox, or Radio Button Matrix

TextBoxes for Grouped Variables

Sliders for Single or Grouped Variables

CheckBox for Multiple Response Variables

ListBox

Uploading and Downloading Files from the Server

Auto Transfer

FTP

Summary of the Most Common Script Commands

Overview

Format of an Email Address File

Extract Email Addresses

List Statistics

Join Two or More Lists

Split a List

Clean, Sort, and Eliminate Duplicates

Add ID Numbers to a List

Create a List of Nonresponders

Subtract One List From Another List

Merge an Email List into a StatPac Data File

Send Email Invitations

Using an ID Number to Track Responses

Email Address File

Body Text File

Sending Email

Overview

Mouse and Keyboard Functions

Designing Analyses

Continuation Lines

Comment Lines

V Numbers

Keywords

Analyses

Variable List

Variable Detail

Find Text

Replace Text

Options

Load, Save, and Merge Procedure Files

Print a Procedure File

Run a Procedure File

Results Editor

Graphics

Table of Contents

Automatically Generate Topline Procedures

Keyword Index

Keywords Overview

Categories of Keywords

Keyword Help

Ordering Keywords

Global and Temporary Keywords

Permanently Change a Codebook and Data File

Backup a Study

STUDY Command

DATA Command

SAVE Command

WRITE Command

MERGE Command

HEADING Command

TITLE Command

FOOTNOTE Command

LABELS Command

OPTIONS Command

SELECT and REJECT Commands

NEW Command

LET Command

STACK Command

RECODE Command

COMPUTE Command

AVERAGE, COUNT and SUM Commands

IF-THEN ELSE Command

SORT Command

WEIGHT Command

NORMALIZE Command

LAG Command

DIFFERENCE Command

DUMMY Command

RUN Command

REM Command

Reserved Words

Reserved Word RECORD

Reserved Word TOTAL

Reserved Word MEAN

Reserved Word TIME

Analyses Index

Analyses Overview

LIST Command

FREQUENCIES Command

CROSSTABS Command

BANNERS Command

DESCRIPTIVE Command

BREAKDOWN Command

TTEST Command

CORRELATE Command

Advanced Analyses Index

REGRESS Command

STEPWISE Command

LOGIT and PROBIT Commands

PCA Command

FACTOR Command

CLUSTER Command

DISCRIMINANT Command

ANOVA Command

CANONICAL Command

MAP Command

Advanced Analyses Bibliography

Utility Programs

Import and Export

StatPac and Prior Versions of StatPac Gold

Access and Excel

Comma Delimited and Tab Delimited Files

Files Containing Multiple Data Records per Case

Internet Files

Email Surveys

Merging Data Files

Concatenate Data Files

Merge Variables and Data

Aggregate

Codebook

Quick Codebook Creation

Check Codebook and Data

Sampling

Random Number Table

Random Digit Dialing Table

Select Random Records from Data File

Compare Data Files

Conversions

Date Conversions

Currency Conversion

Dichotomous Multiple Response Conversion

Statistics Calculator Menu

Distributions Menu

Normal distribution

T distribution

F distribution

Chi-square distribution

Counts Menu

Chi-square test

Fisher's Exact Test

Binomial Test

Poisson Distribution Events Test

Percents Menu

Choosing the Proper Test

One Sample t-Test between Percents

Two Sample t-Test between Percents

Confidence Intervals around a Percent

Means Menu

Mean and Standard Deviation of a Sample

Matched Pairs t-Test between Means

Independent Groups t-Test between Means

Confidence Interval around a Mean

Compare a Sample Mean to a Population Mean

Compare Two Standard Deviations

Compare Three or more Means

Correlation Menu

Sampling Menu

Sample Size for Percents

Sample Size for Means

Basic Statistical Analyses

Analyses Index

These commands may be used in a procedure to set the type of analysis to be performed.

 

Banners

Breakdown

Correlate

Crosstabs

Descriptives

Frequencies

List

Ttest

 

These additional commands may be used in a procedure when the Advanced Analyses module has been installed.

 

ANOVA

Canonical

Cluster

Discriminant

Factor

Logit

Map

PCA

Probit

Regress

Stepwise

 

 

Analyses Overview

 

There are many different types of analyses that can be performed with StatPac. Most commands are easy to use since much of the required analysis information comes from the default parameter table.

With the exception of the OPTIONS command, the analysis commands are mutually exclusive in any given procedure. In other words, a single procedure cannot perform more than one kind of analysis. A procedure file, however, may contain many procedures, each performing a different kind of analysis.

The OPTIONS command may be used in any procedure to override the default values in the parameter table. It is used to control printing and analysis parameters.

The analysis commands available in StatPac's basic package are: LIST, FREQUENCIES, DESCRIPTIVE, BREAKDOWN, CROSSTABS, BANNERS, TTEST, and CORRELATE.

The analysis commands available in the StatPac Advanced Statistics module are: REGRESS, STEPWISE, LOGIT, PROBIT, DISCRIMINANT, PCA, FACTOR, ANOVA, CANONICAL, CLUSTER, and MAP.

Important User Tip

Any of the analysis commands may be abbreviated by using only the first two characters of the keyword. For example, FREQUENCIES could be abbreviated to FR, FACTOR could be abbreviated as FA, and OPTIONS could be abbreviated as OP.

 

LIST Command

 

The LIST command is used to list selected variables in the data file. The command syntax is:

 

LIST <Variable list>

 

If the LIST command does not specify a variable or variable list, all variables will be listed. When used in this fashion, value labels will be listed instead of the raw data.

For example, let's say you want to print a report consisting of only two columns. The first column is variable 7 (AGE) and the second column is variable 14 (SEX). Either variable numbers or names may be used to specify the variable list. The command line would be entered as either:

 

LIST V7 V14

LI AGE SEX    (LIST may be abbreviated as LI)

 

The keyword RECORD may be used as part of the variable list to print the record number as one of the columns. For example, the following command line will produce a report consisting of four columns, the first column being the record number (the sequence of the case in the data file):

 

LIST RECORD V12 V31 V83

 

You may include as many variables in the report as can be accommodated by the pitch and orientation of the output. If too many variables are specified, the output will be truncated. Missing data will be displayed as a series of dashes.

The LIST command is often used as a way to troubleshoot a procedure that is not working. For example, if the following procedure didn't work properly, we might use the LIST command to figure out what went wrong:

 

STUDY EXAMPLE

COMPUTE (N4.1) AVG = V1 * V2 / 2

DESCRIPTIVE AVG

..

We could replace the DESCRIPTIVE command with the LIST command and list the relevant variables. Also note that we added the SELECT command to limit the printout to the first twenty-five records (there is no need to list the whole file to find out why the procedure is not working).

 

STUDY EXAMPLE

COMPUTE (N4.1) AVG = V1 * V2 / 2

SELECT 1-25

LIST RECORD V1 V2 AVG

..

 

Example of a List Printout

 

To list an open-ended variable, simply specify it in the LIST command. The following procedure would list a variable called "Comment". The COMPUTE line is used to calculate a record number so it can be included in the printout. The IF-THEN-SELECT line is used to select only those who made a comment. The OPTIONS line is used to insert a blank line between responses.

 

COMPUTE (N5) REC=RECORD

IF Comment <> " " THEN SELECT

LIST Rec Comment

OPTIONS BL=Y

..

The output might look like this:

 

Example of a Verbatim Listing

 

Multiple Response & Combining Variables

There are two options (MR and CB) to control the way that data gets displayed with the LIST command.

The MR option is used to specify variables you want to be stacked on top of each other in a single column. The specified variables will be listed in a single column rather than using multiple columns on the listing. The variables do not have to be true multiple response variables in the codebook; you may use the MR option for any variables.

The CB option is used to specify variables that you want to combine into a single field instead of being treated as individual fields. For example, if City, State, and Zip were separate variables, you could display them together using the CB option.

Normally, all variables in a listing would appear side-by-side. The MR and CB options are used to create an easier-to-read format.

For example, suppose variables six, seven, and eight are being used to hold respondents' verbatim answers to a question on a restaurant survey:  "What three things could we do to improve your dining experience?"   Three A70 variables were used. The following command would produce a listing of the data in a vertical format. Up to three lines in the report would be displayed for each respondent. The SELECT command is used to eliminate subjects who did not answer the question.

 

IF V6 <> " " THEN SELECT

LIST V6-V8

OPTIONS MR=(V6-V8)

..

An example of a single record in the printout might look like this:

 

Faster service.

Reduce prices.

Greater selection.

 

If the CB option were used instead of the MR option, all three responses would be combined into a single field (giving the appearance that the three responses were part of the same sentence or paragraph).

 

IF V6 <> " " THEN SELECT

LIST V6-V8

OPTIONS CB=(V6-V8)

..

 

The CB option formats the printout so each record in the listing will use the number of lines that it needs to show the data. The listing might appear as follows:

 

Faster service. Reduce prices. Greater selection.
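The difference between the two layouts can be sketched in a few lines of Python. This is not StatPac code: the function names stack_fields and combine_fields are hypothetical, and the sketch assumes that blank answers are simply skipped (consistent with "up to three lines" per respondent) and that CB separates the combined values with a single space.

```python
def stack_fields(values):
    """MR-style layout (assumed): each non-blank value on its own line of one column."""
    return [v.strip() for v in values if v.strip()]


def combine_fields(values):
    """CB-style layout (assumed): non-blank values joined into one field with spaces."""
    return " ".join(v.strip() for v in values if v.strip())


# One respondent's answers to V6-V8 from the restaurant example.
record = ["Faster service.", "Reduce prices.", "Greater selection."]
print("\n".join(stack_fields(record)))  # three lines, like MR=(V6-V8)
print(combine_fields(record))           # one field, like CB=(V6-V8)
```

With a blank second answer, the MR-style sketch would print only two lines for that respondent.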

 

The MR and CB options may be used in conjunction with each other to produce desired outputs. For the next example, assume the following variables:

 

V1     Name

V2     Street_Address

V3     City

V4     State

V5     Zip

V6     Phone_Number

V7     Fax_Number

V8     Email_Address

 

We might want to stack Name, Address, City, State, and Zip into a single column on the printout. We might also want to stack the Phone and Fax numbers into a single column. In the following procedure, the CB option is used to combine City, State, and Zip into a single field, and the MR option is used to specify which variables should be displayed in a vertical column.

 

LIST V1-V8

OPTIONS CB=(V3-V5)  MR=(V1-V5)(V6-V7)

..

 

The output might look like this:

 

Example of a Listing Using the MR and CB Options

 

Labeling and Spacing Options

 

Option (Code): Function

Labeling (LB): Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C). Also, LB=0 suppresses labeling, and LB=X suppresses all labeling and page feeds.

Column Width (CW): Sets the minimum width of the columns (in inches).

Column Spacing (CS): Sets the spacing (in inches) between the columns of the listing.

Maximum Width (MW): Sets the maximum width (in inches) that will be used for long alpha variables and multiple response variables.

Blank Line Between Rows (BL): When BL=Y, a blank line will be printed between each row of the listing. When BL=N, no blank line will be printed.

Maximum Pages (MP): The MP option may be set to the maximum number of pages that will be printed. Its purpose is to prevent an unintentional listing of hundreds or even thousands of pages. If MP=0, then the listing will become as long as necessary to print all the output. If MP is set to any other number, that will become the maximum number of pages that will be printed.

 

 

FREQUENCIES Command

A frequency analysis is the simplest of all statistical procedures. It is ideal for data which has been coded into groups or categories. The coding can be either alpha or numeric-type data.

The syntax of the command to run a frequency analysis is:

 

FREQUENCIES <Variable list>

 

For example, to find the percent of males and females in a sample, you would request a single analysis:

 

FR SEX    (FREQUENCIES may be abbreviated as FR)

 

Several frequency analyses can be requested with a single command. For example, to get a frequency analysis of SEX (V4), RACE (V5) and INCOME (V6), the request could be specified in several ways:

 

FREQUENCIES SEX, RACE, INCOME

FREQUENCIES SEX RACE INCOME

FREQUENCIES V4 V5 V6

FREQUENCIES V4-V6

 

Notice that either the variable name or the variable number may be specified as part of the variable list.

A frequency analysis may be run on alpha or numeric-type variables. Missing data will be included in the frequencies only if there is a value label for missing data (e.g., <BLANK>=No response).

Table Format

Three types of printout formats are built into the program: expanded, condensed and automatic. The option to control the table format is:

 

OPTIONS TF=N         (No table will be printed)

OPTIONS TF=A         (Formatting will be automatic)

OPTIONS TF=E         (Formatting will be expanded)

OPTIONS TF=C         (Formatting will be condensed)

 

Condensed formatting is especially useful when there are many unlabeled values. For example, if one of the variables is ID NUMBER, there are generally no value labels associated with this variable. It is often a good idea to check the data to be sure that no records were inadvertently entered twice (i.e. duplicate ID numbers). A condensed frequencies printout would allow you to quickly determine if any ID NUMBER is specified more than once. An example of condensed formatting might look like this:


Example of a Compressed Frequencies Printout
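The duplicate-ID check that a condensed printout supports can also be expressed outside StatPac. The following Python sketch (the function name duplicate_ids and the sample IDs are hypothetical) counts every ID and keeps only those that occur more than once, which is the same information you would scan for in the condensed table.

```python
from collections import Counter


def duplicate_ids(ids):
    """Return each ID that occurs more than once, mapped to its count."""
    counts = Counter(ids)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical ID NUMBER column in which two records were entered more than once.
ids = [101, 102, 103, 102, 104, 105, 105, 105]
print(duplicate_ids(ids))  # {102: 2, 105: 3}
```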

 

Automatic formatting is generally recommended since it minimizes the amount of paper that will be used. If automatic formatting is used and there are more than 50 unlabeled categories (no value labels), the printout will automatically be converted to condensed format; otherwise, the expanded format is used, which is what happens in most cases. An example of expanded formatting might look like this:

 

Example of an Expanded Frequencies Printout

 

Print Zero Values

Sometimes there may be a category listed in the value labels that has no accompanying data. For example, nobody in the sample may be over 40 years old or make over $30,000 a year. Whether or not you want the label to appear with a count of zero is a matter of preference. If you want the reader of your report to know that a category was available, you'd probably want to print zero values (ZV=Y). If you are interested in saving space, you might want to exclude zero values (ZV=N).

Sort Type & Sort Order

Frequency analyses are often more meaningful when the output is displayed in sorted order. When working with nominal-type data and few categories, the order in which categories are presented is not very important. (For example, it really doesn't make much difference whether males or females are listed first.) However, as the number of categories increases, it may be desirable to list those with the highest count first, followed by those with lower counts. This would be a sort by frequency of response in descending order. It would be requested with the following options:

 

OPTIONS ST=F SO=D     (Sort Type is by frequency of response; Sort Order is descending)

 

When data is ordinal, it is more appropriate to present the output in the order defined by the categories themselves. Usually this is the same as the alpha or numeric code used to represent a category. For example, take the following two survey questions:

 

How old are you?
    A=Under 21
    B=21-30
    C=31-40
    D=Over 40

What is your annual income?
    1=Under $10,000
    2=$10,000-$20,000
    3=$21,000-$30,000
    4=Over $30,000

 

Both questions are ordinal; the first one is coded alpha and the second is numeric. It would be desirable to have the frequencies printout appear in ascending order by the code (the same way they are listed above). The options statement to do this is:

 

OPTIONS ST=C SO=A     (Sort Type is by category code; Sort Order is ascending)

 

Notice that this type of sort is generally the way the information would be specified in the value labels. If this is the case, sorting by category code will have no effect. Sorting by category codes is useful if you did not enter value labels for the variable.

If no sort type is specified (ST=N), the output will be displayed in the same order as specified by the value labels. If the value labels do not contain all the values in the data file (such as mispunched data), the unlabeled values will appear on the printout in the order that they are encountered in the data file.

Additionally, a digit may be added as a suffix to SO=A or SO=D. The digit specifies how many of the last value labels to leave out of the sort. This is useful when the last value label is an "other" category and you want to sort the value labels, but still leave "other" as the last row in the report. For example, ST=F SO=D1 would sort the value labels in descending order by frequency, except that it would leave the last value label as the last row regardless of its frequency.
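The effect of a suffix such as SO=D1 can be sketched in Python. This is an illustration of the described behavior, not StatPac's implementation; the function name sort_categories and the sample counts are hypothetical. The rows are assumed to arrive in value-label order, with the fixed rows (such as "Other") at the end.

```python
def sort_categories(rows, descending=True, keep_last=0):
    """Sort (label, count) rows by count, leaving the last `keep_last` rows in place.

    descending=True mimics SO=D and False mimics SO=A (with ST=F);
    keep_last mimics the digit suffix, e.g. SO=D1 -> keep_last=1.
    """
    if keep_last:
        head, tail = rows[:-keep_last], rows[-keep_last:]
    else:
        head, tail = rows, []
    # sorted() is stable, so tied counts keep their value-label order.
    head = sorted(head, key=lambda row: row[1], reverse=descending)
    return head + tail


rows = [("Price", 12), ("Service", 30), ("Selection", 21), ("Other", 40)]
print(sort_categories(rows, descending=True, keep_last=1))
# [('Service', 30), ('Selection', 21), ('Price', 12), ('Other', 40)]
```

Even though "Other" has the highest count, it stays in the last row because it is excluded from the sort.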

Truncate Labels

Very long value labels may sometimes exceed the space allocated for them in the printout. In those situations, you may set the program to either truncate the value labels (TL=Y excludes the ending portion of the label), or to use multiple lines to print the entire value label (TL=N).
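The two TL settings can be illustrated with Python's standard textwrap module. This sketch assumes the space for a label is measured in characters and that wrapping happens at word boundaries; the function name format_label and the 20-character width are hypothetical.

```python
import textwrap


def format_label(label, width, truncate):
    """Return the printed line(s) for one value label in a column `width` characters wide."""
    if truncate:
        # TL=Y: keep only the part that fits and drop the ending portion.
        return [label[:width].rstrip()]
    # TL=N: continue the label on as many lines as it needs.
    return textwrap.wrap(label, width)


label = "Very satisfied with the overall experience"
print(format_label(label, 20, truncate=True))   # ['Very satisfied with']
print(format_label(label, 20, truncate=False))  # ['Very satisfied with', 'the overall', 'experience']
```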

Cumulative Percents

When the frequency table is printed in expanded format, you may print or exclude cumulative percents with the CP option. This would be specified as:

 

OPTIONS CP=Y       (Turn on cumulative percents)

OPTIONS CP=N       (Turn off cumulative percents)
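How the percent and cumulative percent columns of an expanded table relate can be sketched in Python. This is a hypothetical illustration, not StatPac code: the function name frequency_table is invented, and it assumes that codes not listed in `order` are excluded from the base (treated as missing).

```python
from collections import Counter


def frequency_table(values, order):
    """Return (code, count, percent, cumulative percent) rows in the given code order.

    Codes that are not listed in `order` are left out of the base (assumed missing).
    """
    counts = Counter(values)
    n = sum(counts[code] for code in order)
    if n == 0:
        return []
    rows, cumulative = [], 0.0
    for code in order:
        percent = 100.0 * counts[code] / n
        cumulative += percent  # running total is what CP=Y adds as a column
        rows.append((code, counts[code], round(percent, 1), round(cumulative, 1)))
    return rows


ages = ["A", "B", "B", "C", "C", "C", "D", "D", "D", "D"]
for row in frequency_table(ages, order=["A", "B", "C", "D"]):
    print(row)
```

Cumulative percents are most meaningful for ordinal items like the age example, where the rows are in category order.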

Confidence Intervals

Confidence intervals for proportions can be requested with the CI option. For example, to request the 95% confidence intervals, you would use the option CI=95, and the 99% confidence intervals could be requested by CI=99. Confidence intervals allow us to estimate the proportion in the population for each of the response categories. If repeated samples were taken from the population and a 95% confidence interval computed from each one, about 95% of those intervals would be expected to contain the true population proportion. When confidence intervals are requested, cumulative percents will not be printed regardless of the setting of the CP option.

Confidence intervals are calculated by first computing the estimated standard error of the proportion, and then using the t distribution to find the actual interval. The finite population correction factor (1-n/N) is used to adjust the standard error when the sample represents a large proportion (say, greater than ten percent) of the population. In that situation, use the FP option to specify the population size (i.e., FP=x, where x is the size of the population). If the FP option is not specified, no correction will be applied.
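A rough Python sketch of this calculation follows. It is not StatPac's exact computation, and several choices are assumptions: the standard error uses p(1-p)/n, the correction (1-n/N) multiplies the variance (so the standard error is scaled by its square root), and the normal critical value from statistics.NormalDist stands in for the t value, which is very close for large samples. The function name proportion_ci is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(count, n, confidence=95, population=None):
    """Approximate confidence interval, in percent, for one category's proportion.

    `confidence` plays the role of CI=95 or CI=99; `population` plays the role
    of FP=x. Normal critical value used as a large-sample stand-in for t.
    """
    p = count / n
    se = math.sqrt(p * (1 - p) / n)
    if population:
        # Finite population correction (1 - n/N) applied to the variance.
        se *= math.sqrt(1 - n / population)
    z = NormalDist().inv_cdf(0.5 + confidence / 200)
    return 100 * max(0.0, p - z * se), 100 * min(1.0, p + z * se)


print(proportion_ci(40, 100))                  # roughly (30.4, 49.6)
print(proportion_ci(40, 100, population=500))  # narrower: roughly (31.4, 48.6)
```

The second call shows why FP matters: when the sample is 20% of the population, the corrected interval is noticeably narrower.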

 

Example of Confidence Intervals Around a Percent

 

Critical T Probability

After performing a frequency analysis, researchers are often interested in determining if there is a significant difference between the various categories. The Chi-square statistic is often used to determine if the observed frequencies markedly differ from the expected frequencies. The problem with the Chi-square statistic is that it does not isolate the significant differences (i.e., it only tells whether or not one exists). StatPac uses a t-test to compare all possible pairs of categories to determine where the actual differences lie.

The CT option may be set between 0 and 1. When CT=0, no t-tests will be performed or printed. If CT=1, the t-statistic and probability will be printed for all possible pairs of categories. A typical setting for the critical t probability is 5% (CT=.05). In this case, StatPac will print the t-statistic and two-tailed probability for all pairs of categories that have a probability of p=.05 or less. StatPac uses the following formula to calculate the t-statistic:

 

Example of a Critical T Probability Analysis
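The formula StatPac uses is not reproduced in this text, so the following Python sketch uses one standard textbook approach instead, and it should not be read as StatPac's method. For two category proportions from the same sample, the variance of their difference is (p1 + p2 - (p1 - p2)^2) / n; the sketch uses a normal approximation for the two-tailed probability. The function name pairwise_category_tests is hypothetical, and `critical` mimics the CT setting.

```python
import math
from itertools import combinations
from statistics import NormalDist


def pairwise_category_tests(counts, critical=0.05):
    """Compare every pair of categories from one sample (textbook sketch, not StatPac's formula).

    counts maps category label -> frequency. Returns (label1, label2, z, p)
    for each pair whose two-tailed p is at or below `critical` (like CT=.05).
    """
    n = sum(counts.values())
    results = []
    for a, b in combinations(counts, 2):
        p1, p2 = counts[a] / n, counts[b] / n
        # Variance of p1 - p2 when both proportions come from the same sample.
        se = math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)
        if se == 0:
            continue  # both categories empty; nothing to compare
        z = (p1 - p2) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value <= critical:
            results.append((a, b, round(z, 2), round(p_value, 4)))
    return results


print(pairwise_category_tests({"Yes": 60, "No": 30, "Unsure": 10}))
```

Setting critical=1 reports every pair, which parallels the CT=1 behavior described above.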

Percentage Base

The percentage base on a frequency analysis can either be the number of respondents (N) or the total number of responses. If PB=N, the denominator for calculating percentages will be the number of respondents. If PB=R, the denominator will be the total number of responses for all individuals.
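The difference between the two bases is easiest to see with multiple response data, where one respondent can give several answers. In this hypothetical Python sketch (the function name mr_percentages is invented, and every listed respondent is assumed to count toward the N base), PB=N percents can add up to more than 100, while PB=R percents add up to about 100.

```python
from collections import Counter


def mr_percentages(responses_by_respondent, base="N"):
    """Percent for each response code.

    base="N" divides by the number of respondents (like PB=N);
    base="R" divides by the total number of responses (like PB=R).
    """
    counts = Counter(code for answers in responses_by_respondent for code in answers)
    if base == "N":
        denominator = len(responses_by_respondent)
    else:
        denominator = sum(counts.values())
    return {code: round(100 * n / denominator, 1) for code, n in sorted(counts.items())}


# Four respondents giving seven responses in total.
data = [[1, 2], [1], [1, 3], [2, 3]]
print(mr_percentages(data, "N"))  # {1: 75.0, 2: 50.0, 3: 50.0}
print(mr_percentages(data, "R"))  # {1: 42.9, 2: 28.6, 3: 28.6}
```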

Multiple Response

Surveys often include questions in which the respondent is asked to make more than one response to a single question. An example of the kind of question that is appropriate for a multiple response analysis is:

 

1. Which of the following services did you use?

     (Check all that apply)

            __ Counseling

            __ Job placement

            __ Remedial reading

            __ Remedial math

            __ Resume writing

 

The multiple response frequency analysis is used to summarize these kinds of items.  When designing a study that includes this type of question, each choice is considered as a separate variable. The value labels need to be specified only for the first variable, but it is fine if they are specified for all the multiple response variables.

 

V1  "Services_1" Services Used

        1=Counseling

        2=Job placement

        3=Remedial reading

        4=Remedial math

        5=Resume writing

 

V2  "Services_2" Services Used

 

V3  "Services_3" Services Used

 

V4  "Services_4" Services Used

 

V5  "Services_5" Services Used

 

The syntax for the multiple response frequency analysis is:

 

FREQUENCIES <Variable list>

OPTIONS MR=Y

 

In this example, all the variables in the variable list will be treated as one multiple response variable. Another way to use the MR option is to re-specify the variable numbers (not the variable names) that should be grouped.

 

FREQUENCIES <Variable list>

OPTIONS MR=(<Variable list>)

 

Note that the parentheses are required around the variable list in the options line. In the above example, the commands would be:

 

FREQUENCIES V1-V5

OPTIONS MR=(V1-V5)

 

The output will contain the counts and percents for each of the response values. That is, how many times code 1 (counseling) was chosen for any variable, how many times code 2 (job placement) was chosen for any variable, etc. In other words, it will print the total number of times that each response was recorded for variables 1, 2, 3, 4 and 5 combined.
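The counting rule described above can be sketched in Python. This is an illustration, not StatPac code: records are represented as hypothetical dictionaries of variable name to code, a missing entry or None stands for a blank, and the function name multiple_response_counts is invented.

```python
from collections import Counter


def multiple_response_counts(records, columns):
    """Count how often each code appears in any of the grouped variables (like MR=(V1-V5))."""
    counts = Counter()
    for record in records:
        for column in columns:
            code = record.get(column)
            if code is not None:  # blanks are not counted
                counts[code] += 1
    return counts


# Three respondents; codes follow the services example (1=Counseling, 2=Job placement, ...).
records = [
    {"V1": 1, "V2": 3},
    {"V1": 2, "V2": 1, "V3": 5},
    {"V1": 1},
]
print(multiple_response_counts(records, ["V1", "V2", "V3", "V4", "V5"]))
```

Code 1 (Counseling) is counted three times here even though it appears in different variables for different respondents.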

The options line may be used to specify several multiple response analyses by using additional sets of parentheses in the MR option. The following commands would perform three separate multiple response analyses, each on a different set of variables.

 

FREQUENCIES V1-V20

OPTIONS MR=(V1-V10)(V11-V15)(V16-V20)

 

Multiple response may also be used when the questionnaire limits the choices to fewer than the number of possible responses. For example, the following question asks for two responses from the same value labels list:

 

17 & 18. Write the numbers of your two favorite foods from the list below.

        _____  _____

        1 = Hotdogs

        2 = Hamburgers

        3 = Fish

        4 = Roast Beef

        5 = Chicken

        6 = Salad

 

Notice that there are two variables (17 & 18) that hold the information for this question. Both variables use the same value labels and the responses to both variables are weighted equally (i.e. the first one is not more important than the second). Multiple response assumes that all variables to be analyzed have the same value labels. In this example, the command would be:

 

FREQUENCIES V17 V18

OPTIONS MR=(V17 V18)

 

Example of a Multiple Response Frequency Analysis

 

Category Creation

The actual categories in the frequency analysis can be created either from the study design value labels (CC=L) or from the data itself (CC=D). When the categories are created from the labels, the value labels themselves will be used to create the categories for the analysis, and data that does not match up with a value label code will be counted as missing. That is, mispunched data will be counted as missing. When categories are created from the data, all data will be considered valid whether or not there is a value label for it.
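A hypothetical Python sketch of the two settings follows (the function name categorize and the sample data are invented). With labels as the source, every labeled code becomes a category, even one with a zero count, and unlabeled codes such as a mispunched 7 are counted as missing. With data as the source, every observed code becomes a category.

```python
from collections import Counter


def categorize(values, labels, from_labels=True):
    """Build categories from value labels (like CC=L) or from the data (like CC=D).

    Returns (counts, missing). With from_labels=True, codes that have no
    value label are counted as missing.
    """
    observed = Counter(values)
    if not from_labels:
        return dict(observed), 0
    counts = {code: observed.get(code, 0) for code in labels}
    missing = sum(n for code, n in observed.items() if code not in labels)
    return counts, missing


labels = {1: "Yes", 2: "No"}
values = [1, 2, 2, 7, 1, 1]  # 7 is a mispunched code
print(categorize(values, labels))                     # ({1: 3, 2: 2}, 1)
print(categorize(values, labels, from_labels=False))  # ({1: 3, 2: 2, 7: 1}, 0)
```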

One Analysis

The one-analysis option allows you to print frequency analyses for several variables on one page. This option is especially useful for management reporting when the information needs to be condensed and concise.

All the variables specified with the OA option must have the same value labels. An example might be a series of Yes/No questions or Likert scale items. The important point is that each variable has exactly the same value labels as the other variables. For example, suppose that variables 21-30 are ten items asking the respondents to rate the item as low, medium or high. The following commands would produce a one page summary of all ten items:

 

FREQUENCIES V21-V30

OPTIONS OA=Y

 

The one analysis option is limited by the number of characters that can be printed on a line (i.e., by the pitch and carriage width of the printer). If there are too many different value labels, they will not be able to fit on one line and the analysis will be skipped. If this should happen, try rerunning the analysis using a compressed pitch. As a general rule, each value label will require ten spaces on the output.

 

Example of a One-Analysis Printout

 

The OA option is used in frequency analyses to summarize the frequencies of several variables that all contain the same value labels. Note the difference between the OA and MR options. With the multiple response option (MR), the items are treated as if they were a single variable. The one analysis option (OA) treats each item as a separate analysis, but the results are summarized on one page.

When the MR option is used in conjunction with the OA option, the variables in the MR option's list will be treated as multiple response variables. This makes it easy to create nets in a frequency analysis that uses the OA=Y option.

For example, if V1-V20 are the twenty variables, we could add a net by first creating a duplicate copy of V1 with a new name, and then including the MR option to combine the variables to make the net. The net will be the sum of the counts of the individual variables that make up the MR variable list.

 

STUDY Yourstudy

NEW (N1) "Grand-Total"

COMPUTE Grand-Total = V1

LABELS Grand-Total (1=Agree)(2=Neutral)(3=Disagree)

FREQ Grand-Total V2-V20 V1-V20

OPTIONS OA=Y MR=(Grand-Total V2-V20)

..

 

The results might look like this:

 

                            Agree       Neutral       Disagree

Grand-Total        --------        --------        --------

Variable 1           --------        --------        --------

Variable 2           --------        --------        --------

Variable 3           --------        --------        --------

etc.
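As a rough illustration of the net row, the sketch below adds up the per-category counts of several variables, which is how the net is described above. It is written in Python only to show the arithmetic. The function name and the variable contents are hypothetical, None stands in for missing data, and this is not StatPac's implementation.

```python
from collections import Counter


def net_counts(responses_by_variable):
    """Sum the per-category counts of several variables into one net row.

    responses_by_variable: one list of codes per variable.
    None marks missing data and is not counted (an assumption of this sketch).
    """
    net = Counter()
    for responses in responses_by_variable:
        net.update(code for code in responses if code is not None)
    return dict(net)


# Hypothetical codes (1=Agree, 2=Neutral, 3=Disagree) for three variables
v1 = [1, 1, 2, 3]
v2 = [1, 3, 3, None]
v3 = [2, 1, 1, 1]
print(net_counts([v1, v2, v3]))  # {1: 6, 2: 2, 3: 3}
```

Each individual variable still gets its own row in the report; the net row is simply the column-by-column total of those rows.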

 

The following example shows another way to use the MR option in conjunction with the OA option to create complex nets. It also shows how the reserved word "RECORD" can be used to create blank lines in the report.

Suppose we are conducting a survey of government policies. We have nine "Agree/Disagree" items coded as 1=Agree and 2=Disagree. The first three items deal with "Social Policy"; the next three items with "Foreign Policy"; and the last three items with "Fiscal Policy". We would like to produce a report that looks something like this:

 

Peoples Attitudes Towards Government Policies

 

 

(N=x)                              Agree             Disagree

 

OVERALL                    -----           -----

 

SOCIAL POLICY          -----           -----

   Item 1                           -----           -----

   Item 2                           -----           -----

   Item 3                           -----           -----

 

FOREIGN POLICY       -----           -----

   Item 4                           -----           -----

   Item 5                           -----           -----

   Item 6                           -----           -----

 

FISCAL POLICY           -----           -----

   Item 7                           -----           -----

   Item 8                           -----           -----

   Item 9                           -----           -----

There are four different nets in this report. The OVERALL net includes all variables. The SOCIAL POLICY net includes the first three items, the FOREIGN POLICY net the next three items, and the FISCAL POLICY net the last three items. For this example Items 1-9 are stored in variables 1 to 9.

The spacing (indentation) in this example is used only to make the procedure easier to understand. It is not necessary to use this type of spacing in your procedures.

 

STUDY Yourstudy

HEADING Peoples Attitudes Towards Government Policies

COMPUTE (N1) OVERALL=V1

COMPUTE (N1) SOCIAL POLICY=V1

COMPUTE (N1) FOREIGN POLICY=V4

COMPUTE (N1) FISCAL POLICY=V7

LABELS OVERALL

       SOCIAL POLICY

       FOREIGN POLICY

       FISCAL POLICY

       (1=Agree)(2=Disagree)

FREQ OVERALL V2-V9                Produces overall net

     RECORD                                     Produces a blank line

     SOCIAL POLICY V2-V3           Produces social policy net

     V1-V3                                          Produces 3 social policy variables

     RECORD                                     Produces a blank line

     FOREIGN POLICY V5-V6        Produces foreign policy net

     V4-V6                                          Produces 3 foreign policy variables

     RECORD                                     Produces a blank line

     FISCAL POLICY V8-V9            Produces fiscal policy net

     V7-V9                                          Produces 3 fiscal policy variables

OPTIONS SV=N OA=Y

            MR=(OVERALL V2-V9)

                     (SOCIAL POLICY V2-V3)

                     (FOREIGN POLICY V5-V6)

                     (FISCAL POLICY V8-V9)

..

Special Value Label HIDE

When performing a frequency analysis with the OA option, it is often desirable to display only some of the response categories. Recoding unwanted categories to missing is one way to exclude them from the table. This will eliminate the column from the table and from any calculations of percentages in the table.

For example, assume the following counts for V1:

 

  1                     2                        3                          

Agree           Neutral             Disagree         No Response      Total N

 30                   20                     40                    10                     100

 

If PB=N (denominator equals number of respondents), the percents will be:

 

Agree             Neutral            Disagree

  30%                20%                 40%

 

If PB=R (denominator equals number of responses), the percents will be:

 

Agree              Neutral             Disagree

30/90=33%     20/90=22%       40/90=44%

 

We could use the following RECODE command to eliminate the "Neutral" category from the table:

 

RECODE V1 (2= )

 

If PB=N, the percents will still be based on a denominator of 100. If, however, PB=R, the percents will be based on a denominator of 70 (30+40):

 

Agree                 Disagree

30/70=43%        40/70=57%

 

The special value label "HIDE" may be used to suppress printing of a value label without reducing the denominator for the percents calculations. The following LABELS command could be used to eliminate the "Neutral" category from the table, while still including the "Neutral" count in the denominator:

 

LABELS V1 (1=Agree)(2=Hide)(3=Disagree)

 

Any row or column that has a value label of "HIDE" will not be printed, but it will be included in the percent calculations when PB=R. Note that the percentages are based on the counts for all value labels (including the "Neutral" category), even though all the value labels are not displayed in the table.

 

Agree                     Disagree

30/90=33%           40/90=44%

 

If you wanted only the "Agree" responses to show in the table, you could use the following statements in the procedure. The percentages in the table would still be based on 90:

 

LABELS V1 (1=Agree)(2=Hide)(3=Hide)

OPTIONS PB=R
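The cases above (PB=N, PB=R, recoding to missing, and HIDE) can be reproduced with a short calculation. The Python sketch below only illustrates the arithmetic using the example counts. The function name and its parameters are hypothetical, and percents are rounded to whole numbers here, whereas StatPac controls decimals with the DP option.

```python
def percents(counts, total_n, pb, hidden=(), recoded_to_missing=()):
    """Return {label: whole-number percent} for the categories that are printed.

    counts: {label: count} for each value-labelled category
    total_n: number of respondents, including those with no response
    pb: "N" (denominator = respondents) or "R" (denominator = responses)
    hidden: labels set to HIDE; not printed, but still in the denominator
    recoded_to_missing: labels recoded to missing; dropped from the table entirely
    """
    kept = {k: v for k, v in counts.items() if k not in recoded_to_missing}
    denominator = total_n if pb == "N" else sum(kept.values())
    return {k: round(100 * v / denominator)
            for k, v in kept.items() if k not in hidden}


counts = {"Agree": 30, "Neutral": 20, "Disagree": 40}
print(percents(counts, 100, "N"))  # {'Agree': 30, 'Neutral': 20, 'Disagree': 40}
print(percents(counts, 100, "R"))  # {'Agree': 33, 'Neutral': 22, 'Disagree': 44}
print(percents(counts, 100, "R", recoded_to_missing={"Neutral"}))  # {'Agree': 43, 'Disagree': 57}
print(percents(counts, 100, "R", hidden={"Neutral"}))  # {'Agree': 33, 'Disagree': 44}
```

The last two calls show the essential difference: recoding removes the Neutral count from the PB=R denominator, while HIDE only removes the Neutral column from the printout.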

Print Format

The results from the one-analysis option may be printed as row percents (PF=R), as counts (PF=N), or both (PF=NR). When row percents are requested, the denominator used to calculate the percents will be the number of non-missing responses for that particular item. That is, when there is missing data, the number of valid responses to a particular question may be different from the number of valid responses to any of the other questions.

Print Total

The PT option may be used in conjunction with the OA (one-analysis) option to print the total N for each variable. When there is considerable missing data, this option is highly recommended since each of the variables may be using a different N (number of valid responses). For example, the following commands would produce a one-page report summarizing variables 21 to 30. An additional column will be included on the output that lists the number of valid cases for each of the variables.

 

FREQUENCIES V21-V30

OPTIONS OA=Y PT=Y
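To see why a separate N per variable matters, the sketch below computes counts, row percents, and the valid N for each item in the way the Print Format and Print Total options are described: each item's percents use its own number of non-missing responses. It is a hypothetical Python illustration with made-up data, not StatPac code.

```python
def one_analysis_rows(items, categories):
    """Counts, row percents, and valid N for each item in a one-analysis table.

    items: {variable name: list of codes}; None marks missing data.
    categories: the shared value-label codes, in column order.
    """
    table = {}
    for name, codes in items.items():
        valid = [c for c in codes if c is not None]
        n = len(valid)  # each item uses its own denominator
        counts = [valid.count(cat) for cat in categories]
        percents = [round(100 * k / n) if n else None for k in counts]
        table[name] = {"counts": counts, "percents": percents, "n": n}
    return table


items = {"V21": [1, 1, 2, 3, None], "V22": [1, 2, 2, 2, 3]}
for name, row in one_analysis_rows(items, [1, 2, 3]).items():
    print(name, row["counts"], row["percents"], "N =", row["n"])
```

Here V21 has one missing response, so its percents are based on 4 rather than 5. The extra column produced by PT=Y would show exactly this difference.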

Sort Variables

When performing a frequency analysis with the OA=Y option, you can sort the variables by the contents of the first column of the results. The SV (sort variables) option may be set to "N" for no sort, "A" to sort in ascending order, or "D" to sort in descending order. When no sort is specified, the variables will be listed in the order that they appear in the analysis command variable list. The SV option is applicable only when the OA=Y option is specified. However, if the MR option is also specified, the SV option should be set to N.

A digit may also be added as a suffix to SV=A or SV=D. When the OA=Y option is specified, the digit tells StatPac how many variables at the end of the list to leave out of the sort. This is useful when the last variable is an "other" variable and you want to sort the variables but still leave "other" as the last row. For example, SV=D1 would sort the variables in descending order, except that it would leave the last variable as the last row regardless of its value.
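The SV behavior, including the digit suffix, can be sketched as follows. This Python function is a hypothetical illustration that sorts (label, first-column value) rows. It is not StatPac's code, and the row values are made up.

```python
def sort_rows(rows, direction="N", keep_last=0):
    """Order one-analysis rows the way the SV option is described.

    rows: list of (label, first-column value) pairs in command order
    direction: "N" no sort, "A" ascending, "D" descending
    keep_last: the digit suffix, e.g. SV=D1 -> keep_last=1
    """
    if direction == "N":
        return list(rows)
    split = max(len(rows) - keep_last, 0)
    head = sorted(rows[:split], key=lambda row: row[1], reverse=(direction == "D"))
    return head + list(rows[split:])  # trailing rows keep their position


rows = [("Price", 20), ("Quality", 45), ("Service", 30), ("Other", 50)]
print(sort_rows(rows, "D", 1))
# [('Quality', 45), ('Service', 30), ('Price', 20), ('Other', 50)]
```

Without the suffix (SV=D), "Other" would move to the top of the table, because it has the largest value.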

Supplemental Heading

The supplemental heading is printed only when the OA=Y option is specified. It is a line of text that appears before the first row of the table. The supplemental heading may contain any text and should be enclosed in quotes. When the pound symbol (#) is used in the supplemental heading, it will be replaced by the number of cases. The SH option is usually used to indicate who is included in the table. The following is an example of a supplemental heading:

 

OPTIONS SH="TOTAL RESPONDENTS = #"

 

Minimum Denominator

Percentages can be misleading if they are based on a small denominator. The MD option may be used to suppress the printing of percentages that are based on a small denominator. The MD option sets the minimum denominator that StatPac will use for calculating percents. For example, if MD=5, StatPac will calculate percentages if the denominator is greater than or equal to 5. If a denominator were less than 5, StatPac would print dashes instead of the percent. Valid values for MD are between 0 and 100. If MD=0, all percentages will be printed.
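The MD rule amounts to a simple check before each percent is printed. In the Python sketch below, the function name, the exact dash string, and the handling of a zero denominator when MD=0 are assumptions made for illustration. The manual only says that dashes are printed.

```python
def format_percent(count, denominator, md=5):
    """Return a percent string, or dashes when the denominator is below MD.

    md=0 prints every percentage; a zero denominator still yields dashes
    here (an assumption, since no percent can be computed).
    """
    if denominator == 0 or denominator < md:
        return "---"  # placeholder for StatPac's dashes
    return f"{100 * count / denominator:.0f}%"


print(format_percent(3, 4, md=5))  # ---
print(format_percent(3, 6, md=5))  # 50%
print(format_percent(1, 3, md=0))  # 33%
```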

Print Mean

The mean is generally not calculated for a frequency analysis because it assumes interval data. However, there are some situations where you may want to display the mean as part of a frequency analysis. The ME option may be used to request the mean (and standard deviation). When ME=N, no mean will be printed. If ME=Y, the mean will be printed, and if ME=S, both the mean and standard deviation will be printed. When used in conjunction with the OA=Y option, a separate mean will be printed for each variable.
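For reference, a mean and standard deviation computed from a frequency table weight each numeric code by its count. The Python sketch below shows that calculation. The manual does not say whether StatPac divides by n - 1 (sample) or n (population), so the n - 1 divisor is an assumption. The result is only meaningful when the codes can reasonably be treated as interval data.

```python
import math


def mean_and_sd(counts):
    """Mean and sample standard deviation of numeric codes from a frequency table.

    counts: {numeric code: number of respondents}; needs at least 2 respondents.
    """
    n = sum(counts.values())
    mean = sum(code * k for code, k in counts.items()) / n
    sum_sq = sum(k * (code - mean) ** 2 for code, k in counts.items())
    sd = math.sqrt(sum_sq / (n - 1))  # n - 1 divisor is an assumption
    return mean, sd


mean, sd = mean_and_sd({1: 30, 2: 20, 3: 40})  # 1=Agree, 2=Neutral, 3=Disagree
print(f"mean={mean:.2f} sd={sd:.2f}")  # mean=2.11 sd=0.88
```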

Mean Position

When the ME option is used with the OA=Y option, the means (and standard deviations) can be printed as the first or last column. If MP=F, the means will be printed as the first column, and when MP=L, they will be printed in the last column. When means are printed in the first column (MP=F), and the SV option is used to sort the variables, they will be sorted by the means instead of the percents.


Labeling and Spacing Options

Labeling (LB): Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).

Labeling Width (LW): Sets the maximum width (in inches) for the variable labels on the stub when OA=Y.

Truncate Labels (TL): Truncates long value labels when TL=Y.

Exact Width (EW): When OA=Y and EW=Y, the width of the stub will be exactly the width specified with the LW option. When EW=N, the width of the stub will adjust itself to the length of the stub labels.

Column Width (CW): Sets the minimum width of the columns (in inches) when OA=Y.

Column Spacing (CS): Sets the spacing (in inches) between the columns of the listing when OA=Y.

Extra Spacing (ES): When ES=Y, a blank line will be printed below the table headings. When ES=N, no blank line will be printed.

Blank Line Between Rows (BL): Sets the number of blank lines between rows when OA=Y.

Print Percent Symbol (PP): Sets whether percent symbols will be shown.

Decimal Places (DP): Sets the number of decimal digits that will be shown.

Print Codes (PC): Sets whether the code (to the left of the equals sign) will be shown with the value labels.

Open-Ended Response Coding

It is often useful to code open-ended responses into response categories in order to perform a frequency analysis or crosstabs. The FREQUENCIES command with the OE option allows you to examine and code open-ended alpha variables.

Open-ended response coding is requested by performing a frequency analysis on the variables containing the verbatim text and setting the option OE=Y.  Do not use an IF-THEN SELECT command in the same procedure.

 

FREQ Comment

OPTIONS OE=Y

..

When a verbatim response is held in more than one variable, it is not necessary to specify the MR (multiple response) option. All variables listed in the Frequencies command will be considered to be part of the same verbatim comment. The following procedure says to begin an open-ended response coding session on variables one through three. Note that the text for all three variables will be displayed at the same time during the coding session.

 

FREQ V1 - V3

OPTIONS OE=Y

..

Each time you complete a coding session, StatPac will create a new study and data file called STATPAC-VERBATIM. The STATPAC-VERBATIM study contains the coded verbatim data, and the frequency analysis will be performed on this file (i.e., the coded data). Your original study and data file are not affected by the coding process. All of the coded information is stored in the STATPAC-VERBATIM file. In order to use this coded information in future analyses, the STATPAC-VERBATIM files must be merged with your study and data files using the MERGE command.

Occasionally, you may start a coding session, and for one reason or another, not be able to finish. You can quit the current coding session and continue at a future time. To continue with a previously unfinished coding session (i.e., one that you started coding before, but did not finish), simply run the procedure again. StatPac will detect the existence of a partially completed STATPAC-VERBATIM file, and ask if you want to continue with the previous verbatim coding, or delete the existing verbatim coding and begin a new coding session.

 

Click on the Continue Previous Session button to continue with the previous coding, or the Start New Coding Session button to delete the existing STATPAC-VERBATIM files. If your intent is to continue with a previous session, you can bypass this question by changing the OE option from OE=Y to OE=C (continuation).

Verbatim Blaster

StatPac for Windows has the Verbatim Blaster module built in and it will automatically pre-code all open-ended responses.

Verbatim Blaster processes open-ended responses in two steps. The first step is called pre-coding, and the second step is called final coding. You may modify or interact with the coding process at either or both steps.

The pre-coding step will be performed first. Verbatim Blaster will read the file and count all the unique words that occur in the text. It attempts to combine variations of the same word into a single root word. For example, it would attempt to combine singulars and plurals, different tenses, prefixes and suffixes into a single root word. The pre-coding is not perfect, but it will catch nearly all variations on each root word.

The result of the pre-coding will be to present you with a list of root words along with the number and percent of respondents who used each word.  The list will be initially sorted in descending order by frequency of occurrence in the text. Thus, the words at the top of the list appeared most often in the text. If a respondent used the same word more than once, it will only be counted as one occurrence in calculating the frequencies.
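The counting rule described above (each word counted at most once per respondent, then sorted by frequency) can be sketched in a few lines of Python. This is a hypothetical illustration, not Verbatim Blaster's algorithm. It splits text on letters and apostrophes, drops any words passed in as an exclusion list, and does not attempt the root-word combining that Verbatim Blaster performs.

```python
import re
from collections import Counter


def word_frequencies(comments, excluded=()):
    """Rank words by the number of respondents who used them.

    comments: one verbatim string per respondent
    excluded: words to ignore, in the spirit of the exclusion list
    Returns (word, respondents, percent of respondents), most frequent first.
    """
    counts = Counter()
    for text in comments:
        # A set keeps a repeated word from being counted twice for one respondent
        words = set(re.findall(r"[a-z']+", text.lower())) - set(excluded)
        counts.update(words)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, k, 100 * k / len(comments)) for word, k in ranked]


comments = ["Price was too high, price!", "Good service", "High price"]
for word, k, pct in word_frequencies(comments, excluded={"was", "too"}):
    print(f"{word:10} {k} {pct:.0f}%")
```

The first respondent says "price" twice, but "price" is still credited to only two respondents.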

The pre-coding screen appears:

      

Step 1 - Examine Words

The most important task in Step 1 (pre-coding) is to familiarize yourself with the basic content of the verbatim comments.

When no words are selected (highlighted) in the word list, the Previous and Next buttons will show the previous and next records.

A more important feature is the ability to examine the context in which specific words are used. First select the word or words you are interested in exploring by clicking on those words in the word list. Then the Previous and Next buttons can be used to find the previous and next comments that used any of the selected words. The selected word(s) will be shown in red to draw your eyes to that portion of the comment.

Clicking on a word that is not already selected will select the word. Clicking on a word that is already selected will deselect the word.

Join Word Variations

During pre-coding, one of the major functions of Verbatim Blaster is to combine all the variations of each root word. It is generally a good idea to review the words and to combine any variations that Verbatim Blaster may have missed. It will usually not miss any, but it's still a good idea to check.

In the Type of Sort window, click on Alphabetic. Then use the scroll bar to scroll through the list. Variations of the same root word that were not combined will be easy to spot, since they will appear next to each other in the alphabetically sorted list.

If you should find two variations of the same root word, you will want to join them together so Verbatim Blaster treats them as a single unique word. First select the words to be joined by clicking once on each word. The selected words will be highlighted. You may join more than two words at once by clicking on each word. Then click the Join button. The words will be joined and the count and percent will be modified to reflect the new values. The important thing is to join words that are variations of the same root word or words that have the same meaning.

You may join words during pre-coding or final coding. However, joining word variations is easiest during the pre-coding process because of the ability to perform an alphabetical sort.

When you have finished joining words, click on Frequency in the Type of Sort window to re-sort the list in descending order by frequency of occurrence. During the final coding process, it is usually most convenient to have the most common responses appear near the top of the response category list.

Delete and Exclude Words

There are many words that add little meaning to a sentence. Words like of, the, for, by, this, and hundreds of others don't add much to the meaning of a respondent's statement. These words could be excluded from a sentence without substantially detracting from its meaning.

Modifiers are words that are used to describe the quantity or magnitude of the word or phrase that follows the modifier. They are usually adverbs. Examples are usually, mostly, and greatly. Modifiers do add meaning to a sentence, but they are generally not helpful in determining the major topic of a sentence.

Verbatim Blaster maintains a list of exclusion/modifier words in a file called EXCLUDE.TXT. This file is an ASCII text file and may be edited with any word processor or text editor. It contains an alphabetical listing of words that do not help identify the primary topic of a sentence.

The first thing Verbatim Blaster does when you ask it to analyze a text file is to eliminate the exclusion/modifier words. It doesn't really eliminate the words; it just pretends they're not there.

The exclusion/modifier list distributed with Verbatim Blaster is fairly complete. However, specific applications may require that you add new words to this list. You can use any text editor to modify the EXCLUDE.TXT file. Words added with a text editor will be sorted into alphabetic order the next time that Verbatim Blaster is run.

You can also add exclusion/modifier words to this file during pre-coding. If you see a word that doesn't add substantial meaning to a sentence, add it to the exclusion/modifier list by clicking on the word to select it, and then clicking on the Exclude button. The word(s) will be added to the EXCLUDE.TXT file and will be excluded from all future open-ended coding sessions.

Deleting words is different from excluding words. Deleting a word with the Delete button removes the word from the current session only; future coding sessions with other verbatim text would still show the word. To delete a word, click on the word to highlight it and click the Delete button.

Set the Minimum Percent to Display

A typical text file might contain 400 to 500 unique root words (even after combining all the variations of the same words and eliminating the exclusion words). Usually, we wouldn't want to look at a list of this size. Instead, we're most often interested in responses that were made by more than one respondent. Verbatim Blaster lets you adjust the size of the word list by specifying a minimum percent. This is the minimum proportion of respondents that used a word. For example, if you set the minimum percent equal to five, Verbatim Blaster will display words that were used by at least five percent of the respondents. Words that were used by less than five percent of the respondents would be hidden from your view.

At any time during pre-coding, you may change the minimum percent. When the minimum percent is set to zero, all words will be displayed. To change the minimum percent, simply type the new minimum percent and press enter.
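The minimum-percent filter is a simple threshold on each word's share of respondents. The sketch below is a hypothetical Python illustration with made-up counts. The function name and inputs are assumptions.

```python
def visible_words(word_counts, n_respondents, min_percent):
    """Words used by at least min_percent of respondents, most frequent first.

    word_counts: {word: number of respondents who used it}
    min_percent=0 displays every word.
    """
    shown = []
    for word, k in word_counts.items():
        percent = 100 * k / n_respondents
        if percent >= min_percent:
            shown.append((word, k, percent))
    shown.sort(key=lambda item: (-item[1], item[0]))
    return shown


counts = {"price": 12, "service": 5, "parking": 2}
print(visible_words(counts, 100, 5))  # [('price', 12, 12.0), ('service', 5, 5.0)]
```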

Step 2 - Select Words for Categories

The purpose of pre-coding is to identify the important words that are mentioned by respondents. "Important" is, of course, a subjective decision. The minimum percent feature will narrow the number of words to a manageable list. Step 2 is to select those words that seem to hit upon the key concepts in respondents' answers. Sometimes these will be easy to identify, and other times they won't. If the survey question was extremely specific, it will probably be easy to identify the key concept words, and if the survey question was quite general, it might be extremely difficult to identify the key concepts.

"Select Words for Categories" refers to identifying the key concept words that will be carried forward to the final coding process. To select a word, or to deselect a word that has already been selected, click on it in the word list.

If you select any words, only those words will be carried forward to the final coding process, and words not selected will be excluded. If you do not select any words, all displayed words will be carried forward to final coding.

Step 3 - Final Coding

The final coding process is where you refine the coding and the response category labels. The words selected in the pre-coding process provide the foundation for the final response categories. These are the labels you give to the key concepts. The initial response categories will be the words that were selected during pre-coding.

The final coding process involves reviewing the actual open-ended responses for each respondent, and using your understanding of the comment(s) to refine the response category labels. A response category label can be changed at any time. To change the text in a response category label, double click on the label.

The text window (respondent's verbatim text) will appear at the left of the screen, and the response categories window will appear on the right. There will be an arrow to the left of all response categories that were mentioned by the respondent, and the key words in the text will be highlighted.

 

Select and Deselect a Response Category

Use the mouse to select and deselect a response category.  When a response category is selected, an arrow will appear to the left of that category. This means that the current respondent made a comment related to that response category. If a response category is not selected, there will be no arrow. Clicking to the left of the response category label (in the small area reserved for the arrows) will select or deselect that response category.

Change Records

A record is the same as a respondent. Thus, when we say changing records, it simply means displaying a different respondent's answer.  There are two ways to change records.

The first way is used to show a specific desired record. Click on the record number shown on the top left of the Context window. After clicking on the record number, change it to the desired record and press enter.

The second way is to use the Previous and Next buttons to change to the previous and next records. When no response category labels are highlighted, the Previous and Next buttons will advance to the previous or next record numbers. When one or more response category labels are highlighted, the previous or next record that has been pre-coded into that category will be displayed. The current record number will be displayed above and to the left side of the context windows.

One method of performing the final coding would involve repeatedly clicking the Next button to review the coding, beginning with the first respondent and going to the last respondent. Verbatim Blaster will skip over respondents who did not make a comment.

Another method of final coding would be to examine the comments for each response category. First, highlight the response category (or categories) you want to search for. Then click the Next button to search for the next record that contains a reference to that response category. Each time you click the Next button, the next record with a reference to that response category will be displayed. When the last record in the text file is reached, the search will stop.

The search feature provides a quick way to gain a better understanding of a particular response category. It lets you scan all comments related to a specific response category. While using the search feature, the search will be limited to the response category currently being searched. This makes it extremely easy to scan the relevant text. Scanning a particular response category will give you a better understanding of the comments coded into that category.

Change a Response Category Label

Sometimes, it might be necessary to change or delete an existing response category, or to create a new category. The response category labels can be changed at any time by simply typing the new text. Double-click on the response category label you want to change and then you will be able to edit the category label.

Delete and Create New Response Categories

To create a new response category, highlight a response category and click the Create Category button. This will insert a blank line in the response categories so you can type the new response category label on that line. If you create a new response category using this method, you will need to go through each record and decide whether or not that record falls into the newly created response category. Verbatim Blaster does not automatically code responses based on the words you type.

To delete a response category, highlight it and click the Delete Category button. The response category will be immediately eliminated. There is no automatic "undelete", so be careful. You might use Delete Category button to eliminate a response category you consider to be unimportant.

Join Two Response Categories

Sometimes you will want to combine response categories that you initially thought were different. You may join response categories (at any time) into a single category. Click the Join button. Then drag and drop one of the categories onto the other category.  The category you dragged will be deleted and all the responses that were initially assigned to that category will be reassigned to the category you drop it on. To drag a category, move the mouse pointer over that category. Press and hold the left mouse button. While still holding the mouse button, move the mouse so the category outline is over the category to be joined and then release the mouse button. Note that once you use the join category feature, there is no way to return to the unjoined version. There is no automatic "unjoin", so be careful.

Create a Net Response Category

Creating a net category is a useful method of aggregating responses. It is similar to joining response categories except that the secondary response category is not removed as a unique entry in the response category list.

The most common use of a net category is to summarize a group of related response categories without affecting the existing categories. For example, suppose you were evaluating respondents' preferences for a new food and there were response categories of red, green, and blue. You might want to create a net category called color. You could use the Join button to join the three categories, but you would then be unable to break down the respondents by their individual color choices. Creating a net category is the solution to this problem.

To create a net category, first create a new blank line for the Net category. Highlight a category and click on the Insert Category button to open up a blank line in the response category list. This blank line will become the net category. Next click the Net button to begin net creation. Finally, drag the category you want to net and drop it on the blank line. Additional response categories can now be added to the new net category. Click Net again, and drag another category to the new Net category. Response categories may be included in a net one at a time using this method.

If you make a mistake while creating or adding to a net, click on the Cancel button to cancel the process. If you inadvertently add the wrong category to a net, delete the net with the Delete Category button and recreate the net.
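The difference between joining and netting can be illustrated with per-respondent codes. In the hypothetical Python sketch below, a net is added alongside its member categories rather than replacing them. Counting a respondent only once in the net, even when that respondent is coded into several members, is an assumption about how the net is tallied. The manual does not state it.

```python
def add_net(coded, net_name, members):
    """Add a net category while keeping every member category.

    coded: {respondent id: set of category labels}
    A respondent joins the net if coded into any member category.
    """
    members = set(members)
    return {rid: (cats | {net_name}) if (cats & members) else set(cats)
            for rid, cats in coded.items()}


def category_counts(coded):
    """Number of respondents coded into each category."""
    counts = {}
    for cats in coded.values():
        for cat in cats:
            counts[cat] = counts.get(cat, 0) + 1
    return counts


coded = {1: {"red"}, 2: {"red", "green"}, 3: {"blue"}, 4: {"taste"}}
netted = add_net(coded, "color", ["red", "green", "blue"])
print(category_counts(netted)["color"])  # 3
```

The red, green, and blue categories keep their own counts, which is exactly what a join would lose.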

Change the Order of the Response Categories

It is sometimes desirable to rearrange the order of the response categories. To move a response category, click the Move button. Then select a category, drag it to a new position, and drop it in the new position in the response category labels list.

Finish the Coding Process

If you wish to exit the open-ended response coding program before finishing the coding process, make a note of the current record number and click the Stop For Now button. To continue where you left off at a future time, run the same procedure again, and select Continue Previous Session. Then click on the current record number on the top left of the Context window, type the record number where you left off, and press enter.

To finish the coding process, and run the frequency analysis on the coded data, click the Analyze button. The frequency analysis will be performed on the coded data.

After viewing the results of the analysis, StatPac will ask if you want to merge the coded verbatim responses. For example, you may want to use the coded verbatim information in other analyses (e.g., crosstabs of the verbatim responses with other variables). Since StatPac can only analyze one study at a time, you should merge the coded responses (i.e., the STATPAC-VERBATIM file) with your original study and data files.

 

After running the merge procedure, your original file will contain all the original data (including the original verbatim comments) and the new coded verbatim comments. The coded comments (from the STATPAC-VERBATIM file) will be added after your original variables, so the merge will increase the number of variables in your original study.

Produce a List of Verbatim Comments for Each Response Category

After merging a coded verbatim file, you can merge the STATPAC-VERBATIM procedure file into your existing procedure file. The STATPAC-VERBATIM procedure file is created automatically when you do a merge. It contains a series of procedures to print a listing of the verbatim responses that were coded into each response category. To merge the file, position the cursor at the beginning of the line following two dots. Then select File, Merge, and select STATPAC-VERBATIM as the file to merge.

 

CROSSTABS Command

Crosstabs is one of the easiest ways to look at the relationship between two variables, and one of the most popular ways of examining categorical data.

The syntax for the crosstabs analysis is:

 

CROSSTABS <Variable list> BY <Variable list>

 

For example, let's look at how people's expectations for learning (EXPECTATION) are related to their satisfaction with a lecture (SATISFACTION). The command to request this crosstab analysis is:

 

CR EXPECTATION BY SATISFACTION

(CROSSTABS may be abbreviated CR)

 

The results will be printed in the form of a two-dimensional matrix. The first variable (EXPECTATION) will be printed on the y axis (rows), while the second variable (SATISFACTION) will be printed on the x axis (columns). The keyword BY is a mandatory part of the statement.

If several different crosstabs are desired, request them by specifying a variable list instead of an individual variable. For example, you might be interested in both SATISFACTION with the lecture and the amount of actual LEARNING that occurred. The command to run this analysis would be:

 

CROSSTABS EXPECTATION BY SATISFACTION, LEARNING

 

The matrix size that the crosstabs program can accommodate depends on the available RAM. The variables themselves may be alpha or numeric. StatPac will not print a row or column when the total count for that row or column is zero. Missing data (blanks) will be excluded from the analysis unless there is a value label for blank data (e.g., BLANK=Missing data).

Three-way crosstabs may be requested by the following command:

 

CROSSTABS <Var. list> BY <Var. list> BY <Var. list>

 

A three-way crosstab is essentially a series of two-way crosstabs controlled for a third variable. That is, the two-way crosstabs are performed on subsets of the data as defined by the third variable. For example, consider the following crosstabs command:

 

CROSSTABS EXPECTATION BY SATISFACTION BY SEX

 

This command will produce two different crosstab tables, one for males and the other for females. The same results could be obtained by executing the following two procedures:

 

IF SEX="M" THEN SELECT

CROSSTABS EXPECTATION BY SATISFACTION

..

IF SEX="F" THEN SELECT

CROSSTABS EXPECTATION BY SATISFACTION

..

Count/Percent & Observed/Expected Tables

 

There are two common ways to print crosstabs. The first (the count/percent table) shows the number, row percent, column percent and total percent. The second (the observed/expected table) shows the observed frequency, the expected frequency, observed minus expected, and the cell's contribution to the total chi-square. Either table may be printed or suppressed using the Y or N setting of its option. To print both tables, use the following options:

 

OPTIONS CP=Y OE=Y
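To make the four numbers in each cell of the count/percent table concrete, here is a minimal Python sketch (not StatPac code) that computes the count, row percent, column percent and total percent from a table of raw counts. The function name and the example counts are hypothetical, and it assumes rows and columns with a zero total have already been dropped, as StatPac does when printing.

```python
def count_percent_table(table):
    """Return (count, row %, column %, total %) for every cell of a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [
            (count,
             100.0 * count / row_totals[i],
             100.0 * count / col_totals[j],
             100.0 * count / grand_total)
            for j, count in enumerate(row)
        ]
        for i, row in enumerate(table)
    ]


# Hypothetical counts: rows = EXPECTATION categories, columns = SATISFACTION categories.
observed = [[20, 10],
            [5, 15]]
for row in count_percent_table(observed):
    print(["N=%d R=%.1f%% C=%.1f%% T=%.1f%%" % cell for cell in row])
```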

 

The chi-square is an important statistic; it is used to test whether two variables are independent of each other. In other words, do the observed frequencies in the cells deviate markedly from the frequencies we would expect if the two variables were not related to each other?

A large chi-square statistic indicates that the observed frequencies differ significantly from the expected frequencies. A crosstab with r rows and c columns is said to have (r-1) times (c-1) degrees of freedom.

Using the chi-square distribution and its associated degrees of freedom, you can calculate the probability that the differences between the observed and expected frequencies occurred by chance. Generally, a probability of .05 or less is considered to be a significant difference; this probability is termed "probability of chance" in the output.
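The calculation described above can be sketched in Python. The sketch computes each cell's expected frequency (row total times column total divided by N), the chi-square statistic, the (r-1)(c-1) degrees of freedom, and the upper-tail probability that the output calls the probability of chance. The probability uses the standard incomplete-gamma series and continued fraction, so no statistics library is needed. The function names are illustrative, and StatPac's own numerical routines may differ in the last decimal places.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1:
        # Series for the regularized lower incomplete gamma function P(a, z).
        term = total = 1.0 / a
        k = 1
        while term > total * 1e-16:
            term *= z / (a + k)
            total += term
            k += 1
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper function Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(log_prefactor)


def chi_square_test(table):
    """Return (chi-square, degrees of freedom, probability of chance) for a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_sf(chi2, df)


print(chi_square_test([[20, 10], [5, 15]]))  # hypothetical two-by-two table
```

As a reference point, a chi-square of about 3.84 with one degree of freedom corresponds to a probability of .05.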

When a crosstab contains many cells with expected frequencies less than five, the probability of chance for the chi-square statistic can be inaccurate. In that case, consider grouping some rows and/or columns to reduce the number of cells with expected values less than five.

The second way of printing crosstabs (the observed/expected table) is useful in explaining the significance of the chi-square statistic. The cells with high values in the "contribution to the chi-square" are the ones that contribute the most to the significance of the chi-square. This is useful when discussing the results of a study, because often only a few cells deviate from independence.
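A minimal Python sketch of the observed/expected table, assuming the cell layout described above (observed, expected, observed minus expected, contribution to chi-square). The contributions add up to the total chi-square, so the cells with the largest contributions are the ones to discuss. The function name and table values are hypothetical.

```python
def observed_expected_table(table):
    """Return (observed, expected, observed - expected, contribution) for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        cells = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = observed - expected
            cells.append((observed, expected, diff, diff ** 2 / expected))
        result.append(cells)
    return result


# Hypothetical 3 x 2 table: list each cell's contribution, largest first.
oe = observed_expected_table([[30, 10], [20, 20], [10, 30]])
contributions = sorted(
    ((cell[3], (i, j)) for i, row in enumerate(oe) for j, cell in enumerate(row)),
    reverse=True,
)
print(contributions)
```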

 

Example of a Count/Percent Table

 

Print Format

Each cell of the crosstabs table may contain up to four numbers. Their meanings are labeled in the upper left corner of the table. You may choose to print or suppress any of these numbers by using the PF option. The parameters for this option are:

            N   Number or observed frequency

            R   Row percent or expected frequency

            C   Column percent or observed minus expected

            T   Total percent or contribution to chi-square

One or more parameters may be used with the PF option. These should not be separated from each other. For example, if you want to print the number and total percent, use the following option:

 

OPTIONS PF=NT

 

If a table is too large to fit on one page, it will be split to use as many pages as necessary. The actual number of columns that can fit on a page is determined by the pitch and carriage width of your printer.

Category Creation

The actual categories (rows and columns) in the crosstab analysis can be created either from the study design value labels (CC=L) or from the data itself (CC=D). When the categories are created from the labels, the value labels themselves will be used to create the categories for the analysis, and data that does not match up with a value label code will be counted as missing. That is, mispunched data will be counted as missing. When categories are created from the data, all data will be considered valid whether or not there is a value label for it.

Sort Codes

The actual labeling for the x and y axes is taken from the value labels. In most circumstances, the order in which you entered the value labels (during the study design) is the order in which you want them listed. You can override the order of the value labels in the study design by using the option SC=Y. The value labels will then be displayed in ascending alphabetical or numeric order. This feature is especially useful when the study design itself does not contain any value labels; in that case, if this option is not used (i.e., SC=N), the categories on the printout will appear in the order in which values are encountered in the data file.

Statistics

When the statistics option is specified, several other statistics will be calculated and printed.

 

Example of a Statistics Printout

 

A discussion of each statistic follows:

Phi

The Phi statistic is calculated and printed for two-by-two tables. It may be interpreted as a measure of the strength of the relationship between two variables. When there is no relationship, Phi is zero. When there is a perfect positive relationship, Phi is one. When there is a perfect negative relationship, Phi is minus one.

When comparing one crosstab table to another, Phi is preferable to the chi-square because it corrects for the fact that the chi-square statistic is directly proportional to the number of cases. In other words, Phi can be used to compare two crosstabs with unequal N's.

Cramer's V

If Phi is calculated for tables larger than two-by-two, there is no upper limit to its value. Therefore, the Phi statistic is not printed for tables greater than two-by-two. Instead, Cramer's V is printed. Cramer's V adjusts the Phi for the number of rows and columns so that its maximum value is also one. It may be interpreted exactly like the Phi (e.g., a large Cramer's V indicates a high degree of association between the two variables).

Contingency Coefficient

The contingency coefficient is another measure of association based on the chi-square statistic. It may be calculated for any size of table; however, its maximum value will vary depending on the number of rows or columns. Therefore, the contingency coefficient should only be used to compare tables with the same numbers of rows and columns.
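The three chi-square-based measures above can be sketched in Python with their textbook formulas: Phi for a two-by-two table [[a, b], [c, d]] is (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)); Cramer's V is sqrt(chi-square / (N(k - 1))), where k is the smaller of the number of rows and columns; and the contingency coefficient is sqrt(chi-square / (chi-square + N)). Treating the main diagonal as the positive direction for Phi is an assumption, since StatPac's sign convention is not stated here. The usage lines show that doubling every count doubles the chi-square but leaves Phi unchanged, and that a perfectly associated 3 x 3 table reaches a Cramer's V of one but a contingency coefficient of only about .82.

```python
import math


def chi_square(table):
    """Return (chi-square, N) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def phi(table):
    """Signed Phi for a two-by-two table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def cramers_v(table):
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


def contingency_coefficient(table):
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


small = [[20, 10], [5, 15]]
large = [[40, 20], [10, 30]]  # every count doubled
print(chi_square(small)[0], chi_square(large)[0])  # the chi-square doubles
print(phi(small), phi(large))                      # Phi is the same for both tables

perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
print(cramers_v(perfect), contingency_coefficient(perfect))  # V reaches 1, C does not
```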

Kendall's Tau Statistics

Kendall's tau statistics measure the correlation between two sets of rankings. Each is based on the number of concordant pairs of observations minus the number of discordant pairs, adjusted so that it ranges from minus one to plus one. There are three different methods for standardizing tau (tau-a, tau-b and tau-c). Note that tau-b is only calculated for square tables.

Gamma

Gamma is similar to the tau statistics except that it may be interpreted directly as the difference in probability of like rather than unlike orders for the two variables when they are chosen at random. Gamma has a value of plus one when all the data is in the diagonal that runs from the upper-left corner to the lower-right corner of the table. It has a value of minus one when all the data is concentrated in the upper-right to lower-left diagonal.

Cohen's Kappa

Cohen's Kappa is another measure of the degree to which the data falls on the main diagonal. It is only calculated for square tables.
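Cohen's Kappa compares the proportion of cases on the main diagonal with the proportion expected there by chance from the row and column totals. The Python sketch below uses the usual definition, kappa = (observed agreement - chance agreement) / (1 - chance agreement); the function name and example tables are hypothetical.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts."""
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("kappa requires a square table")
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_observed = sum(table[i][i] for i in range(k)) / n
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)


print(cohens_kappa([[20, 5], [10, 15]]))  # observed agreement .70, chance agreement .50
```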

Somers' d

Somers' d is a measure of association for ordered contingency tables when there is a dependent and independent variable. It may be interpreted in the same fashion as a regression coefficient.
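Kendall's tau, gamma and Somers' d are all built from the same counts of concordant and discordant pairs. This Python sketch uses the standard formulas and assumes the row and column categories are listed in their natural order. It treats the column variable as the independent variable for Somers' d, which is an assumption, and it computes tau-b for any table even though StatPac reports it only for square tables. The function names are hypothetical.

```python
import math


def concordance_counts(table):
    """Count concordant (C) and discordant (D) pairs in an ordered r x c table."""
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for i2 in range(i + 1, rows):
                for j2 in range(cols):
                    if j2 > j:
                        concordant += table[i][j] * table[i2][j2]
                    elif j2 < j:
                        discordant += table[i][j] * table[i2][j2]
    return concordant, discordant


def ordinal_measures(table):
    rows, cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    c, d = concordance_counts(table)
    pairs = n * (n - 1) / 2
    ties_row = sum(t * (t - 1) / 2 for t in row_totals)  # pairs tied on the row variable
    ties_col = sum(t * (t - 1) / 2 for t in col_totals)  # pairs tied on the column variable
    m = min(rows, cols)
    return {
        "tau_a": (c - d) / pairs,
        "tau_b": (c - d) / math.sqrt((pairs - ties_row) * (pairs - ties_col)),
        "tau_c": 2 * m * (c - d) / (n ** 2 * (m - 1)),
        "gamma": (c - d) / (c + d),
        # Assumes the column variable is the independent variable.
        "somers_d": (c - d) / (pairs - ties_col),
    }


print(ordinal_measures([[20, 10], [5, 15]]))
```

For a two-by-two table, tau-b equals Phi and gamma equals Yule's Q, which gives a quick check on the sketch.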

Odds ratio

The odds ratio is calculated for two-by-two tables. Its value may vary between zero and infinity. A value greater than one indicates a positive relationship, while a value less than one (approaching zero) indicates a negative relationship. A value of one indicates statistical independence. Note that this is different from most measures of association, where independence is indicated by zero.

Yule's Q and Yule's Y

Yule's Q is a function of the odds ratio. Unlike the odds ratio, its value varies between minus one and plus one: a value of zero indicates statistical independence, while values of minus one and plus one represent perfect negative and perfect positive relationships. Yule's Y is a related function of the odds ratio with the same range and interpretation. Both are calculated for two-by-two tables.
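A minimal Python sketch of the odds ratio and the two Yule statistics for a two-by-two table [[a, b], [c, d]], using the standard definitions OR = ad / bc, Q = (OR - 1) / (OR + 1) and Y = (sqrt(OR) - 1) / (sqrt(OR) + 1). The sketch assumes no cell is zero; how StatPac handles empty cells is not described here.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the two-by-two table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def yules_q(a, b, c, d):
    ratio = odds_ratio(a, b, c, d)
    return (ratio - 1) / (ratio + 1)


def yules_y(a, b, c, d):
    root = math.sqrt(odds_ratio(a, b, c, d))
    return (root - 1) / (root + 1)


print(odds_ratio(20, 10, 5, 15), yules_q(20, 10, 5, 15), yules_y(20, 10, 5, 15))
print(odds_ratio(10, 20, 5, 10), yules_q(10, 20, 5, 10))  # independence: OR = 1, Q = 0
```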

Entropy

Entropy is a measure of disorder; that is, the extent to which the data is randomly distributed in a contingency table. The greater the disorder, the greater the entropy statistic. It is useful for comparing different crosstab tables with each other. A low entropy (near zero) indicates that the data tends to be clustered in only a few of the possible categories. A high entropy indicates that the data is evenly distributed among all the possible categories.
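The exact entropy formula StatPac prints is not given in this section. As an assumption, the Python sketch below uses the Shannon entropy of the cell proportions (natural logarithm), which has the properties described: zero when all the data falls in one cell, and a maximum of ln(number of cells) when the data is spread evenly.

```python
import math


def table_entropy(table):
    """Shannon entropy (natural log) of the cell proportions of a table."""
    n = sum(sum(row) for row in table)
    h = 0.0
    for row in table:
        for count in row:
            if count > 0:
                p = count / n
                h -= p * math.log(p)
    return h


print(table_entropy([[50, 0], [0, 0]]))    # all data in one cell
print(table_entropy([[10, 10], [10, 10]]))  # evenly spread over four cells
```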

Yates' Correction

If the degrees of freedom equal one (i.e., when the crosstab is a two-by-two table), Yates' correction can be applied to the chi-square statistic, and the result is printed as the "Corrected chi-square". The option YA=Y enables Yates' correction for two-by-two tables, while YA=N disables it.
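For a two-by-two table [[a, b], [c, d]] with N cases, Yates' correction subtracts N/2 from |ad - bc| before squaring. The Python sketch below computes both the uncorrected and the corrected values. Clamping the corrected difference at zero is a common safeguard and an assumption here, since StatPac's handling of that edge case is not described in this section.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for [[a, b], [c, d]], optionally with Yates' continuity correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


print(chi_square_2x2(20, 10, 5, 15))              # uncorrected chi-square
print(chi_square_2x2(20, 10, 5, 15, yates=True))  # corrected chi-square (smaller)
```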

Residual Analysis

Residual analysis is one method used for identifying the categories responsible for a significant chi-square statistic. This involves calculating the standardized residual for each cell and adjusting it for its variance. The normal distribution is used to find the probability of the adjusted residual using a two-tailed test of significance. A significant adjusted residual indicates that the cell made a significant contribution to the chi-square statistic.
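One common form of this calculation is Haberman's adjusted residual, (O - E) / sqrt(E(1 - row total/N)(1 - column total/N)), evaluated with a two-tailed normal probability. The Python sketch below assumes that form; StatPac's exact formula is not given in this section, and the function name is hypothetical.

```python
import math


def adjusted_residuals(table):
    """Return (adjusted residual, two-tailed probability) for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    out = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            z = (observed - expected) / math.sqrt(variance)
            p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal probability
            out_row.append((z, p))
        out.append(out_row)
    return out


for row in adjusted_residuals([[20, 10], [5, 15]]):
    print(row)
```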

The residual analysis may be turned on or off with the option RA=Y and RA=N, respectively. A sample printout of a residual analysis would look like this:

 

Example of a Residual Analysis Printout

 

Interaction Analysis

While many of the statistics indicate whether or not two variables are related, Goodman's interaction analysis is a method of finding out if the magnitude of the relationship is caused more by one part of the table than another. Its purpose is to evaluate all possible combinations of two-by-two tables for interaction effects.

The interaction is defined as the natural log of the odds ratio. The purpose of the log function is to take into account the possibility of a curvilinear relationship. The standard error of the interaction is calculated as well as the standardized interaction. The standardized interaction is used to calculate a two-tailed probability using a normal distribution.
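The interaction analysis can be sketched in Python as a loop over every pair of rows and every pair of columns. For each two-by-two subtable, the sketch takes the natural log of the odds ratio, uses the usual large-sample standard error sqrt(1/a + 1/b + 1/c + 1/d), and converts the standardized interaction to a two-tailed normal probability. That standard error formula is the standard one for a log odds ratio and is assumed rather than taken from StatPac. Subtables containing a zero cell are simply skipped in this sketch.

```python
import math
from itertools import combinations


def interaction_analysis(table):
    """Return ((row pair), (column pair), interaction, SE, z, p) for each 2 x 2 subtable."""
    results = []
    for i1, i2 in combinations(range(len(table)), 2):
        for j1, j2 in combinations(range(len(table[0])), 2):
            a, b = table[i1][j1], table[i1][j2]
            c, d = table[i2][j1], table[i2][j2]
            if 0 in (a, b, c, d):
                continue  # log odds ratio undefined; skipped in this sketch
            interaction = math.log((a * d) / (b * c))
            se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
            z = interaction / se
            p = math.erfc(abs(z) / math.sqrt(2))
            results.append(((i1, i2), (j1, j2), interaction, se, z, p))
    return results


for result in interaction_analysis([[10, 20, 30], [20, 10, 15], [5, 25, 10]]):
    print(result)
```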

The interaction analysis may be requested with the IA=Y option. A sample printout would look like this:

 

Example of an Interaction Analysis Printout

 

Equiweighting

Equiweighting is a technique to eliminate distortions from most measures of association caused by column marginal disparities. You should use Equiweighting whenever there is a dependent/independent variable relationship (implying causality) and the column totals differ markedly for each of the categories. Note that Equiweighting only applies to the observed/expected table and the statistics that are printed with the table. After Equiweighting, cell frequencies will no longer be integer values. Equiweighting may be requested with the EQ=Y option.
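The idea of equiweighting can be illustrated by rescaling each column so that every column has the same total while the percentages within each column stay the same. This Python sketch assumes the common target is the average column total, which keeps N unchanged; StatPac's actual target is not described here. As the text notes, the resulting cell frequencies are no longer integers.

```python
def equiweight(table):
    """Rescale each column so every column total equals the average column total.

    Assumption: the common target is the mean column total, so the grand total
    is unchanged. This is an illustration, not StatPac's documented algorithm.
    """
    col_totals = [sum(col) for col in zip(*table)]
    target = sum(col_totals) / len(col_totals)
    return [[count * target / col_totals[j] for j, count in enumerate(row)]
            for row in table]


print(equiweight([[30, 5], [60, 5]]))  # column totals 90 and 10 become 50 and 50
```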

Labeling and Spacing Options

 

Option                  Code   Function

Labeling                LB     Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).

Labeling Width          LW     Sets the maximum width (in inches) for the variable labels on the stub.

Exact Width             EW     When EW=Y, the labeling width for the stub will be exactly what is specified with the LW option. When EW=N, the width of the stub will self-adjust based on the length of the stub labels.

Column Width            CW     Sets the minimum width of the columns (in inches).

Column Spacing          CS     Sets the spacing (in inches) between the columns.

Key                     KY     Sets whether the top left corner of the banner will show a legend of the cell contents.

Print Percent Symbol    PP     Sets whether percentage symbols will be shown.

Decimal Places          DP     Sets the number of decimal digits that will be shown.

Print Codes             PC     Sets whether the code (to the left of the equals symbol) will be shown with the value labels.

Label Justification     LJ     Sets the justification for the banner variable label.

Label Underline         LU     Sets whether the banner variable label will be underlined.

Column Justification    CJ     Sets the justification for the banner value label columns.

Column Underline        CU     Sets whether the banner value label columns will be underlined.

Bottom Justify          BJ     Sets whether the banner labels will be bottom justified.

 

BANNERS Command

Banner crosstabs are often used in marketing research when it is important to display several crosstab tables as part of the same printout. It is similar to the crosstabs program except that multiple variables may be specified for the x and/or y axis. The variables across the top of the page are called the banners and the variables down the side of the page are called the stub. It has an advantage over regular crosstabs in that there is much more control over the appearance of the output. The disadvantage is that not all the statistical measures of association are available with banners.

The syntax for the command to run banners is:

 

BANNERS <Stub variable list> BY <Banner variable list>

 

The keyword BY is a mandatory part of the command syntax. The first variable list (called the stub) will be displayed down the side of the page, and the second variable list (called the banner) will be displayed across the top of the page. The maximum table size is 250 rows and 60 columns. The variables may be alpha or numeric.

To print banners with variables 1 & 2 on the y axis and variables 3 to 7 on the x axis, enter the command:

 

BA V1 V2 BY V3 - V7

(BANNERS may be abbreviated BA)

 

If a table is too large to fit on one page, it will be split over as many pages as necessary. The actual number of columns that can fit on one page is determined by the pitch and carriage width of your printer. Value labels appearing in the banner heading will be split into multiple lines as necessary to fit in the banner column widths. The actual positions of the word splits can be controlled by inserting a vertical bar into the value labels at the locations where you want the words to split.

Type of Data

Two types of banner tables can be printed. The most common type is the count/percent table. The data for the rows and columns is categorical (nominal or ordinal). Each row and column in the table represents a category. Set TY=C to select the count/percent table. Unlike crosstabs, the banners program defines its rows and columns from the value labels in the study design. That is, the program uses the value labels to create the banner rows and columns. Any data value without a matching value label in the study design will be counted as missing; therefore, set up the study design labels to reflect the headings and labeling you want for the banners.

The second type of table is when the stub variables are interval or ratio data. There aren't any defined categories for the rows. Instead of counts and percents, we would want to see means and standard deviations in the table. Set TY=P to indicate that the stub variables are parametric. The table will show means and standard deviations instead of counts and percents.

Print Format

Each cell of the banners table may contain up to four numbers. To print or suppress any of these numbers, use the PF option.

The parameters for this option are:

N     Number

R     Row percent

C     Column percent

T     Total percent

One or more parameters may be used with the PF option. These should not be separated from each other. For example, to print the number and total percent, use the following option:

 

OPTIONS PF=NT

 

Example of a Banners Printout

 

Alternate Format

When TY=P (the stub is parametric data), each cell of the banners table may contain up to four numbers. To print or suppress any of these numbers, use the AF option.

The parameters for this option are:

N     Number

M     Mean

D     Standard Deviation

E     Standard Error

One or more parameters may be used with the AF option. These should not be separated from each other. For example, to print the number, mean and standard deviation, use the following option:

 

OPTIONS AF=NMD

Category Creation

In most cases you will want the rows and columns of a banners table to be a reflection of the value labels. When CC=L the categories (what StatPac defines as a row or column) will be created from the value labels in the codebook. That is, a row on the stub or column in the banners will be created for each value label in the codebook. If a variable does not have value labels, it will not be included in the table.

In some cases, however, you might not have previously assigned value labels to a variable but still want the variable to be included in the table. Set CC=D to create the categories from the data itself instead of the value labels. For example, if you had a variable with data values of 1-5 but no value labels, you could include this variable in the banner table by setting CC=D. Alternatively, you could use the LABELS command in the procedure to specify value labels for the variable.

Means & Standard Deviations

The reserved word MEAN may be used to print the mean of any row or column variable. The mean will be the average of the value codes (not the value labels). Place the word MEAN in the variable list immediately after each variable whose mean you want. Either row or column means may be specified. For example, the following command has two variables on each axis (V1 and V2 BY V3 and V4). Means will be printed for both row variables (V1 and V2), but only for the first column variable (V3).

 

BANNERS V1 MEAN V2 MEAN BY V3 MEAN V4

 

The standard deviations or standard errors may also be printed below the means by specifying the SD=D or SD=E option, respectively. When SD=E and the FP option is used to specify a finite population size, the standard error will be calculated using the finite population correction factor. (See the OPTIONS Command in the Keywords section of this manual.)
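A minimal Python sketch of the mean, standard deviation and standard error of the value codes, with an optional finite population correction. It assumes the sample standard deviation (n - 1 denominator) and the usual correction factor sqrt((N - n) / (N - 1)), where N is the population size given with the FP option; both are assumptions about StatPac's exact formulas. The function name is hypothetical.

```python
import math


def mean_sd_se(values, population_size=None):
    """Mean, sample standard deviation and standard error of a list of value codes.

    If population_size is given, the standard error is multiplied by the
    finite population correction sqrt((N - n) / (N - 1)).
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    se = sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return mean, sd, se


print(mean_sd_se([1, 2, 3, 4, 5]))                      # infinite population
print(mean_sd_se([1, 2, 3, 4, 5], population_size=10))  # smaller standard error
```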

Row & Column Totals

The reserved word TOTAL can be used with the BANNERS command to specify row and/or column totals anywhere in the table. The word TOTAL can be used as a variable name in any position (and as many times as desired) in either (or both) variable lists. When TOTAL is used in the first variable list (for the Y axis), a row is included that displays the totals for all columns in the table. When TOTAL is used in the second variable list (for the X axis), the table includes a column that contains row totals. As an example, a command to print totals in the first row and first column of the banners table would be:

 

BANNERS TOTAL, SAT-SCORE, GPA BY TOTAL, CLASS

 

A row or column total reflects the number of cases throughout the entire data file in which the value for the row or column appears. Therefore, the numbers for one particular pair of intersecting variables may not add up to the number for the row or column total. For example, if a variable which recorded sex (male/female) is placed on the X axis against a variable on the Y axis which recorded make of car owned, and 20 of the 100 women who completed the survey did not answer the car question, the column total for females would be 100, but the sum of the females in all the rows of the car variable would be only 80.

 

              Male    Female    Total

Ford            50        20      100

Other           40        60      100

Total          100       100

 

It is easy to create a total column that reflects the row totals irrespective of the other cell counts. First use the NEW command to create a new variable called NET. The value of NET will be initialized as missing for all cases. Then use the LABELS command to assign a label to missing data. Since the banners program uses value labels to determine what is a row and what is a column, it is necessary to use the LABELS command, even though the label is set to blank. Finally, specify the new NET variable instead of the TOTAL keyword in the BANNERS command.

 

NEW (N1) "NET" Totals

LABELS NET (=)

BANNERS TOTAL SCORE BY NET CLASS

Sort Stub

The actual labeling for the x and y axes is taken from the study design information in the codebook. In most circumstances, the value labels will reflect the order in which you want the category codes to be listed. You can override the order of the value labels in the study design by using the option SS=A or SS=D. The category codes (value labels) on the stub will then be displayed in ascending or descending order of frequency.

Additionally, a digit may be added as a suffix to SS=A or SS=D to sort the stub while excluding the last one or more value labels. This is useful when the last value label is an "other" or "don't know" category and you want to sort the stub but still leave the "other" or "don't know" category as the last row. For example, SS=D1 would sort the stub in descending order by frequency but leave the last value label as the last row regardless of its frequency.
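The sort logic can be sketched in Python: sort the stub categories by frequency, but leave the last k value labels in their original positions, as SS=D1 does with k = 1. The function and its parameters are hypothetical, and keeping tied categories in their original order is an assumption.

```python
def sort_stub(labels, counts, order="D", keep_last=0):
    """Order stub categories by frequency, leaving the last `keep_last` labels in place.

    order: "A" for ascending or "D" for descending (mirrors SS=A / SS=D).
    keep_last: number of trailing labels that are not sorted (mirrors the digit suffix).
    """
    split = len(labels) - keep_last
    movable = list(zip(labels[:split], counts[:split]))
    movable.sort(key=lambda pair: pair[1], reverse=(order == "D"))  # stable for ties
    return [label for label, _ in movable] + list(labels[split:])


labels = ["Ford", "Chevy", "Toyota", "Other"]
counts = [30, 50, 40, 60]
print(sort_stub(labels, counts, "D", keep_last=1))  # like SS=D1: "Other" stays last
```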

Compress Output

Compression refers to the way the program creates page breaks. When compression is on (CO=Y), the program will attempt to fit as many columns and rows on a page as possible; that is, page breaks may occur between different value labels of the same variable. When compression is off (CO=N), the program will break pages between each variable on the y-axis. Of course, when a variable has many categories, it may be necessary to split it over successive pages regardless of the compression setting. Setting CO=Y or CO=N applies to both the stub and the banner. Compression may be selectively applied to either the stub or the banner using CO=S and CO=B, respectively. When compression is set to the stub (CO=S), page breaks will occur between variables (not between value labels); however, the program will still attempt to maximize the number of variables that can fit on a page.

Percentage Base

The percentage base on a banners analysis can either be the number of respondents (N) or the total number of responses. If PB=N, the denominator for calculating percentages will be the number of respondents. If PB=R, the denominator will be the total number of responses for all individuals.

Special Value Label HIDE

When creating a banner table, it is often desirable to display only some of the response categories. The LABELS command may be used to eliminate unwanted categories from the table. This removes the category from the table and, when PB=R, from the denominator used to calculate percentages.

For example, assume the following counts for V1:

 

Code       1          2           3

Label      Agree      Neutral     Disagree    No Response    Total N

Count      30         20          40          10             100

 

If PB=N (the denominator equals the number of respondents), the percents will be:

 

Agree           Neutral             Disagree

 30%                20%                   40%

 

If PB=R (the denominator equals the number of responses), the percents will be:

 

Agree                   Neutral                 Disagree

30/90=33%         20/90=22%          40/90=44%

 

We could use the following LABELS command to eliminate the "Neutral" category from the table:

 

LABELS V1 (1=Agree)(3=Disagree)

 

If PB=N, the percents will still be based on a denominator of 100. If, however, PB=R, then the percents will be based on a denominator of 70 (30+40):

 

Agree                     Disagree

30/70=43%           40/70=57%

 

The special value label "HIDE" may be used to suppress printing of a value label without reducing the denominator for the percents calculations. The following LABELS command could be used to eliminate the "Neutral" category from the table, while still including the "Neutral" count in the denominator:

 

LABELS V1 (1=Agree)(2=Hide)(3=Disagree)

 

Any row or column that has a value label of "HIDE" will not be printed, but it will be included in the percent calculations when PB=R. Note that the percentages are based on the counts for all value labels (including the "Neutral" category), even though not all the value labels are displayed in the table.

 

Agree                     Disagree

30/90=33%           40/90=44%

 

If you only wanted the "Agree's" to show in the table, you could use the following statements in the procedure. The percentages in the table would still be based on 90:

 

LABELS V1 (1=Agree)(2=Hide)(3=Hide)

OPTIONS PB=R

..
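The percentage rules in this section can be checked with a short Python sketch that reproduces the V1 example (30 Agree, 20 Neutral, 40 Disagree, 10 no response, N = 100). Codes without a value label are treated as missing, a "Hide" label is counted but not displayed, and percents are rounded to whole numbers as in the examples. The function is hypothetical and not part of StatPac.

```python
def banner_percents(counts, labels, total_respondents, base="N"):
    """Percentages for one banner variable.

    counts: {code: count}; labels: {code: label}. Codes without a label are
    treated as missing. A label of "Hide" (any case) is counted but not shown.
    base: "N" divides by the number of respondents, "R" by the labeled responses.
    """
    labeled = {code: counts.get(code, 0) for code in labels}
    if base == "N":
        denominator = total_respondents
    else:
        denominator = sum(labeled.values())
    return {labels[code]: round(100 * count / denominator)
            for code, count in labeled.items()
            if labels[code].upper() != "HIDE"}


counts = {1: 30, 2: 20, 3: 40}  # 10 more respondents gave no response (N = 100)
print(banner_percents(counts, {1: "Agree", 2: "Neutral", 3: "Disagree"}, 100, "N"))
print(banner_percents(counts, {1: "Agree", 2: "Neutral", 3: "Disagree"}, 100, "R"))
print(banner_percents(counts, {1: "Agree", 3: "Disagree"}, 100, "R"))
print(banner_percents(counts, {1: "Agree", 2: "Hide", 3: "Disagree"}, 100, "R"))
```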

Multiple Response

Multiple response variables can be included in banner crosstabs by using the MR option to combine those variables that should be interpreted as a single variable. The syntax to combine multiple response variables is:

 

OPTIONS MR=(<list 1>)(<list 2>)....(<list n>)

 

Each variable list represents a group of multiple response items that should be grouped as if they were a single variable. Each group must be enclosed in parentheses and specified as a variable list (individual variables are separated by commas or spaces, and ranges are specified with a dash). Either variable names or variable numbers may be specified in the MR option variable list. The V prefix is optional for variable numbers.

The sequence of variables specified in a multiple response group list must match a consecutive sequence of variables in either the x-axis or the y-axis variable list of the BANNERS command. For example, consider the following BANNERS command:

 

BANNERS V5 - V8, V10, V11 BY V1 - V3, V5, V6

 

Multiple response variables    y-axis:  5-8,10,11    x-axis:  1-3,5,6

 

OPTIONS MR=(1-3)        Groups vars. 1, 2 & 3 on the x-axis

OPTIONS MR=(2,3)        Groups vars. 2 & 3 on the x-axis

OPTIONS MR=(8,10-11)    Groups vars. 8, 10 & 11 on the y-axis

OPTIONS MR=(5,6)        Groups vars. 5 & 6 on the x & y-axis

 

The following groupings might cause problems:

 

OPTIONS MR=(1-6)        Groups vars. 1, 2 & 3 on the x-axis, but not variables 5 or 6 because variable 4 was not part of the banners variable list

OPTIONS MR=(6-11)       Groups vars. 6, 7 & 8 on the y-axis, but not variables 10 or 11 because variable 9 was not part of the banners variable list

OPTIONS MR=(10-15)      Groups vars. 10 & 11 on the y-axis and ignores the extra variables

OPTIONS MR=(3-7)        No variables grouped

OPTIONS MR=(11-20)      No variables grouped

OPTIONS MR=(3,2,1)      No variables grouped

OPTIONS MR=(8,11,10)    No variables grouped

 

In general, the MR option will never cause a fatal error. If an invalid grouping is found, it is simply ignored and the variables will not be grouped on the output. The banners program uses the value labels from the first variable specified in each group list, so the MR option should be used only to group variables that share a common list of value labels. The value labels must be specified in the study design (i.e., they will not automatically be determined from the data file). This was implemented to prevent mispunched or spurious data from creating its own row or column in the output.
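The grouping examples above follow a consistent pattern, which this Python sketch encodes: find the first variable of the MR list on an axis, then match successive variables of the MR list against successive variables of that axis, stopping at the first mismatch. A group forms only if at least two variables match. This rule is inferred from the examples and is not a documented StatPac algorithm; the function name is hypothetical.

```python
def mr_group(group, axis):
    """Return the variables from `group` that would be grouped on one axis (inferred rule)."""
    if not group or group[0] not in axis:
        return []
    start = axis.index(group[0])
    matched = []
    for offset, var in enumerate(group):
        pos = start + offset
        if pos >= len(axis) or axis[pos] != var:
            break  # stop at the first variable that does not follow on the axis
        matched.append(var)
    return matched if len(matched) >= 2 else []


x_axis = [1, 2, 3, 5, 6]        # BANNERS V5 - V8, V10, V11 BY V1 - V3, V5, V6
y_axis = [5, 6, 7, 8, 10, 11]
for group in ([1, 2, 3], list(range(1, 7)), list(range(6, 12)), [8, 11, 10]):
    print(group, "x:", mr_group(group, x_axis), "y:", mr_group(group, y_axis))
```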

Net Codes

Net categories may be created and displayed on the stub using the NT option in conjunction with the MR option. The NT option specifies the codes on the stub variable that are to be interpreted as net categories. Net categories are excluded from the calculations of totals and means. Multiple categories are separated with a slash and enclosed in quotes.  The general format is:

 

OPTIONS NT="code/code/code"

 

For example, suppose you want to create a banner table where the stub (V5) is a five-point Likert scale. The scale is coded: (1=Very good) (2=Good) (3=Fair) (4=Poor) (5=Very poor). You want the stub to contain two net categories and look like this:

 

1=Very Good

2=Good

NET: Very Good or Good

3=Fair

NET: Poor or Very Poor

4=Poor

5=Very Poor

Mean & SD

 

The first step is to create a NET variable and compute it equal to values not used in the originally coded variable. In this example, 6, 7, 8, 9, and 0 are unused, so we could use any of them for the new NET variable. Then use the LABELS command to relabel the stub categories in the order you want them to appear. Include the new NET variable in the BANNERS command as if it were a multiple response variable. Use the MR option to specify multiple response and use the NT option to specify which codes are the net categories.

 

New (N1) "NET"

If V5="1/2" Then Compute NET=6

If V5="4/5" Then Compute NET=7

Labels  V5 (1=Very Good) (2=Good) (6=NET: Very Good or Good) (3=Fair) (7=NET: Poor or Very Poor) (4=Poor) (5=Very Poor)

Banners V5 NET Mean By Total Age Gender Group

Options MR=(V5 NET) NT="6/7"

..
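The effect of the NT option can be sketched in Python: each net category counts every respondent whose V5 code belongs to it, while the total and the mean are based only on the original codes. The function is hypothetical; the codes and row order follow the example above, and the responses are made up for illustration.

```python
def stub_with_nets(responses, order, nets):
    """Build stub rows with net categories for one single-response variable.

    responses: one code per respondent (codes not in `order` count as missing).
    order: stub row order, mirroring the LABELS command.
    nets: {net_code: set of member codes}; net codes are excluded from totals and means.
    """
    valid = [v for v in responses if v in order and v not in nets]
    counts = {code: 0 for code in order}
    for value in valid:
        counts[value] += 1
        for net_code, members in nets.items():
            if value in members:
                counts[net_code] += 1
    total = len(valid)
    mean = sum(valid) / total
    return [(code, counts[code]) for code in order], total, mean


order = [1, 2, 6, 3, 7, 4, 5]    # row order from the LABELS command
nets = {6: {1, 2}, 7: {4, 5}}    # 6 = Very Good or Good, 7 = Poor or Very Poor
rows, total, mean = stub_with_nets([1, 1, 2, 3, 4, 5, 5, 5], order, nets)
print(rows, total, mean)
```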

Weighting

Weighting is useful when the true incidence in the population is known, but data collection yielded a different incidence. In other words, there was a sampling error (the sample does not adequately represent the population). Weighting can be used to mathematically increase or decrease the counts of any banner variables so they more accurately reflect the known population parameters.

 

The WEIGHT command in StatPac will create a weighted file using integer case weights, where a probability function is used for the non-integer portion of the weights. The WT option in the BANNERS command does not create a new data file; it simply adjusts the counts in the banner table. The WEIGHT command and the WT option in banners are different methods of accomplishing the same goal and should not be used together in the same procedure.
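One plausible reading of "a probability function is used for the non-integer portion of the weights" is sketched below in Python: each case is written floor(weight) times, plus one more time with probability equal to the fractional part of its weight. Under this reading, a weight of 1.2712 yields one copy about 73% of the time and two copies about 27% of the time. This is an assumption about the WEIGHT command's internals, not a description of StatPac's code.

```python
import math
import random


def replicate_cases(cases, weights, seed=None):
    """Expand cases into an integer-weighted file.

    Each case is copied floor(weight) times, plus one extra copy with
    probability equal to the fractional part of its weight.
    """
    rng = random.Random(seed)
    expanded = []
    for case, weight in zip(cases, weights):
        copies = math.floor(weight)
        if rng.random() < weight - copies:
            copies += 1
        expanded.extend([case] * copies)
    return expanded


print(len(replicate_cases(list(range(1000)), [1.2712] * 1000, seed=1)))  # close to 1271
```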

 

Weighting the Entire Banner Table

Take for example a simple banner table with an automatic total row and a mean row:

 

Title (#)

Banners V1 By Total Gender

Options AT=Y AM=Y PC=Y

..

 

The table might look like this: 

 

Looking at the Total row, we see that our sample had 64.6% males and 35.4% females. However, we know that the population actually has 55% males and 45% females, so the Total column might be producing an inaccurate reflection of the total population due to a sampling error. To correct the problem, we would weight the gender variable so the table reflects the 55% and 45% proportions that we know exist in the population.

 

The first step is to calculate the weights for males and females. The weights are easily calculated by the following formula:

 

Weight = Desired Percentage / Observed Percentage

 

Thus, the weight for males would be 55 divided by 64.6 = .8514 and the weight for females would be 45 divided by 35.4 = 1.2712.
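The weight formula can be checked with a few lines of Python using the percentages from this example (1 = Male, 2 = Female). The function name is hypothetical.

```python
def weights_from_percentages(desired, observed):
    """Weight for each category = desired percentage / observed percentage."""
    return {code: desired[code] / observed[code] for code in desired}


desired = {1: 55.0, 2: 45.0}   # known population percentages
observed = {1: 64.6, 2: 35.4}  # sample percentages
weights = weights_from_percentages(desired, observed)
for code, weight in weights.items():
    print(code, round(weight, 4))

# After weighting, each observed percentage becomes (approximately) the desired one.
print({code: observed[code] * weights[code] for code in weights})
```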

 

Typically, you'll create a variable that contains the weight for each case. Subsequent procedures would then specify the WT option to weight each entire banner table by the case weight variable.

 

STUDY SEGMENT

NEW (N7) "CaseWeight"

IF GENDER = 1 THEN COMPUTE CaseWeight = 0.8514

IF GENDER = 2 THEN COMPUTE CaseWeight = 1.2712

SAVE

..

Banners V1 By Total Age Gender Group

Options WT=(CaseWeight)

..

Banners V11-V20 By Total Age Gender Group

Options WT=(CaseWeight)

..

 

Weighting Individual Banner Variables in the Table

The other form of the WT option lets you weight individual banner variables, each with its own weights.

 

The format for the WT option when you want to weight just one banner variable is:

 

OPTIONS WT=(variable code=weight code=weight)

 

Spaces or commas may be used within the parentheses to separate each of the components of the option. 

 

In this example, the codebook specifies 1=Male and 2=Female so the WT options would use codes of 1 and 2.

 

Title (#)

Banners V1 By Total Gender

Options AT=Y AM=Y PC=Y WT=(Gender 1=.8514 2=1.2712)

..

 

Rerunning the procedure would produce a weighted analysis with an adjusted total row and total column.

 

More than one banner variable may be weighted. The syntax is the same except additional sets of parentheses are added for each variable to be weighted.

 

OPTIONS WT=(variable code=weight code=weight) (variable code=weight code=weight code=weight) 

 

When the WT option is used, the total column will reflect the weighted values of the variable that follows it. If more than one variable is weighted, it would be wise to specify more than one total column. For example, if ethnicity were coded as an alpha variable (W=White and B=Black), the following commands would produce a total column for gender and a total column for ethnicity, and both would be weighted:

 

Banners V1 By Total Gender Total Ethnicity

Options AT=Y AM=Y PC=Y WT=(Gender 1=.8514 2=1.2712)(Ethnicity W=.5672 B=1.8141)

..

 

Fractional Counts

The FC option may be used in conjunction with the WT option to control how fractional cell counts are displayed. Weighting creates fractional cell counts, but showing them is often confusing (e.g., how could there be 178.6 males?). FC=N rounds all cell counts to whole numbers, while FC=Y shows the decimal portions.
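To see where fractional counts come from, the following Python sketch multiplies hypothetical unweighted cell counts by the gender weights calculated earlier. It then formats each result roughly as FC=Y (decimals shown) or FC=N (rounded to whole numbers) is described. The counts, the helper names, and the one-decimal display for FC=Y are assumptions. StatPac's own number formatting may differ.

```python
def weighted_count(count, weight):
    """Weighted cell count: number of cases times the weight for that code."""
    return count * weight


def display(value, fractional):
    """Rough stand-in for FC=Y (one decimal place) or FC=N (whole number).
    Note: Python's round() rounds exact halves to the nearest even number."""
    return f"{value:.1f}" if fractional else str(round(value))


# Hypothetical unweighted counts: 210 males and 115 females
cells = {"Male": weighted_count(210, 0.8514), "Female": weighted_count(115, 1.2712)}
for label, value in cells.items():
    print(label, "FC=Y:", display(value, True), "FC=N:", display(value, False))
```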

Supplemental Heading

The supplemental heading is a line of text that will appear after the heading and title, but before the banner table. It may contain any text and should be enclosed in quotes. When the pound symbol (#) is used in the supplemental heading, it will be printed as the number of cases. The SH option is usually used to indicate who is included in the banner table. The following is an example of a supplemental heading:

 

OPTIONS SH="BASE: ALL RESPONDENTS (N=#)"

N Equals

The sample size can be displayed in the top left corner of the table with the NE option. The NE text may contain anything and should be enclosed in quotes. When the pound symbol (#) is used in the N Equals option, it will be printed as the number of cases. The NE option is usually used to indicate who is included in the banner table. The following is an example of the N Equals option:

 

OPTIONS NE="(N=#)"

Significance Tests

StatPac offers significance testing in banner tables. To bypass all significance testing, set the ST option to none (ST=N). The following options control the type of significance tests:

 

OPTIONS ST=N  (no significance tests)

OPTIONS ST=P  (t-tests between percents only)

OPTIONS ST=M  (t-tests between means only)

OPTIONS ST=T  (t-tests between percents and means)

OPTIONS ST=C  (chi-square tests for each subtable)

OPTIONS ST=A  (t-tests between means and percents and chi-square tests)

 

T-Tests Between Proportions and Means

Two-tailed t-tests between column percents and means can be performed with the ST option. When specified, StatPac will automatically set the banner to include a code letter for each column, beginning with column "A". An independent samples t-test will be performed between all combinations of banner columns, and the results will be displayed in the table if they are significant at the alpha levels set by the C1 and C2 options.

Upper case letters indicate "high significance" and lower case letters indicate "moderate significance" (high and moderate being defined by the values of C1 and C2). For example, suppose C1=.05 and C2=.01. After running the analysis, you see a cell with the letters "Ce". This means that the percentage in this cell is significantly different from the percentage in column C at the .01 level, and significantly different from the percentage in column E at the .05 level.
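The following Python sketch shows one way column letters could be assigned. This section does not give the exact test formula StatPac uses. As a stand-in, the sketch uses a pooled two-proportion z statistic with a normal-curve p-value, which is a large-sample approximation to the t-test between percents. Following the example above, the stricter of C1 and C2 produces upper case letters and the looser one produces lower case letters. All names and the sample counts are hypothetical.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic (stand-in for the t-test between percents)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (p1 - p2) / se


def significance_letters(cells, c1=0.05, c2=0.01):
    """cells: list of (count, base) per banner column, lettered A, B, C, ...
    Returns a letter string per column: upper case when the difference from
    that column is significant at the stricter level, lower case at the looser one."""
    strict, loose = min(c1, c2), max(c1, c2)
    result = []
    for i, (x1, n1) in enumerate(cells):
        letters = ""
        for j, (x2, n2) in enumerate(cells):
            if i == j:
                continue
            p = two_tailed_p(proportion_z(x1, n1, x2, n2))
            letter = chr(ord("A") + j)
            if p < strict:
                letters += letter
            elif p < loose:
                letters += letter.lower()
        result.append(letters)
    return result


# Hypothetical counts and bases for four banner columns, A through D
print(significance_letters([(60, 100), (40, 100), (45, 100), (62, 100)]))
```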

Chi-Square Tests

Banner crosstab tables may be broken down into several combinations of smaller tables, consisting of one variable on each axis. For example, the following BANNERS statement could be broken down into three subtables:

 

BANNERS V1 BY V2, V6, V9

 

The subtables would be V1 by V2, V1 by V6, and V1 by V9. It is then possible to calculate a chi-square statistic for each subtable. Use the option ST=C to request a chi-square analysis for all the combinations of subtables. The chi-square, degrees of freedom and probability of chance will be printed for each subtable.

It is not possible to calculate chi-square statistics for tables with completely missing rows or columns; therefore, if any row or column in a subtable is completely missing, it will not be included in the calculation of the chi-square statistic or degrees of freedom (even though it may be displayed in the count/percent table).
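Below is a minimal Python sketch of the chi-square calculation for one subtable. It includes the rule that completely empty rows or columns are dropped before the statistic and degrees of freedom are computed. It assumes the ordinary Pearson chi-square. The probability of chance is omitted because it requires the chi-square distribution function, which is available in statistical libraries.

```python
def chi_square(table):
    """Pearson chi-square for a two-way table of counts (a list of rows).
    Rows or columns whose counts are all zero are dropped first, so they
    contribute neither to the statistic nor to the degrees of freedom."""
    rows = [r for r in table if sum(r) > 0]
    keep = [j for j in range(len(rows[0])) if sum(r[j] for r in rows) > 0]
    rows = [[r[j] for j in keep] for r in rows]

    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(row_totals)

    stat = 0.0
    for i, r in enumerate(rows):
        for j, observed in enumerate(r):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(keep) - 1)
    return stat, df


# Hypothetical V1-by-V2 subtable; the third row has no data at all
stat, df = chi_square([[20, 30], [30, 20], [0, 0]])
print(round(stat, 3), df)
```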

 

Example of a Two-Way Chi-Square Statistics Printout

 

When ST=A, all types of significance testing will be performed. The output will include the t-tests between percents and means and two-way chi-square tests.

Yates' Correction

If degrees of freedom equal one (i.e., when the banners program produces a two-by-two table), Yates' correction can be applied to the chi-square statistic. The option YA=Y will enable Yates' correction for two-by-two tables, while YA=N will disable it.
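Yates' correction subtracts 0.5 from each absolute difference between observed and expected counts before squaring. The Python sketch below shows the corrected and uncorrected statistics for a hypothetical two-by-two table. Capping the corrected difference at zero is a common implementation choice and an assumption here, not documented StatPac behavior.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square for the 2x2 table [[a, b], [c, d]], optionally applying
    Yates' continuity correction (subtract 0.5 from each |O - E|)."""
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # never let the correction go below zero
            stat += diff ** 2 / expected
    return stat


print(chi_square_2x2(20, 30, 30, 20, yates=False))  # uncorrected (YA=N)
print(chi_square_2x2(20, 30, 30, 20))               # corrected (YA=Y)
```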

Zero Rows & Columns

You may choose whether or not to print zero rows and columns. Rows or columns of zeros occur when the study design contains value labels for which there is no data. If you want the reader of your report to know that a category exists, you will probably want to print rows and columns with zero counts (ZR=Y ZC=Y). In most cases, however, conserving space is more important, so you would set ZR=N and ZC=N.

Automatic Page Title Creation

When performing a series of banner analyses, each having the same banner columns and only one y axis variable per page, it may be desirable to make the page title the same as the y axis variable label. When the title is set to a pound symbol in parentheses, TITLE (#), the title will become the variable label for the y axis variable. (This can be changed to the x axis label using a patch.)

For example, let's say you had several demographic variables as your banner points, and you wanted to look at several other variables on the y axis (down the stub). You want a series of tables that look like this:

 

Special Study                                          (Page Heading)

The variable label of "Some Variable" on the y axis    (Title)

                  Age          Sex                 Income
               Under 21    Male   Female     Low   Middle   High
Some             -----     -----  -----      -----  -----   -----
Variable         -----     -----  -----      -----  -----   -----

 

The following procedure would produce five similar tables, each on a different page, and each with a different title:

 

STUDY Yourstudy

HEADING Special Study

TITLE (#)

BANNERS V1-V5 BY AGE SEX INCOME

OPTIONS CO=N SH=""

..

Total Row Position

When the TOTAL keyword is embedded between other variables in the banners command line, the TP option determines whether the total is printed for the previous variable or for the next variable. In the following example, the total row could be the last stub for V1 or the first stub for V2, depending on the setting of the TP option.

 

BANNERS V1 TOTAL V2 BY AGE SEX INCOME

 

If TP=L, the last row for V1 will be a total row. If TP=F, the first row for V2 will be a total row.

Total Counts

The TC option makes it possible to print only the counts (without the percents) in total rows and total columns, even when percentages are being printed in the rest of the table. If TC=Y, total rows and total columns will only contain the counts. If TC=N, total rows and columns will be defined by the PF option, and will contain the same number of values as the other cells in the table.

Total Adjustment

The TA option may be used to set how total columns are calculated. If TA=N, the total columns will be based on the number of non-missing cases for the stub variable. If TA=Y, the total column counts will be the sum of the counts for the banner variable that follows it. If that banner variable has no missing data, the counts will be the same, but if the banner variable that follows the TOTAL keyword has missing data, the counts in the total column will be different. Thus, when setting TA=Y, you could insert the word TOTAL before each banner variable, and each total column might contain different counts.

Total Row Denominator

Normally, a total row will be based on the same denominator as specified by the PB option (either N, the number of cases, or R, the number of respondents). If PB is set to R, you can force the percentages in a total row to be calculated using N as the denominator by setting TD=Y. This is sometimes handy when the banner contains multiple response variables.

Total Total Intersections

When printing a table that contains both total rows and total columns, there will be at least one intersection of a total row and a total column. You must set the precedence as to how the intersection cell is calculated. It may be based on the sum of the row counts (TT=R) or the sum of the column counts (TT=C).

Automatic Total Row

The AT option may be used to automatically print a total row for each variable on the stub. Its purpose is to eliminate the necessity of having to type the TOTAL keyword for each of the stub variables. In the previous example, if you wanted each stub variable to begin with a total row, the command would be:

 

BANNERS TOTAL V1 TOTAL V2 TOTAL V3 TOTAL V4 TOTAL V5 BY AGE SEX INCOME

OPTIONS CO=N

..

If the AT option is set to "Y", the total rows will be included in the output, even when the TOTAL keyword is not included in the stub variable list. The following procedure would produce the same output as the previous procedure:

 

BANNERS V1-V5 BY AGE SEX INCOME

OPTIONS CO=N AT=Y TR=F

..

The TR option is used in conjunction with the AT option to determine whether the total row will be the first or last row on the stub. Note that if you use the option AT=Y, then you should not use the TOTAL keyword anywhere in the stub variable list.

Automatic Mean Row

The AM option may be used to automatically print a row of means for each variable on the stub. Its purpose is to eliminate the necessity of having to type the MEAN keyword for each of the stub variables. In the previous example, if you wanted each stub variable to include a row of means, the command would be:

 

BANNERS V1 MEAN V2 MEAN V3 MEAN V4 MEAN V5 MEAN BY AGE SEX INCOME

OPTIONS CO=N

..

If the AM option is set to "Y", a row of means will be included in the output, even when the MEAN keyword is not included in the stub variable list. The following procedure would produce the same output as the previous procedure:

 

BANNERS V1-V5 BY AGE SEX INCOME

OPTIONS CO=N AM=Y

..

Labeling and Spacing Options

 

Option | Code | Function
Labeling | LB | Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Labeling Width | LW | Sets the maximum width (in inches) for the variable labels on the stub.
Exact Width | EW | When EW=Y, the labeling width for the stub will be exactly what is specified with the LW option. When EW=N, the width of the stub will self-adjust based on the length of the stub labels.
Column Width | CW | Sets the minimum width of the columns (in inches).
Column Spacing | CS | Sets the spacing (in inches) between the columns.
Variable Spacing | VS | Sets the spacing (in inches) between the banner variables.
Key | KY | Sets whether the top left corner of the banner will show a legend of the cell contents.
Print Percent Symbol | PP | Sets whether percentage symbols will be shown.
Decimal Places | DP | Sets the number of decimal digits that will be shown.
Print Codes | PC | Sets whether the code (to the left of the equals symbol) will be shown with the value labels.
Underline Stub Variable Labels | UL | Sets whether the stub variable labels will be underlined.
Bottom Justify | BJ | Sets whether the banner labels will be bottom justified.
Heading Justification | HJ | Sets the justification for the banner variable labels.
Bottom Justify Heading | BH | Sets whether the banner variable labels will be bottom justified.
Value Label Justification | LJ | Sets the justification for the banner value labels.
Bottom Justify Value Labels | BL | Sets whether the banner value labels will be bottom justified.
Extra Spacing | ES | When ES=Y, a blank line will be printed above and below the banner value labels. When ES=N, no blank lines will be printed.
Code Justification | CJ | Sets the justification for the value label codes.
Print Stub Variable Label | SL | Sets whether the stub variable labels are shown.
Stub Variable Spacing | VY | Sets the number of blank rows between variables on the stub.
Stub Label Spacing | LY | Sets the number of blank rows between value labels on the stub.

Minimum Cell Count

Researchers often choose not to show the percentages for cells containing a small N. The MC option may be used to suppress the printing of percentages for cells with low counts. For example, if MC=5, StatPac will print percentages for cells whose counts are greater than or equal to 5. If a cell has a count of less than 5, StatPac will print dashes instead of the percent. If MC=1, StatPac will print dashes for cells where the count is zero. Valid values for MC are between 0 and 100. If MC=0, all percentages will be printed.

Minimum Denominator

Percentages can be misleading if they are based on a small denominator. The MD option may be used to suppress the printing of percentages that are based on a small denominator. The MD option sets the minimum denominator that StatPac will use for calculating percents. For example, if MD=5, StatPac will calculate percentages if the denominator is greater than or equal to 5. If a denominator were less than 5, StatPac would print dashes instead of the percent. Valid values for MD are between 0 and 100. If MD=0, all percentages will be printed.
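The MC and MD rules can be expressed as a small Python function. This is a sketch only. The function name, the `--` placeholder, and the one-decimal format are assumptions that stand in for StatPac's actual output.

```python
def cell_percent(count, denominator, mc=0, md=0, decimals=1):
    """Percent string for one cell, or dashes when the percent is suppressed.
    mc: minimum cell count (MC); cells with count < mc show dashes.
    md: minimum denominator (MD); percents based on denominator < md show dashes.
    A zero denominator also shows dashes instead of dividing by zero."""
    if count < mc or denominator < md or denominator == 0:
        return "--"
    return f"{100 * count / denominator:.{decimals}f}"


print(cell_percent(3, 50, mc=5))   # count below MC
print(cell_percent(5, 50, mc=5))   # count meets MC
print(cell_percent(2, 4, md=5))    # denominator below MD
```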

 

DESCRIPTIVE Command

Descriptive statistics are usually the first step in the analysis of interval or ratio data. They reveal central tendency and the shape of the distribution.

The syntax of the command to run descriptive statistics is:

 

DESCRIPTIVE  <Variable list>

 

For example, if you are examining college entrance exam scores for READING (V7), ARITHMETIC (V12) and VERBAL (V19) skills, descriptive statistics could be requested with any of the following commands:

 

DESCRIPTIVE READING, ARITHMETIC, VERBAL

DESCRIPTIVE V7, V12, V19

DE V7 V12 V19        (DESCRIPTIVE may be abbreviated as DE)

 

There is a wide variety of descriptive statistics available. To print or exclude individual statistics, use the appropriate option.

Missing data (blanks) will always be excluded from the calculation of descriptive statistics. The number of missing cases is reported, but those cases are not used in any calculations.

One Analysis

The one-analysis option allows you to print selected descriptive statistics for several variables on one page. This option is especially useful for summary reporting, when you only need a few descriptive statistics for a large number of variables.

All the variables specified with the OA option must be numeric. For example, suppose that variables 25-34 are ten numeric scores. The following commands would produce a one-page summary of selected descriptive statistics for each of the ten items:

 

DESCRIPTIVE V25-V34

OPTIONS OA=Y

 

Example of a Descriptive Statistics One-Analysis Printout

 

Statistics

When the OA option is specified, you may select which descriptive statistics you want with the ST option. The codes for the ST option are the same as the specific statistic codes described below. (The only exception is NC, which stands for the number of valid cases.) For example, the following commands would report the mean, median, unbiased standard deviation and number of cases for variables 25-34:

 

DESCRIPTIVE V25-V34

OPTIONS OA=Y ST=(ME MD US NC)

 

Note that the commands are identical to the previous example except that the ST option is used to identify the specific statistics you want calculated. The parentheses around the list of statistics are mandatory.

Sort Variables

When performing descriptive statistics with the one-analysis option (OA=Y), you can sort the variables by the contents of the first column of the results. The SV (sort variables) option may be set to "N" for no sort, "A" to sort in ascending order, or "D" to sort in descending order. When no sort is specified, the variables will be listed in the order that they appear in the analysis command variable list. The SV option is applicable only when the OA=Y option is specified.

Additionally, a digit may be added as a suffix to SV=A or SV=D. The digit specifies how many variables at the end of the list are excluded from the sort. This is useful when the last variable is an "other" variable, and you want to sort the variables but still leave "other" as the last variable. For example, SV=D1 would sort the variables in descending order, except it would leave the last variable as the last row regardless of its value.
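The effect of a digit suffix such as SV=D1 can be sketched in Python. Every row except the last few is sorted, and those last rows stay in their original positions. The function name and the sample rows are hypothetical.

```python
def sort_rows(rows, order="N", keep_last=0):
    """Sort (label, value) rows by value, mimicking SV=N, SV=A and SV=D.
    keep_last plays the role of the digit suffix (e.g. SV=D1 -> keep_last=1):
    that many rows at the end are left in place."""
    if order == "N":
        return list(rows)
    split = len(rows) - keep_last
    head = sorted(rows[:split], key=lambda row: row[1], reverse=(order == "D"))
    return head + list(rows[split:])


rows = [("Price", 3.1), ("Quality", 4.2), ("Service", 3.8), ("Other", 4.9)]
print(sort_rows(rows, "D", keep_last=1))  # "Other" stays last despite its high value
```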

Minimum, Maximum, Range, & Sum

There are four very simple statistics that give an overall picture of the data: the minimum data value, the maximum data value, the range (maximum minus minimum), and the sum of the data. An option line that would enable all of these features is:

 

OPTIONS MI=Y MA=Y RA=Y SU=Y

Mean, Median, & Mode

The best known descriptive statistics are the mean, median and mode. They describe the central tendency of a distribution. The mean (average) is the most popular. It is found by adding the values for all the (non-missing) cases and dividing by the number of (non-missing) cases. For example, to find the mean age of all your friends, add all their ages together and divide by the number of friends. The mean average can present a distorted picture of central tendency if the sample is skewed in any way.

For example, let's say five people take a test. Their scores are 10, 12, 14, 18, and 94. (The last person is a genius.) The mean would be the sum of the scores (10+12+14+18+94 = 148) divided by 5, which is 29.6. In this example, a mean of 29.6 is not a "good" measure of how well people did on the test in general. When analyzing data, be careful of using only the mean average when the sample has a few very high or very low scores. These scores tend to skew the shape of the distribution and will distort the mean.

The median provides a measure of central tendency such that half the sample will be above it and half the sample will be below it. For skewed distributions this is a better measure of central tendency. In the previous example, 14 would be the median for the sample of five people.

The mode is the most common score or category - the one which occurred most frequently. It is possible to have more than one mode if there is not a single "most frequent score". For example, the following set of data has two modes:  12 and 16.
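Python's standard `statistics` module reproduces the figures in these examples, which makes it a convenient way to check hand calculations. Note that `multimode` requires Python 3.8 or later.

```python
import statistics

scores = [10, 12, 14, 18, 94]           # the five test scores from the example
print(statistics.mean(scores))          # 29.6
print(statistics.median(scores))        # 14

data = [12, 12, 12, 13, 14, 15, 15, 16, 16, 16, 17, 18]
print(statistics.multimode(data))       # [12, 16]: two modes
```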

 

         12  12  12  13  14  15  15  16  16  16  17  18

 

The distribution of many variables follows that of a bell-shaped curve. This is called a "normal distribution". One must assume that data is approximately normally distributed for many statistical analyses to be valid. When a distribution is normal, the mean, median and mode in the population will all be equal. If they are not equal, the distribution is distorted in some way.

Skewness, Kurtosis, & Kolmogorov-Smirnov

There are basically two ways that a distribution can be distorted: skewness and kurtosis. Skewness refers to whether the curve is "top heavy" or "bottom heavy" (i.e., which tail of the curve is longer). If the longest tail of the curve goes to the right (the curve is top heavy), it is positively skewed. If it is bottom heavy (the longest tail of the curve goes to the left), it is negatively skewed. A value of zero for skewness represents a symmetrical distribution, such as the normal distribution mentioned above.

Kurtosis refers to how peaked or flat the curve is. A very flat curve is called "platykurtic" and has a kurtosis of less than three. A very peaked curve is called "leptokurtic" and has a kurtosis greater than three. A value of three for kurtosis indicates normal peakedness and the distribution is termed "mesokurtic".
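One common way to compute these two statistics is from the central moments of the data, as in the Python sketch below. It uses the moment definitions in which a normal curve has a skewness of 0 and a kurtosis of 3, matching the scale described above. StatPac may apply small-sample adjustments, so its printed values could differ slightly.

```python
def moments(data):
    """Mean and the 2nd, 3rd and 4th central moments (divisor N)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return mean, m2, m3, m4


def skewness(data):
    """Moment skewness: 0 for a symmetrical distribution, > 0 for a long right tail."""
    _, m2, m3, _ = moments(data)
    return m3 / m2 ** 1.5


def kurtosis(data):
    """Moment kurtosis on the scale where a normal curve scores 3
    (below 3 = platykurtic, above 3 = leptokurtic)."""
    _, m2, _, m4 = moments(data)
    return m4 / m2 ** 2


print(skewness([10, 12, 14, 18, 94]))  # positive: the long tail goes to the right
print(kurtosis([1, 2, 3, 4, 5]))       # 1.7: flatter than a normal curve
```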

The Kolmogorov-Smirnov statistic provides a quick check to determine the degree of normality in the data. The value provides a relative indication of normality; as the value moves further away from zero, we can be more certain that the data does not approximate a normal distribution. The distribution is non-normal:

 

              at the .15  level if KS > .775

              at the .10  level if KS > .819

              at the .05  level if KS > .895

              at the .025 level if KS > .955

              at the .01  level if KS > 1.035
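The critical values listed above match those published by Stephens for a modified Kolmogorov-Smirnov statistic. That statistic is used when the mean and standard deviation of the normal curve are estimated from the sample itself. This section does not say whether StatPac computes its KS value that way. Assuming it does, the Python sketch below calculates the value and reports the smallest listed level at which the data would be judged non-normal. The function names are hypothetical.

```python
import math
import statistics

# (significance level, critical value) pairs from the list above, strictest first
THRESHOLDS = [(0.01, 1.035), (0.025, 0.955), (0.05, 0.895), (0.10, 0.819), (0.15, 0.775)]


def normal_cdf(x, mean, sd):
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def modified_ks(data):
    """KS distance D between the data and a normal curve with the sample's own
    mean and (N-1) standard deviation, scaled by sqrt(n) - 0.01 + 0.85/sqrt(n)."""
    xs = sorted(data)
    n = len(xs)
    mean, sd = statistics.mean(xs), statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mean, sd)
        d = max(d, i / n - f, f - (i - 1) / n)
    root = math.sqrt(n)
    return d * (root - 0.01 + 0.85 / root)


def non_normal_level(ks):
    """Smallest listed level at which the distribution is non-normal, or None."""
    for level, critical in THRESHOLDS:
        if ks > critical:
            return level
    return None


ks = modified_ks([10, 12, 14, 18, 94])
print(round(ks, 3), non_normal_level(ks))
```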

Standard Deviation & Variance

The standard deviation is a very useful statistic that measures the dispersion of scores around the mean. For a normal distribution, about 68 percent of all the scores will be within plus or minus one standard deviation of the mean, and about 95 percent will be within two standard deviations of the mean.

The variance is calculated directly from the distribution of raw scores. It is the sum of the squared deviations of each score from the arithmetic mean divided by N. The standard deviation is simply the square root of the variance. The unbiased estimates should be used when sampling from the population. The formula for the unbiased estimates of the variance and standard deviation is the same except that N-1 is used in the denominator.
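The definitions of the variance and standard deviation, with either the N or the N-1 divisor, translate directly into Python. This sketch uses the five test scores from the mean, median, and mode example. The function names are illustrative.

```python
import math


def variance(data, unbiased=False):
    """Sum of squared deviations from the mean, divided by N (or N-1 if unbiased)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)
    return ss / (n - 1 if unbiased else n)


def std_dev(data, unbiased=False):
    """Square root of the corresponding variance."""
    return math.sqrt(variance(data, unbiased))


scores = [10, 12, 14, 18, 94]
print(variance(scores), variance(scores, unbiased=True))  # about 1043.84 and 1304.8
print(std_dev(scores, unbiased=True))
```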

Standard Error & Confidence Intervals

Confidence intervals are very important. They indicate the range within which the population mean is likely to fall. The standard error of the mean is used to estimate this range.

Let's say the 95 percent confidence interval for the mean is 12.4 to 22.8. If many samples of the same size were drawn and a 95 percent interval were calculated from each one, about 95 percent of those intervals would contain the true population mean. A similar interpretation can be made for the 99 percent confidence interval. The 95 and 99 percent confidence intervals may be requested using the C5 and C9 options respectively:

 

OPTIONS C5=Y C9=Y

 

The usual formula for the standard error of the mean (the standard deviation divided by the square root of the sample size) is appropriate when the sample size is small relative to the population size (say, less than ten percent). When the sample represents a substantial proportion (greater than ten percent) of the population, the standard error is modified by the finite population correction factor. This has the effect of reducing the standard error and narrowing the confidence interval band. When the FP option is used to specify a population size, the standard error will be adjusted and printed as the "Corrected Standard Error Of The Mean". (See the OPTIONS command in the Keywords section of this manual for information on using the FP option.)
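The Python sketch below computes the standard error and applies the finite population correction in its common form sqrt((N - n) / (N - 1)), where N is the population size and n is the sample size. It then builds 95 and 99 percent intervals. There are two assumptions. First, it uses normal-curve multipliers (1.96 and 2.576), whereas a t value may be more appropriate for small samples. Second, `pop_size` merely stands in for the value given with the FP option.

```python
import math

Z = {95: 1.96, 99: 2.576}  # normal-curve multipliers (large-sample assumption)


def standard_error(sd, n, pop_size=None):
    """Standard error of the mean; when a population size N is given, it is
    multiplied by the finite population correction sqrt((N - n) / (N - 1))."""
    se = sd / math.sqrt(n)
    if pop_size is not None:
        se *= math.sqrt((pop_size - n) / (pop_size - 1))
    return se


def confidence_interval(mean, sd, n, level=95, pop_size=None):
    """Mean plus or minus the multiplier times the (possibly corrected) standard error."""
    margin = Z[level] * standard_error(sd, n, pop_size)
    return mean - margin, mean + margin


print(confidence_interval(50.0, 10.0, 100))                # about (48.04, 51.96)
print(confidence_interval(50.0, 10.0, 100, pop_size=400))  # narrower band
```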

Confidence intervals are accurate only if the distribution of the data resembles a normal curve. Be careful; using confidence intervals from non-normal data is risky business.

 

Example of a Descriptive Statistics Printout

 

Quartiles & General "-iles"

Quartiles are often used in education to divide a distribution into 4 groups of equal N. A quartile printout will contain three values (one less than the number of groups). If, for example, the value for the first (lowest) quartile is 50, it means that 25% of the sample had a score of 50 or less. You can specify any division with the IL option.

For example, if you specify IL=10, then deciles will be printed. If the ninth (highest) decile value is 85, it means that 90% of the distribution had a score of 85 or less, and 10% scored equal to or higher than 85. The "-ile" values are interpolated when necessary. Set IL=1 to disable the "any iles" option.
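Python's `statistics.quantiles` function returns the n - 1 cut points that divide data into n groups, which mirrors the IL option. Its default interpolation method is only one of several in common use, so values may differ slightly from StatPac's. The `iles` wrapper name is hypothetical.

```python
import statistics


def iles(data, il):
    """Cut points dividing the data into `il` groups of roughly equal N.
    Returns il - 1 interpolated values; il = 1 (disabled) returns an empty list."""
    if il == 1:
        return []
    return statistics.quantiles(data, n=il)


scores = list(range(1, 12))     # 1, 2, ..., 11
print(iles(scores, 4))          # quartiles: [3.0, 6.0, 9.0]
print(len(iles(scores, 10)))    # deciles: 9 cut points
```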

 

Example of a Quartile Printout

 

Labeling and Spacing Options
 

Option | Code | Function
Labeling | LB | Sets the labeling to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Labeling Width | LW | Sets the maximum width (in inches) for the variable labels on the stub.
Exact Width | EW | When EW=Y, the labeling width for the stub will be exactly what is specified with the LW option. When EW=N, the width of the stub will self-adjust based on the length of the stub labels.
Column Spacing | CS | Sets the spacing (in inches) between the columns.
Decimal Places | DP | Sets the number of decimal digits that will be shown.
Label Justification | LJ | Sets the justification for the banner variable label.
Extra Spacing | ES | When ES=Y, a blank line will be printed between each variable on the stub. When ES=N, no blank lines will be printed.

 

BREAKDOWN Command

The breakdown program gives descriptive statistics for one or more criterion variables broken down by one or more subgroup variables. In other words, the breakdown program provides a way of summarizing descriptive statistics for many subgroups. The same information could be obtained by performing multiple descriptive statistics analyses using the IF-THEN-SELECT command to limit each analysis to the desired subgroup.

The syntax for the command to invoke the breakdown program is:

 

BREAKDOWN <Criterion var. list> BY <Subgroup var. list>

 

For example, let's say you want descriptive statistics for AGE; however, you want these statistics broken down by RACE, SEX and INCOME level. In other words, you are interested in comparing age for each of the subgroups (e.g., average age of males versus average age of females).

A data file for this analysis would look like this:

 

AM136    (record 1: race is coded as A, sex is coded as M, income level is coded as 1, age is 36)

BF342    (record 2: race is coded as B, sex is coded as F, income level is coded as 3, age is 42)

 

The criterion variable is AGE (V4). It is this variable that you will be calculating descriptive statistics for, so it must be interval or ratio-type data.

Up to ten subgroup variables may be included in the subgroup variable list. These variables may be either alpha or numeric. In our example, these would be: RACE (V1), SEX (V2) and INCOME (V3). Each of the subgroup variables may contain up to 100 categories (value labels).

Any of the following commands would perform the analysis:

 

BREAKDOWN AGE BY RACE, SEX, INCOME

BREAKDOWN AGE BY RACE - INCOME

BREAKDOWN AGE BY RACE SEX INCOME

BR V4 BY V1 - V3    (BREAKDOWN may be abbreviated as BR)

 

Notice that the keyword BY is mandatory. This is necessary because you may want a breakdown on several criterion variables. That is, several different variables may be broken down by the same subgroup variables.

When a criterion variable list is specified, it is equivalent to performing a different breakdown for each criterion variable. For example, both AGE and IQSCORE could be broken down by RACE, SEX and INCOME:

 

BREAKDOWN AGE IQSCORE BY RACE SEX INCOME

 

When a criterion variable list (AGE and IQSCORE) is specified, it is the same as requesting a separate analysis for each variable in the list. In this example, two tasks will be performed. They are:

 

BREAKDOWN AGE BY RACE SEX INCOME

BREAKDOWN IQSCORE BY RACE SEX INCOME

 

When specifying a criterion variable list, care must be taken to ensure that each variable in the criterion variable list is different from those in the subgroup variable list. That is, a variable cannot be broken down by itself.

The output from the breakdown program will print the mean, standard deviation, number of cases, and percent for each of the subgroups.
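Conceptually, a breakdown groups the criterion values by each subgroup variable and summarizes each group. The Python sketch below parses fixed-column records laid out like the example above: race, sex, and income level, then a two-digit age. It reports the mean, standard deviation, count, and percent for every category. Several parts are assumptions: the extra records, the function names, and the use of the N-1 standard deviation. Options such as PM, CC, and PB are not modeled.

```python
import statistics
from collections import defaultdict

# Hypothetical records in the example layout: race, sex, income, age (2 digits)
RECORDS = ["AM136", "BF342", "AF230", "BM355", "AM240"]


def parse(record):
    """Split one fixed-column record into named variables."""
    return {"RACE": record[0], "SEX": record[1], "INCOME": record[2], "AGE": int(record[3:5])}


def breakdown(cases, criterion, subgroups):
    """For each subgroup variable and category: (mean, SD, N, percent) of the
    criterion variable. SD uses N-1 and is None when a category has one case."""
    report = {}
    for var in subgroups:
        groups = defaultdict(list)
        for case in cases:
            groups[case[var]].append(case[criterion])
        total = sum(len(values) for values in groups.values())
        for category in sorted(groups):
            values = groups[category]
            sd = statistics.stdev(values) if len(values) > 1 else None
            report[(var, category)] = (statistics.mean(values), sd, len(values),
                                       100 * len(values) / total)
    return report


cases = [parse(r) for r in RECORDS]
for key, (mean, sd, n, pct) in breakdown(cases, "AGE", ["RACE", "SEX", "INCOME"]).items():
    print(key, round(mean, 1), sd if sd is None else round(sd, 2), n, f"{pct:.0f}%")
```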

 

Example of a Breakdown Printout

 

Sort Type & Sort Order

The output from the breakdown analyses may be more meaningful when the subgroup categories are displayed in sorted order. If no sort is selected (ST=N), the subgroup categories will be displayed in the order they appear in the study design. If the study design does not contain all the values in the data file (such as mispunched data), the unlabeled values will appear on the printout in the order that they are encountered in the data file.

You can sort the subgroup categories by frequency of response using the option ST=F, or by the category codes themselves (ST=C). For example, the following options would sort the categories by frequency of response in descending order:

 

OPTIONS ST=F SO=D     (Sort Type is by frequency of response; Sort Order is descending)

 

In most cases, you'll probably want to have the breakdown printout appear in ascending order by the code. The options statement to do this is:

 

OPTIONS ST=C SO=A     (Sort Type is by category code; Sort Order is ascending)

 

Notice that this type of sort is generally the way the information would be listed in the study design. If this is the case, sorting by category code will have no effect. Sorting by category codes is useful if you did not enter value labels for the subgroup variable.

Print Missing

Cases with a missing value on a subgroup variable may be included or excluded from the analysis with the PM option. When PM=Y, all cases that are missing on a subgroup variable will be grouped into a separate category, and descriptive statistics for the criterion variable will be reported for this "missing category".

Category Creation

Sometimes there may be a subgroup category listed in the study design that has no accompanying data. For instance, nobody in the sample may be over 60 years old. Whether or not you want the label to appear with a count of zero is a matter of preference.

The actual categories (value labels) in the breakdown analysis can be created either from the study design value labels (CC=L) or from the data itself (CC=D). When the categories are created from the labels, the value labels themselves will be used to create the categories for the analysis, and data that does not match up with a value label code will be counted as missing. When categories are created from the data, all data will be considered valid whether or not there is a value label for it.

Percentage Base

In addition to means and standard deviations, the breakdown analysis also prints counts and percents for each of the categories. The denominator for the percentages can either be the number of respondents or the total number of responses. If PB=N, the denominator for calculating percentages will be the number of respondents (i.e., the number of records in the data file). If PB=R, the denominator will be the total number of responses for all individuals.

Labeling & Spacing Options

 

Option | Code | Function
Labeling | LB | Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Labeling Width | LW | Sets the maximum width (in inches) for the variable labels on the stub when OA=Y.
Exact Width | EW | When EW=Y, the labeling width for the stub will be exactly what is specified with the LW option. When EW=N, the width of the stub will self-adjust based on the length of the stub labels.
Column Spacing | CS | Sets the spacing (in inches) between the columns when OA=Y.
Decimal Places | DP | Sets the number of decimal digits that will be shown.
Print Percent Symbol | PP | Sets whether the percent symbol is shown.
Print Codes | PC | Sets whether the codes are printed with the value labels.
Label Justification | LJ | Sets the justification for the banner variable label when OA=Y.
Extra Spacing | ES | When ES=Y, a blank line will be printed after each variable name or label on the stub. When ES=N, no blank lines will be printed.

 

TTEST Command

The t-test is a relatively simple statistic for testing the difference between two means. Its primary advantage is that it remains usable even when the sample sizes are small (fewer than 30 cases). The variables whose means are compared must be interval or ratio-type data. StatPac can test the difference whether the N's are equal or unequal.

The t distribution depends on the size of the samples. With small samples, the t distribution is leptokurtic (it has heavier tails than the normal curve); as the sample size grows beyond about 30, it closely approaches the normal curve. The standard error of the difference is used to establish a range within which the difference between the true means of the two populations would be expected to fall.

The significance of the t statistic depends upon the hypothesis the researcher plans to test, and this hypothesis should be developed before collecting the data. If you want to determine whether there is a significant difference between two means but do not know which mean will be greater, use the two-tailed test. If you want to test the specific hypothesis that one mean is greater than the other, use the one-tailed test.

There are two basic kinds of t-tests: one for matched pairs and one for independent groups.

T-Test For Matched Pairs

If each subject or unit was measured under both conditions, the appropriate t-test is the one for matched pairs. To perform this type of analysis, you must enter the data so that both observations for a subject are in the same data record. For example, you might compare pretest and posttest scores where each person took both a PRETEST (V1) and a POSTTEST (V2); both values are contained in each data record. A data file for this analysis might look like this:

 

8592       (record 1 - Pretest = 85 & Posttest = 92)

7689       (record 2 - Pretest = 76 & Posttest = 89)

5276       (record 3 - Pretest = 52 & Posttest = 76)

 

The syntax of the command to perform one or more matched pairs t-tests is:

 

TTEST <Variable list> WITH <Variable list>

 

The keyword WITH is mandatory if a variable list is specified (i.e., more than one t-test is requested). If only one t-test is requested, WITH may be omitted. In our pretest-posttest example, any of the following commands could be used:

 

TTEST PRETEST WITH POSTTEST

TTEST PRETEST POSTTEST

TT V1 WITH V2        (TTEST may be abbreviated as TT)

 

In a matched pairs t-test, it does not matter which variable is listed first in the command. Identical results would be obtained with the command:

 

TTEST POSTTEST WITH PRETEST
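The matched pairs t statistic can be sketched with the textbook formula: the mean of the per-subject differences divided by its standard error, with n - 1 degrees of freedom. This is a minimal sketch, not StatPac's source; it assumes the three records shown earlier hold PRETEST in columns 1-2 and POSTTEST in columns 3-4, and `paired_t` is a hypothetical helper name.

```python
import math

records = ["8592", "7689", "5276"]
pretest = [int(r[0:2]) for r in records]    # columns 1-2 (assumed layout)
posttest = [int(r[2:4]) for r in records]   # columns 3-4 (assumed layout)

def paired_t(x, y):
    """Return (t, df) for the matched pairs t-test of x against y."""
    d = [a - b for a, b in zip(x, y)]        # per-subject differences
    n = len(d)
    mean_d = sum(d) / n
    sd_d = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
    t = mean_d / (sd_d / math.sqrt(n))       # mean difference / its standard error
    return t, n - 1

print(paired_t(pretest, posttest))   # t is about -2.95 with 2 df
print(paired_t(posttest, pretest))   # same magnitude, opposite sign
```

Swapping the two variables only flips the sign of the mean difference, so the magnitude of t and its probability are unchanged, which is why the order of the variables in the command does not matter.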

 

When a variable list is specified as part of the TTEST command, more than one t-test analysis will be performed. For example, the following command will perform four t-test analyses (V1 with V9, V1 with V23, V7 with V9, and V7 with V23):

 

TTEST V1 V7 WITH V9 V23
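The pairing rule described above (every variable before WITH is tested against every variable after it) is a Cartesian product. The hypothetical helper below only illustrates that rule; it is not how StatPac parses commands, and it assumes WITH is written in uppercase.

```python
from itertools import product

def expand_ttest(command):
    """Return the (variable, variable) pairs a TTEST command would analyze."""
    tokens = command.split()[1:]                 # drop the TTEST or TT keyword
    if "WITH" not in tokens:                     # single test, WITH omitted
        return [tuple(tokens)]
    i = tokens.index("WITH")
    return list(product(tokens[:i], tokens[i + 1:]))

print(expand_ttest("TTEST V1 V7 WITH V9 V23"))
# [('V1', 'V9'), ('V1', 'V23'), ('V7', 'V9'), ('V7', 'V23')]
```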

 

Example of a t Test for Matched Pairs Printout

 

T-Test For Independent Groups

The other kind of t-test is for independent groups and it is used for noncorrelated data. If each case in the data file is to be assigned to one group or the other based on another variable, use the t-test for independent groups. For example, to compare reading scores between males and females, split the reading scores into two groups depending upon whether the person is male or female. (Each record in the data file is assigned to one group or the other.)

 

M83      (record 1 - male with score of 83)

F91       (record 2 - female with score of 91)

F84       (record 3 - female with score of 84)

 

The syntax for the command to perform an independent groups t-test is:

 

TTEST <Var. list> WITH <Grouping var. list>=(<Code 1>)(<Code 2>)

 

As in the matched pairs t-test, the keyword WITH is only mandatory if a variable list is specified for the analyzed variable or the grouping variable. If only one t-test is requested, its use is optional.

In the above example, SCORE is the variable under analysis and SEX is the variable used to assign records to one group or the other. The commands to perform this t-test are:

 

TTEST SCORE WITH SEX=(M)(F)

TTEST SCORE WITH SEX=(M/m)(F/f)

 

In the second command, both uppercase and lowercase codes are specified, separated by a slash. This guards against inconsistent data entry, where a male might sometimes be entered as "M" and other times as "m". If you are certain that uppercase was always used, the first command is sufficient.
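Here is a minimal sketch of an independent groups t-test applied to the M83/F91/F84 records shown earlier, with records assigned by the code sets (M/m) and (F/f). It uses the pooled-variance (Student's) formula, which handles unequal N's; whether StatPac pools the variances or uses separate variances is not stated here, so treat that choice, the column layout, and the helper names as assumptions.

```python
import math

records = ["M83", "F91", "F84"]       # column 1 = SEX, columns 2-3 = SCORE (assumed)

def split_groups(records, codes1, codes2):
    """Assign each record's score to group 1 or group 2 by its SEX code."""
    g1, g2 = [], []
    for rec in records:
        sex, score = rec[0], float(rec[1:3])
        if sex in codes1:
            g1.append(score)
        elif sex in codes2:
            g2.append(score)
        # records matching neither code set are left out of the test
    return g1, g2

def pooled_t(g1, g2):
    """Return (t, df) using the pooled-variance t-test."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((v - m1) ** 2 for v in g1)      # sums of squares within each group
    ss2 = sum((v - m2) ** 2 for v in g2)
    df = n1 + n2 - 2
    sp2 = (ss1 + ss2) / df                    # pooled variance estimate
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of the difference
    return (m1 - m2) / se, df

males, females = split_groups(records, {"M", "m"}, {"F", "f"})
print(pooled_t(males, females))   # t is about -0.74 with 1 df
```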

When a variable list is specified, several t-tests will be performed. For example, the following statement would request three different t-tests between males and females (one for each type of score):

 

TTEST SCORE1 SCORE2 SCORE3 WITH SEX=(M)(F)

 

There is no limit on the number of codes that can be specified as part of a group. For example, let's say an INCOME variable is coded into five income groups:

 

What is your annual income?

1=Under  $10,000

2=$10,000 - $20,000

3=$21,000 - $30,000

4=$31,000 - $40,000

5=Over   $40,000

 

To compare scores for those who earn up to $30,000 per year with those who earn over $30,000, the command would be:

 

TTEST SCORE WITH INCOME=(1/2/3)(4/5)

 

When performing a t-test for independent groups, the program accepts a variety of styles for specifying the codes in each group. There are two basic formats:

 

        (<Code>/<Code>)  or  (<Code>-<Code>)

 

All of the following would be valid requests when entering the code(s) or value(s) to split the data into two groups. Notice that the reserved words LO and HI are valid when specifying a range of codes or values.

 

(A/B/D)     (Place all cases with codes A, B or D in this group)

(A-D)       (Place all cases with codes A to D in this group)

(LO-D)      (Place all cases with codes up to D in this group)

(6-9)       (Place all cases with codes 6 to 9 in this group)

(LO-21)     (Place all cases with values up to 21 in this group)

(22-HI)     (Place all cases with values of 22 or higher in this group)
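The group specifications above can be modeled as membership tests. The parser below is a hypothetical sketch of those rules, not StatPac's implementation: a slash-separated list matches exact codes, a hyphenated range matches anything between its ends (inclusive), and LO or HI leaves that end open. Comparing numeric codes as numbers is an assumption made so that a value such as 100 falls in (22-HI) rather than being sorted as text.

```python
def _key(code):
    """Sort key: numbers compare numerically, other codes as text."""
    try:
        return (0, float(code))
    except ValueError:
        return (1, code)

def parse_group(spec):
    """Return a test function for one group spec such as "(A/B/D)" or "(22-HI)"."""
    body = spec.strip()[1:-1]                    # drop the parentheses
    if "-" in body:                              # a range of codes or values
        lo, hi = (part.strip() for part in body.split("-", 1))

        def member(code):
            k = _key(code)
            return (lo == "LO" or k >= _key(lo)) and (hi == "HI" or k <= _key(hi))
    else:                                        # a list of individual codes
        codes = {part.strip() for part in body.split("/")}

        def member(code):
            return code in codes
    return member

in_low = parse_group("(LO-21)")
in_high = parse_group("(22-HI)")
print(in_low("21"), in_high("21"), in_high("100"))   # True False True
```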

 

Example of a t-Test for Independent Groups Printout

 

Non-Parametric Statistics

The non-parametric equivalents of the t-test can be requested with the NP=Y option. The Wilcoxon test is printed for a matched pairs t-test, and the Mann-Whitney U test is printed for an independent groups t-test.

The Wilcoxon test for correlated samples is the non-parametric equivalent of the matched pairs t-test. The difference between the two observations in each pair is computed, and the absolute values of these differences are ranked. The Wilcoxon test statistic is the smaller of the sum of the ranks for positive differences and the sum of the ranks for negative differences. If there are ten or more cases, the probability is calculated from the normal distribution. When there are fewer than ten cases, refer to the appendix to determine the probability.
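A minimal sketch of the usual Wilcoxon signed-rank procedure follows, including the normal approximation for ten or more cases. Dropping zero differences, averaging tied ranks, and omitting a tie correction in the variance are common textbook conventions assumed here; StatPac's exact handling of these details isn't documented in this section.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def wilcoxon_signed_rank(x, y):
    """Return (T, n, z, two-tailed p) for paired samples x and y."""
    d = [a - b for a, b in zip(x, y) if a != b]   # zero differences are dropped
    n = len(d)
    r = average_ranks([abs(v) for v in d])        # rank the absolute differences
    w_pos = sum(rk for rk, v in zip(r, d) if v > 0)
    w_neg = sum(rk for rk, v in zip(r, d) if v < 0)
    t = min(w_pos, w_neg)                         # the Wilcoxon T statistic
    # Normal approximation (no tie correction), intended for n >= 10.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))          # two-tailed normal probability
    return t, n, z, p
```

For the three pretest/posttest records shown earlier, every posttest score is higher, so T = 0 with n = 3; because n is below ten, the appendix table rather than z would be used for that case.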

 

Example of Wilcoxon Statistic Printout

 

The Mann-Whitney U test is the non-parametric equivalent of the t-test for independent groups. It may be used to evaluate the difference between two population distributions. The scores from both groups are first ranked together. The Mann-Whitney U is the number of times a score in one group is smaller than (ranks below) a score in the other group.

For sample sizes of fewer than twenty, refer to the appendix to find the probability of U. If the sample size is twenty or more, the distribution of U approximates the normal distribution, and the normal deviate (z) will be used to calculate the probability. The Mann-Whitney U may be selected by using MW=Y or suppressed by using MW=N.
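The counting definition of U translates directly into code. This sketch assumes ties count as one half and uses the standard normal approximation without a tie correction; it illustrates the statistic rather than reproducing StatPac's routine.

```python
import math

def mann_whitney_u(g1, g2):
    """Return (U, z, two-tailed p) using the normal approximation."""
    # U counts, over every pairing of a g1 score with a g2 score, how often
    # the g1 score is smaller; tied pairs count as one half.
    u = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in g1 for b in g2)
    n1, n2 = len(g1), len(g2)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (u - mean) / sd
    return u, z, math.erfc(abs(z) / math.sqrt(2))

males, females = [83], [91, 84]
print(mann_whitney_u(males, females)[0])   # 2.0: 83 is below both female scores
```

The same value can be obtained from the joint ranking as n1*n2 + n1*(n1 + 1)/2 - R1, where R1 is the sum of the ranks in the first group; counting pairs is simply easier to read.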

 

Example of Mann-Whitney U Statistic Printout

Labeling and Spacing Options

Option                     Code   Function
Labeling                   LB     Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Labeling Width             LW     Sets the maximum width (in inches) for the variable labels on the stub.
Exact Width                EW     When EW=Y, the labeling width for the stub will be exactly what is specified with the LW option. When EW=N, the width of the stub will self-adjust based on the length of the stub labels.
Column Spacing             CS     Sets the spacing (in inches) between the columns.
Decimal Places             DP     Sets the number of decimal digits that will be shown.
Print Percent Symbol       PP     Sets whether the percent symbol is shown.
Label Justification        LJ     Sets the justification for the banner labels.
Blank Lines Between Rows   BL     Sets the number of blank lines between each variable on the stub when OA=Y.

CORRELATE Command

Correlation is a measure of association between two variables. A correlation coefficient can be calculated for ordinal, interval or ratio-type data. This program can print descriptive statistics and a correlation matrix for up to 88 variables.

The syntax of the command to run a correlation analysis is:

 

CORRELATE <Variable list>

 

For example, to run a simple correlation of EDUCATION and INCOME, any of the following commands could be used:

 

CORRELATE  EDUCATION  INCOME

CORRELATE  V1, V2

CO  V1  V2          (CORRELATE can be abbreviated as CO)

 

The correlation program can also correlate more than two variables. For example, to print a correlation matrix of AGE, INCOME, ASSETS and TEST-SCORE, you would type the command:

 

CORRELATE AGE, INCOME, ASSETS, TEST-SCORE

 

The output would contain a correlation matrix of all possible combinations of variable pairs. Several statistics can be printed for each pair of variables. These are the correlation coefficient, number of valid records, standard error of the estimate, t statistic and probability of t.

 

Type of Correlation Coefficient

StatPac can calculate two different kinds of correlation coefficients:  Spearman's rank-difference correlation coefficient and Pearson's product-moment correlation coefficient. When calculating a correlation coefficient for ordinal data, choose Spearman's rank-difference technique. For interval or ratio-type data, select Pearson's product-moment formula.
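The two coefficients can be sketched as follows. Pearson's r is the standard product-moment formula; Spearman's coefficient is computed here as Pearson's r on the ranks (ties get averaged ranks), which equals the rank-difference formula 1 - 6*sum(d^2)/(n*(n^2 - 1)) when there are no ties. How StatPac treats tied ranks is an assumption.

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def spearman(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]            # monotonic but not linear
print(round(pearson(x, y), 4))   # 0.9811
print(spearman(x, y))            # 1.0
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient is exactly 1, while Pearson's r is slightly below 1 because the relationship is not a straight line.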

It is your responsibility to select the appropriate type of statistic. This can be accomplished by using the TY option. The TY option may be specified as S (Spearman's) or P (Pearson's). For example, when analyzing interval-type variables, type:

 

OPTIONS TY=P

 

Descriptive Statistics

Descriptive statistics can be printed or suppressed with the option DS=Y or DS=N. If Pearson's product-moment correlation is selected, the output will include the number of records, mean and standard deviation. Only the number of records will be printed if Spearman's rank-difference correlation is selected.

 

Example of a Descriptive Statistics Printout

 

Simple Correlation Matrix

The correlation matrix may be printed or suppressed with the SC=Y or SC=N option respectively. Most of the time, you'll probably want to print the correlation matrix. However, there may be times when you only want descriptive statistics and/or Cronbach's alpha reliability statistic.

Correlation Coefficient

The correlation coefficient(s) can be printed with the CC option. The option CC=Y will print the correlation coefficient while CC=N will suppress it.

The value of a correlation coefficient can vary from minus one to plus one. A minus one indicates a perfect negative correlation, while a plus one indicates a perfect positive correlation. A correlation of zero means there is no linear (Pearson) or monotonic (Spearman) relationship between the two variables.

Number Of Cases

The number of cases (records) used to calculate the correlation coefficient can be printed with NC=Y. This may or may not be the same as the number of records in the data file. If either the X or Y value is missing from a pair of data, the record will be skipped and not included in the analysis.
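The skipping rule above is pairwise deletion. In this hypothetical sketch a missing value is represented by `None` (an assumption for illustration), and only records with both values present are kept, so the N for a pair can be smaller than the number of records in the file.

```python
def complete_pairs(x, y):
    """Keep only the records where both the X and Y values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]

x = [12, None, 15, 11, 18]
y = [30, 28, None, 25, 40]
xs, ys = complete_pairs(x, y)
print(len(xs))   # 3 records used, although the file has 5
```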

Standard Error

The standard error of the estimate for a correlation coefficient measures the standard deviation of the data points as they are distributed around the regression line. The standard error of the estimate can be used to specify the limits and confidence interval for a correlation coefficient. It can only be calculated for interval or ratio-type data. The standard error can be printed using the option SE=Y.
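As a sketch of the definition above, the standard error of the estimate can be computed from the residuals around the least-squares line, divided by n - 2 degrees of freedom. Treating the second variable as Y and regressing it on the first is an assumption made for illustration.

```python
import math

def std_error_of_estimate(x, y):
    """Standard deviation of y around the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return math.sqrt(resid_ss / (n - 2))

print(std_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # about 0.894
```

The same value follows from the correlation itself as sqrt(Syy * (1 - r^2) / (n - 2)), where Syy is the sum of squared deviations of Y.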

T Statistic

The significance of the correlation coefficient is determined from Student's t statistic. The formula to calculate the t statistic depends upon which type of correlation coefficient is specified. The t statistic can be printed or suppressed by using the option TT=Y or TT=N, respectively. StatPac does not calculate the F statistic, but for a simple correlation it is the square of the t statistic.
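For Pearson's r, the usual test of a zero correlation is t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below uses that standard formula (StatPac's formula for Spearman's coefficient may differ) and shows that squaring t gives the corresponding F statistic.

```python
import math

def correlation_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

t = correlation_t(0.5, 30)
print(round(t, 3), round(t * t, 3))   # 3.055 9.333 (the second value is F)
```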

Probability Of Chance

The probability of the t statistic is the likelihood of obtaining a correlation coefficient at least as large (in absolute value) as the one observed by chance alone, if the true correlation is zero. It can be printed with the option PR=Y. StatPac uses a two-tailed test to derive this probability from the t distribution. Probabilities of .05 or less are generally considered significant, implying that there is a relationship between the two variables.

When the t statistic is calculated for Spearman's rank-difference correlation coefficient, there must be at least 30 cases before the t distribution can be used to determine the probability. If there are fewer than 30 cases, use the table in the appendix to find the probability of the correlation coefficient.

 

Example of a Correlation Matrix Printout

 

Cronbach's Alpha

Cronbach's alpha is a measure of the internal consistency of a group of items. It provides a unique estimate of reliability for a given test administration. The value of Cronbach's alpha usually falls between zero and one (it can be negative when items are negatively correlated). In general, it is a lower bound to the reliability of a scale of items; in other words, Cronbach's alpha tends to be a very conservative measure of reliability.

As well as being a measure of the reliability of a scale of items, Cronbach's alpha may also be interpreted as an estimate of the correlation of one test with an alternative form containing the same number of items.
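Cronbach's alpha is commonly computed as k/(k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below applies that standard formula; the sample scores are made up for illustration, and because every variance uses the same n - 1 divisor, the choice of divisor does not change the result.

```python
def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    n = len(items[0])

    def variance(v):
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]  # scale score per respondent
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

scores = [[4, 3, 5, 2], [4, 2, 5, 3], [3, 3, 4, 2]]   # 3 items, 4 respondents
print(round(cronbach_alpha(scores), 3))   # 0.9
```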

Labeling and Spacing Options

Option                     Code   Function
Labeling                   LB     Sets the labeling for descriptive statistics to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Column Width               CW     Sets the minimum width of the columns (in inches).
Column Spacing             CS     Sets the spacing (in inches) between the columns.
Decimal Places             DP     Sets the number of decimal digits that will be shown.