StatPac for Windows User's Guide
Utilities

Utility Programs

Seven utility programs are provided to give greater control and more versatility over studies and data files. They can be run from the Analysis, Utilities menu.

 

 

The Import and Export program will allow you to read files created by other software or write files that can be read by other software. Several formats are supported:  Access, Excel, all prior versions of StatPac, all prior versions of StatPac Gold, comma delimited, tab delimited, multiple record files, Internet response files, and plain-text e-mail.

The Merge program is used to merge variables from different studies and data files or to rearrange the sequence of variables in a file. It can merge data from up to five individual data files. It can also be used to concatenate (join) data files using the same codebook.

The Aggregate program is used to create a true or compositional aggregate study and data file. Aggregate files are useful for summarizing subgroups of data.

The Codebook program is used to quickly create a codebook, or to check a codebook and data file for errors. The Check program is used when you suspect that there is a problem in the codebook or data file. If a specific procedure won't run, this program can sometimes provide a solution. A common use of this program is when you are planning to use a data file created from a source other than StatPac, and you want to make sure that your study design matches the data file.

The Sampling program is used to generate a random number table, create a random digit dialing table for telephone studies, and to select a random sample from a data file.

The Compare Data Files program is used to compare two data files for differences. It is used to check the accuracy of data entry when a double entry system has been used.

The Statistics Calculator is used to calculate distributions, probabilities and other statistics from proportions and summary data.

 

Import and Export

 

StatPac can import and export information to other software. Select Utilities, and then select import or export. Import means you want to convert a non-StatPac file into StatPac for Windows format. Export means you want to convert a StatPac for Windows file to a different format. When importing or exporting data, the original file(s) will remain intact and a new file(s) will be created.

When importing, select the type of file and name of the file to be imported. If you are importing from previous versions of StatPac, it will be assumed that the codebook/study name and the data file name are the same. The names for the StatPac for Windows codebook and data may be different.

 

Each of the import and export file formats is explained below.

StatPac and Prior Versions of StatPac Gold

Prior versions of StatPac and StatPac Gold used a "codebook" file or "study design" files to store variable format and labels information. StatPac for Windows stores all this information in a codebook file. Because older versions of StatPac have limited labeling space, some labels may be truncated when exporting to an older version of StatPac.

The import program assumes that the data file name is the same as the study file name for the previous version. Thus, the "Name of the File to Import" will indirectly also specify the data file name. For example, if the file to import is SURVEY.EZ0, the program will try to import a data file called SURVEY.DAT from the same folder. If the data file does not exist, only the codebook will be imported. StatPac for Windows stores data in the same format as prior versions of StatPac (fixed format sequential ASCII). Therefore, if there is not a matching data file name, you can simply copy the old data file to the folder where you imported the codebook and use it without modification.

Access and Excel

StatPac can import or export to a variety of common database/worksheet formats. The appropriate extension will be used when you select the type of database to be imported or exported. When importing to StatPac, the import procedure will create a new StatPac codebook and data file. Do not create a StatPac codebook before doing the import or it will be overwritten by the import procedure.

When importing from Lotus to StatPac, the default variable names are the worksheet column letters (e.g. Column A, Column B, etc.). If your worksheet contains locked column headings, they will be used as variable names for those columns. The column headings may be locked in Lotus by using the /Worksheet Titles Horizontal command. If the column titles are not locked, they will be written as the first record in the StatPac data file (an undesirable situation). Also, be sure your worksheet does not contain any empty rows or dividing lines between the titles and the first row of data.

Comma Delimited and Tab Delimited Files

StatPac can import and export comma and tab delimited files. There are many software packages that can interchange data in this format. A comma delimited file is a sequential ASCII file where the variables are separated from each other by commas (rather than each variable using a fixed number of characters). In a tab delimited file, the separator is a tab character. Tab delimited imports and exports are generally more reliable than comma delimited files.

When importing, StatPac creates a new codebook based on the field widths required to hold the data. If a codebook already exists, you may use it instead of creating a new codebook. If a data file already exists, you'll be offered the option of appending to the existing data file or deleting it. Selecting append will add the newly imported data to the end of the existing data file.

Many software packages write quotation marks around alpha fields, while others do not. When importing a comma delimited file, all quotation marks will automatically be eliminated, since StatPac does not use quotes. When exporting to a comma delimited file, any field containing a comma will always be enclosed in quotes. The StatPac.ini file contains a setting QuoteAlphaFields. When QuoteAlphaFields=1, all alpha fields will be quoted when exporting to comma delimited. When set to zero, only fields containing a comma will be enclosed in quotes.
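The two quoting behaviors can be illustrated outside StatPac. The following is a minimal sketch that uses Python's csv module as a stand-in for the exporter. It is not StatPac's own code; the function name export_row and the sample record are hypothetical, and numeric fields are passed as numbers so that only alpha fields are candidates for quoting.

```python
import csv
import io

def export_row(fields, quote_alpha_fields):
    """Format one record as a comma delimited line.

    quote_alpha_fields=True mimics QuoteAlphaFields=1 (quote every alpha
    field); False mimics QuoteAlphaFields=0 (quote only fields that
    contain a comma).
    """
    buffer = io.StringIO()
    quoting = csv.QUOTE_NONNUMERIC if quote_alpha_fields else csv.QUOTE_MINIMAL
    csv.writer(buffer, quoting=quoting).writerow(fields)
    # Drop the line terminator added by the writer.
    return buffer.getvalue().rstrip("\r\n")

record = [1, "Smith, John", "Denver"]      # hypothetical data
print(export_row(record, quote_alpha_fields=False))
print(export_row(record, quote_alpha_fields=True))
```

In both cases the field containing a comma is quoted; only the treatment of the remaining alpha field differs.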

Many software packages can read or write a header record in comma and tab delimited files. A header record is usually the first record in the data file. It contains the names of the variables instead of actual data. If you are importing a comma or tab delimited file and don't know if there is a header record, load the file into your word processor and look at it. If the first line in the file is the names of the variables, there is a header record. If the first record looks just like the other records, it's data, and not a header record.

When exporting to a tab or comma delimited file, StatPac will give the option to convert the raw data to the value labels. For example, if the first variable were gender and coded 1=Male and 2=Female, a normal export would write a 1 or 2 for the variable. If the Expand To Text option is selected, it would write Male or Female to each data record instead of the raw data.
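The idea behind Expand To Text can be sketched in a few lines of Python. This is an illustration only: the dictionary layout, the function name expand_to_text, and the choice to leave a code unchanged when it has no value label are all assumptions, not a description of StatPac's internals.

```python
# Hypothetical codebook excerpt: variable name -> {code: value label}
VALUE_LABELS = {"GENDER": {"1": "Male", "2": "Female"}}

def expand_to_text(record, value_labels):
    """Replace raw codes with value labels.

    Variables without value labels (and codes without a matching label)
    are passed through unchanged; that fallback is an assumption.
    """
    return {
        name: value_labels.get(name, {}).get(code, code)
        for name, code in record.items()
    }

print(expand_to_text({"GENDER": "2", "AGE": "34"}, VALUE_LABELS))
```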

When exporting to a tab or comma delimited file (or Excel) with the intention of importing the file into SPSS, StatPac will give the option to also create an SPSS syntax file. This file is a text file with the same name as the exported file except it will have a .sps extension. In SPSS, first import the data. Then load the .sps file into Notepad or another text editor. Copy the contents of the file into the SPSS syntax editor and click play. This will create the variable and value labels in SPSS. When using this feature, StatPac may modify (abbreviate) the variable names, labels and value labels in order to fit the limited space offered by SPSS.

The tab delimited import utility can be used to import a text file for Verbatim Blaster open-ended response coding. For example, you might have used Microsoft Word to enter verbatim comments into a .txt file. Each person's comments were entered as a paragraph  (i.e., a continuous string of text ending with a carriage return). This file can be imported as a tab delimited file. Since there are actually no tabs in the file, StatPac will correctly import the text into a codebook and data file containing a single variable. The variable will be an alpha type and will be as long as necessary to hold the open-ended comments.

Exported tab delimited files may use a .txt or .tsv (tab separated variables) extension. Exported comma delimited files may use a .txt or .csv (comma separated variables) extension.

When exporting to a tab or comma delimited file, keep in mind that many programs (Excel, Access, etc.) limit the number of columns to 255 (while StatPac can have as many as 2,000). If your codebook has more than 255 variables, an export to an Access file is preferred because it will split the data into multiple tables as necessary.  Otherwise, you’ll have to use the Write command to create a series of codebooks and data files (each containing 255 or fewer variables) and then export each one individually to a delimited file.
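When planning a series of exports, it can help to work out the variable ranges in advance. The short Python sketch below does only that arithmetic; the function name variable_ranges is hypothetical, and the 255-column limit is taken from the paragraph above.

```python
def variable_ranges(total_variables, max_columns=255):
    """Return inclusive (first, last) variable ranges of at most max_columns each."""
    return [
        (start, min(start + max_columns - 1, total_variables))
        for start in range(1, total_variables + 1, max_columns)
    ]

# A codebook with 600 variables needs three exports.
print(variable_ranges(600))
```

Each range corresponds to one codebook and data file to be created and exported individually.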

Files Containing Multiple Data Records per Case

Many researchers want to use data that is in card-image format on a mainframe computer. Also, many data entry services are capable of only punching data in card-image format. While it is relatively easy to download data from a mainframe, it often comes in 80-column format. If there is only one record per case, this data can be read by StatPac without performing an import. However, when there is more than one "card image" (i.e., record) per case, it becomes necessary to concatenate (join) the "card-image" records together to produce a StatPac readable file.

Importing a multiple record file that looks like this...

 

Card 1 Case 1

Card 2 Case 1

Card 3 Case 1

Card 1 Case 2

Card 2 Case 2

Card 3 Case 2

etc.

will become a StatPac file that looks like this...

 

Card 1 Case 1  Card 2 Case 1  Card 3 Case 1

Card 1 Case 2  Card 2 Case 2  Card 3 Case 2

etc.

StatPac requires that a data record be a continuous stream of characters terminated with a carriage return and linefeed. This program will read a file in multiple record format and create a new data file with one record per case. The filename should have a .txt extension.

StatPac assumes that there are 80 characters in each record of the multiple record file. If the "card-image" record length is less than 80, StatPac will pad the records with spaces before combining them.

You will need to specify how many records there are for each case. If the downloaded data file has 3 records per case, you will answer 3 (even if the third "card-image" record is only partially used).
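The conversion described above can be sketched in Python as follows. This is a simplified illustration rather than StatPac's implementation: the function name join_card_images is hypothetical, and raising an error when the last case is incomplete is an assumption.

```python
def join_card_images(lines, records_per_case, record_length=80):
    """Combine every records_per_case card-image lines into one record per case."""
    if len(lines) % records_per_case != 0:
        # Assumption: an incomplete final case is treated as an error.
        raise ValueError("number of lines is not a multiple of records_per_case")
    # Pad short card images with spaces so every card occupies record_length
    # columns (lines already longer than record_length are left as they are).
    padded = [line.rstrip("\r\n").ljust(record_length) for line in lines]
    return [
        "".join(padded[i:i + records_per_case])
        for i in range(0, len(padded), records_per_case)
    ]

cards = ["Card 1 Case 1", "Card 2 Case 1", "Card 3 Case 1",
         "Card 1 Case 2", "Card 2 Case 2", "Card 3 Case 2"]
for record in join_card_images(cards, records_per_case=3):
    print(len(record), record[:13], record[80:93], record[160:173])
```

With three records per case, each output record is 240 characters long, and the second card always begins in column 81 regardless of how much of the first card was used.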

Internet Files

The preferred method of performing Internet surveys is to store the responses in a file on the server. When using this method, responses are stored in ASCII (.asc) text format. When you're ready to perform an analysis, download the file to your local computer using an FTP program or Auto Transfer. If you use a different FTP program, be sure to set it to download the file as an ASCII (not binary) file.

If you use Auto Transfer, the downloaded file will automatically be imported into StatPac. If you manually download the file, you will need to use this import utility to convert the .asc file to StatPac data.

Downloaded Internet response files are not automatically deleted from your server. Therefore, each time you download the responses, it will be the entire set of responses since the beginning of the survey. StatPac will offer you the choice of deleting the existing data file or appending to the end of it. Since the downloaded file is usually the entire data set, you would normally want to replace the existing data file with the newly downloaded data.

Email Surveys

Because of the variety of Email programs, it is not possible to describe the exact steps you must take to import a returned Email survey. Each Email program operates a little differently, and you will need to experiment with your program.

StatPac provides import capabilities for CGI and plain text Email surveys. CGI Email would be produced by a survey placed on a web site that used StatPac's email method of capturing responses. A plain-text survey would be produced by a survey that was simply part of the text in the body of an Email.

Select e-mail as the import type and use the browse button to select the file to be imported. Usually, this would be a .mbx file (i.e., a mailbox in Outlook Express or Eudora where the e-mails were filtered to). Use the browse button to select the existing StatPac codebook and specify the name of the data file. If the data file does not exist, it will be created by the import procedure. If it does exist, the new data will be appended to the end of the existing data file. Finally, select Text as the Email type and click OK. StatPac will advise you if any errors were encountered during the import. If so, Notepad will appear on the menu bar. Click Notepad on your menu bar to see a description of the errors.

Setting Defaults for the Email Import

An e-mail consists of two parts. The first part is the e-mail header and the second part is the contents (or body) of the e-mail. The header contains many lines that are often hidden by e-mail readers, but can be seen by loading an e-mail into the notepad. StatPac must be able to properly identify where the header starts and stops in order to know where the e-mail body begins. The settings in the StatPac.ini file may be adjusted to be compatible with your e-mail reader or language.

The StartEmailHeader parameter should be set to the text that begins the header section, and the EndEmailHeader parameter should be set to the text that begins the last e-mail header line. If you are manually copying and pasting incoming e-mails to a mailbox file, it may be important to change these settings. The default values for these parameters are:

 

StartEmailHeader = Return-Path:

EndEmailHeader = X-UIDL:

 

Other e-mail parameters may also be set in the StatPac.ini file.  StartEmailField and EndEmailField can be used to change the brackets from [ and ] to other characters. The EmailPrefix parameter tells StatPac what line contains the name/e-mail address of the respondent. The EmailVarName is the name of the StatPac codebook variable that will automatically capture the respondent's e-mail address in a plain-text e-mail, and the EmailDateField parameter is used to get the date of the e-mail in order to more precisely report which e-mails contained errors. By modifying these parameters, StatPac can be made to work with any e-mail reader or language.
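To show why the header settings matter, here is a minimal Python sketch of how a mailbox file might be separated into e-mail bodies and how bracketed answers might then be picked out. It is an assumption-laden illustration, not StatPac's parser: the function names split_mailbox and bracket_fields are hypothetical, and the logic simply treats every line from StartEmailHeader through EndEmailHeader as header.

```python
import re

START_EMAIL_HEADER = "Return-Path:"   # default StartEmailHeader
END_EMAIL_HEADER = "X-UIDL:"          # default EndEmailHeader

def split_mailbox(text, start=START_EMAIL_HEADER, end=END_EMAIL_HEADER):
    """Return the body text of each e-mail found in a mailbox file."""
    bodies = []
    body_lines = None      # None until the first header has been seen
    in_header = False
    for line in text.splitlines():
        if line.startswith(start):
            # A new e-mail begins; store the body collected so far.
            if body_lines is not None:
                bodies.append("\n".join(body_lines))
            body_lines = []
            in_header = True
        elif in_header:
            # Skip header lines; the end marker is the last header line.
            if line.startswith(end):
                in_header = False
        elif body_lines is not None:
            body_lines.append(line)
    if body_lines is not None:
        bodies.append("\n".join(body_lines))
    return bodies

def bracket_fields(body, open_char="[", close_char="]"):
    """Return the text typed between each pair of bracket characters."""
    pattern = re.escape(open_char) + r"(.*?)" + re.escape(close_char)
    return [answer.strip() for answer in re.findall(pattern, body)]
```

If the end marker never appears (for example, because the e-mail reader writes a different final header line), everything after the start marker is treated as header and no answers are found. That is the situation the StatPac.ini settings are meant to correct.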

Merging Data Files

 

There are two basic ways that data files can be merged.  The first is called concatenation, and it is used to merge two or more data files that contain the same variables in the same order. The second type of merge lets you join data containing different variables. Select Utilities, Merge, and then the type of merge you want to perform.

 

 

Concatenate Data Files

Many times, several data entry operators will simultaneously enter data into a data file on their own machines. When all the data files have been entered, they can be merged into one large file by concatenating (joining) the data files.

For example, let's say you have three months of data in three separate files (JAN.DAT, FEB.DAT and MAR.DAT). Concatenating them would create a new file called QUARTER1.DAT that contains all three months of data. You could then run your analysis on all the data for the first quarter.

The concatenation-style merge assumes that the codebook(s) for all the data files are exactly the same. The Merge program will let you concatenate any number of data files into a new (larger) data file. You can type the data file names or use the browse button to select data files. Only one data file name should appear per line.

Do not confuse concatenating files with the MERGE utility program. If all your data files reference identical study information (contain the same variables in the same order), use concatenation to merge your data into one file. If your data files, however, contain different variables, use the MERGE utility program.
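Because StatPac data files are fixed format sequential ASCII files, concatenation amounts to writing the records of each file, one after another, into a new file. The Python sketch below illustrates the idea under that assumption; the function name concatenate_data_files is hypothetical, and the sketch does not check that the files share the same codebook.

```python
def concatenate_data_files(input_paths, output_path):
    """Write the records of each input file, in order, to a new output file.

    Returns the number of records written.
    """
    record_count = 0
    # newline="" keeps Python from translating line endings, so each record
    # can be written with an explicit carriage return and linefeed.
    with open(output_path, "w", newline="") as out:
        for path in input_paths:
            with open(path, "r", newline="") as source:
                for line in source:
                    out.write(line.rstrip("\r\n") + "\r\n")
                    record_count += 1
    return record_count

# Example (file names from the text above):
# concatenate_data_files(["JAN.DAT", "FEB.DAT", "MAR.DAT"], "QUARTER1.DAT")
```

The original files are left untouched, and the order of the file list determines the order of the records in the new file.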

Merge Variables and Data

The merge program allows you to extract selected variables from up to five studies and create an entirely new study that will be saved on disk. If data files have already been entered for any of the studies, they can also be restructured to match the new study format.

Do not confuse the function of this program with data file concatenation. If two data files have identical formats (i.e., they contain the same variables in the same order), the data files should be merged with the concatenation program.

The restructure and merge program can be used to reorganize a single study (and data file) or to combine several studies (and their associated data files). It allows complete versatility with regard to which variables are selected from each of the studies and the order of the variables.

The program will ask for the name(s) of the codebooks, data files, and common variables that will be utilized. For each specified codebook, also enter the name of the associated data file (if one exists). If no data file is specified for a particular study, the program will use blanks for all variables requested from that study.

 

 

Also select the common variable in each of the studies. This refers to a variable that can be used to match up the records from each data file (e.g., "CASE ID"). If there is not a common variable, it is imperative that the data files contain the same number of records and in the same order. That is, record one from data file one should represent the same respondent (case) as record one from data file two.

If a data record is missing in any of the data files, it could cause data from one file to be matched with the wrong data from another file. Therefore, it is always a good idea to have a common variable in each of the data files (and associated study information) that represents a unique case identification number. All data files must be sorted by this variable before running this program. If one of the data files is missing a particular record, blanks will be merged into the output file.
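The role of the common variable can be illustrated with a small Python sketch. It is a conceptual model only: the function name merge_on_common_variable and the dictionary representation are hypothetical, and the sketch matches two files, whereas the Merge program handles up to five.

```python
def merge_on_common_variable(file_a, file_b, width_a, width_b):
    """Match the records of two data files by case ID.

    file_a and file_b map a case ID to the data selected from that file.
    A case missing from one file receives blanks of that file's width.
    """
    merged = {}
    for case_id in sorted(set(file_a) | set(file_b)):
        data_a = file_a.get(case_id, " " * width_a)
        data_b = file_b.get(case_id, " " * width_b)
        merged[case_id] = data_a + data_b
    return merged

first = {"001": "12", "002": "34"}      # hypothetical data
second = {"001": "AB", "003": "CD"}
for case_id, data in merge_on_common_variable(first, second, 2, 2).items():
    print(case_id, repr(data))
```

Matching on the case ID keeps each respondent's data together even when a record is absent. Matching by record position would instead pair case 002 from the first file with case 003 from the second.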

Click OK to continue. The study numbers and names will be displayed, and the program will request the format for the new study.  The format statement defines the structure of the new codebook.

 

 

The general format for creating a new file structure is:

 

(<Study number>) <Variables> or <Variable range>

 

An example of a format statement is:

 

(1) 1-3,8,4 (3) 2-7 (2) 9 14 (1) 12

 

This statement indicates that the new study format should contain variables in the following order:

 

From study 1 - variables 1, 2, 3, 8 and 4

From study 3 - variables 2, 3, 4, 5, 6 and 7

From study 2 - variables 9 and 14

From study 1 - variable 12

 

Notice that the study number is enclosed in parentheses with no spaces. Individual variables may be separated by either commas or spaces. A range of variables is specified by a dash (minus sign) with no spaces on either side of the dash. If the format statement requires more than one line, just continue typing and word-wrap will correctly break the line.

Variables may be specified in any order. The study numbers will be displayed at the top of the screen and are assigned by the computer simply for convenience when specifying the new study format. The individual variable numbers for each codebook can be determined by examining the Variable Names windows.

The new study format will be checked for validity before processing begins. If errors are found, you will be asked to re-enter the format. The new study and data file (if specified) will be written.
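The way a format statement expands into an ordered variable list can be sketched in Python. This is an illustrative parser written from the rules given above, not the program's own code; the function name parse_format is hypothetical, and validity checking is reduced to whatever errors the integer conversions raise.

```python
import re

def parse_format(statement):
    """Expand a format statement into an ordered list of (study, variable) pairs."""
    selection = []
    # Each group is "(n)" followed by everything up to the next "(" or the end.
    for study, variables in re.findall(r"\((\d+)\)([^()]*)", statement):
        # Variables may be separated by commas or spaces.
        for item in re.split(r"[,\s]+", variables.strip()):
            if not item:
                continue
            if "-" in item:
                # A range such as 2-7 includes both end points.
                first, last = (int(n) for n in item.split("-"))
                selection.extend((int(study), v) for v in range(first, last + 1))
            else:
                selection.append((int(study), int(item)))
    return selection

for study, variable in parse_format("(1) 1-3,8,4 (3) 2-7 (2) 9 14 (1) 12"):
    print("study", study, "variable", variable)
```

Applied to the example statement, the sketch lists 14 variables in the order described above, beginning with variables 1, 2, 3, 8 and 4 from study 1 and ending with variable 12 from study 1.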

 

Aggregate

The aggregate utility program creates a new study and data file that consist of aggregate statistics for subgroups of the original data. Any descriptive statistic may be included in the aggregate files. The program allows the creation of both compositional and true aggregate files.

For example, let's say we've distributed a questionnaire to 200 people in each of 50 communities. After performing some preliminary analyses, we want to compare the communities on a number of the interval or ratio-type questions. We could, of course, use the IF-THEN SELECT and WRITE commands to create subfiles for each of the communities and then perform descriptive statistics analyses on each of the subfiles. Obviously, this would be a very time consuming procedure. The aggregate utility program provides a much more efficient way to derive this information.

By using the aggregate program, we could create a new codebook and data file that just contain the descriptive statistics we desire. Each record in the new aggregate file would represent one community. The record would contain the descriptive statistics for the community as a whole (and not the raw data from the original file). Since there are 50 communities, the aggregate file would contain 50 records. This type of aggregate file is called a true aggregate file. It is made up of just the aggregate statistics and does not contain the original data collected. After creating a true aggregate file, the LIST command could be used to print a summary of the descriptive statistic for the communities.

The other type of aggregate file is referred to as a compositional file. Using the same example as above, let's say we want to compare each case in our original file to the descriptive statistic for the community. For example, we might want to compare the individual's age with the mean age in that person's community. In other words, we want each record in the aggregate data file to contain both the original raw data and the descriptive statistic for the community as a whole. The compositional aggregate file will contain the same number of records as the original raw data file. However, the aggregate file will contain more variables (the original variables plus the aggregate statistics).

When creating either a true or compositional aggregate file, a new study information file will also be automatically created to match the new aggregate data file.
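The difference between the two file types can be made concrete with a short Python sketch. It is a conceptual illustration that computes a single statistic (the group mean); the function name aggregate_mean, the MEAN_ prefix, and the sample data are hypothetical. Like the Aggregate program, it expects cases from the same group to be adjacent in the input.

```python
from itertools import groupby
from statistics import mean

def aggregate_mean(records, group_key, value_key):
    """Return (true_aggregate, compositional) records for the group mean.

    records must already be ordered so that cases from the same group
    are adjacent, because groupby only combines neighbouring records.
    """
    true_aggregate = []
    compositional = []
    stat_name = "MEAN_" + value_key
    for group, cases in groupby(records, key=lambda r: r[group_key]):
        cases = list(cases)
        group_mean = mean(r[value_key] for r in cases)
        # True aggregate: one record per group, group code first.
        true_aggregate.append({group_key: group, stat_name: group_mean})
        # Compositional: every original record plus the group statistic.
        for r in cases:
            compositional.append({**r, stat_name: group_mean})
    return true_aggregate, compositional

data = [
    {"COMMUNITY": 1, "AGE": 30},
    {"COMMUNITY": 1, "AGE": 40},
    {"COMMUNITY": 2, "AGE": 50},
]
true_file, compositional_file = aggregate_mean(data, "COMMUNITY", "AGE")
print(len(true_file), len(compositional_file))
```

The true aggregate list has one record per community, while the compositional list has one record per original case with the community mean added as a new last variable.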

Before running the aggregate program, the data file must be sorted by the variable that contains the group code. For example, if you plan to create an aggregate file by community, the data file must be sorted by community before running the aggregate utility program. The sort order is not important; however, it is important that all cases from the same community fall together in the file. The aggregate program will accommodate a minimum of 1000 individual groups.

To sort the file, you might use the following procedure:

 

STUDY GOVT

SORT (A) COMMUNITY

SAVE

..

Then run the Aggregate program. It will ask for the codebook name, data file and the variable containing the group code. This refers to the codebook and data file that already exist (not the new aggregate files). The variable containing the group code is the same variable that was used to sort the data file before running this program. In this example, it is the "community" variable. You must also select the type of aggregate file to be created, either compositional or true.

 

 

Click OK to continue. Now you can select the variable(s) for which you want to calculate aggregate statistics. Select the desired variable. Then click on the statistics you want for that variable. Each time you click on a statistic, an aggregate statement will be created in the Aggregate Statement window. Each aggregate statement will create one new aggregate variable.

 

 

When performing a compositional aggregate procedure, the new aggregate variables will be added to the end of each data record. If the study and data file contain 10 variables, and you type two aggregate statements, the new aggregate variables would be added as variables 11 and 12.

When performing a true aggregate procedure, the first variable in the aggregate file will always be the group code (that is, the variable used to determine the groups). Each aggregate statement will produce a statistic that is added as the next variable in the file. The first aggregate statement would create variable two, the next variable three, and so forth.

Aggregate statistics can only be calculated for numeric-type variables. There is one exception to this rule:  If the variable used to split the data file into groups is alpha, you may still calculate the number of valid cases. In our example, if community were coded alpha, it would be acceptable to ask for the number of valid cases (statistic 17) for this variable.

Each aggregate statement you enter will create a new variable in the aggregate file. After entering all the aggregate statements, click OK. A new codebook will be created. The new variable labels in this study will include both the original labels and the types of statistics. After the new study has been created, the program will perform all the aggregate calculations and write the new data file.

Because many calculations are involved in creating an aggregate file, the program may take some time to finish. It will display a message informing you of successful completion.

If any statistic cannot be calculated, or if there are an insufficient number of columns to hold the aggregate statistic, the output file will contain spaces for that variable. For example, if you requested the mode, and the group was multi-modal, the aggregate statistic would be stored as blanks.

 

Codebook

There are two utility programs for codebooks. The Quick Codebook Creation utility creates an entire codebook using a single FORTRAN-like statement. The Check Codebook & Data utility is used to verify the integrity of the codebook and to fix errors in the file.

Quick Codebook Creation

The fastest way to create a codebook is to use the Quick Codebook Creation program. However, this will create a "barebones" codebook consisting of only the format for each variable. In most cases, you’ll want to use the Grid or Variable Detail window to create a new codebook.

Select Analysis, Utilities, Codebook, Quick Codebook Creation. You will need to enter a file name for the new codebook and a format statement. This is essentially a data definition statement and is similar to a FORTRAN style format statement.

 

 

The Format Statement defines the number and type of variables that will be in the new study.  It is the combination of all the individual variable formats.  Using the format statement can save considerable time if variable and value labels are not required, or if you plan to use a fixed format data file from another source.

The syntax for each component of a format statement is:

<No. of Vars.> <Var. Type> <No. of Cols> . <Decimals>

<No. of Vars.> is the number of consecutive variables that use the format defined by the next three parameters.  If this component of the format statement is omitted, the default is one.

<Var. Type> is always A or N and refers to whether the variable(s) are alpha or numeric.  StatPac automatically left justifies alpha variables and right justifies numeric variables.

<No. of Cols> is the field width allocated for the variable(s).  This is the total field width for the variable(s) and it must be large enough to hold a plus or minus sign and a decimal point if necessary.

. <Decimals> is the number of significant decimal places that the variable(s) will contain.  This component of the format statement is optional and may be omitted.  If <decimals> is not specified, the data will be stored exactly as entered (with or without a decimal point).

 

Examples of Format Statements

 

Format Statement    Result
1N5                 creates 1 numeric variable using 5 columns
N5                  creates 1 numeric variable using 5 columns
12N3                creates 12 numeric variables each using 3 columns
N5.2                creates 1 numeric variable using 5 columns; the format of the variable will be ##.##
7N2.0               creates 7 numeric variables each using 2 columns; the format of the variables will be ## (always rounded to an integer)
A1                  creates 1 alpha variable using 1 column
2A35                creates 2 alpha variables each using 35 columns
5N4 2A1 3N7.2       creates a study with 10 variables: 1-5 are numeric, each using 4 columns; 6-7 are alpha, each using 1 column; 8-10 are numeric, each using 7 columns with 2 significant decimal places
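To make the syntax concrete, the following Python sketch expands a format statement into one entry per variable. It is an illustration only and not StatPac's own parser; the function name, the dictionary keys, and the assumption that components are separated by spaces (as in the last example above) are all hypothetical.

```python
import re

# One component: optional count, type letter A or N, width, optional ".decimals".
COMPONENT = re.compile(r"(\d*)([AN])(\d+)(?:\.(\d+))?")

def parse_format(statement):
    """Expand a format statement into a list with one dict per variable."""
    variables = []
    for token in statement.upper().split():
        match = COMPONENT.fullmatch(token)
        if match is None:
            raise ValueError(f"invalid format component: {token!r}")
        count, var_type, width, decimals = match.groups()
        # An omitted <No. of Vars.> defaults to one variable.
        for _ in range(int(count) if count else 1):
            variables.append({
                "type": var_type,
                "width": int(width),
                "decimals": None if decimals is None else int(decimals),
            })
    return variables

study = parse_format("5N4 2A1 3N7.2")
record_length = sum(v["width"] for v in study)
```

For the statement "5N4 2A1 3N7.2" this yields 10 variables whose widths add up to 43 columns.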

Check Codebook and Data

This utility program will verify the integrity of a codebook and data file. If errors are found the program will attempt to fix them. If you have created a codebook to match a foreign data file (one created by a program other than StatPac), use this program to make sure that the data record lengths match the codebook you created.

Select the codebook and data file to be checked and click OK. If the program corrects any errors, they will be listed in the Notepad.
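The record-length part of this check can be pictured with a short Python sketch. This is not the utility's actual logic; it assumes a fixed-format data file in which every record should be exactly as long as the sum of the codebook field widths, and the function name and sample values are hypothetical. Unlike the utility, it only reports problems and does not fix them.

```python
def find_length_mismatches(records, widths):
    """Return (record number, actual length) for every record whose length
    differs from the total width implied by the codebook."""
    expected = sum(widths)
    mismatches = []
    for number, record in enumerate(records, start=1):
        actual = len(record.rstrip("\r\n"))  # ignore the line terminator
        if actual != expected:
            mismatches.append((number, actual))
    return mismatches

# Hypothetical codebook widths (for example from "N3 N2 A1") and three records.
widths = [3, 2, 1]
records = ["00125M\n", "00231\n", "00342F\n"]
problems = find_length_mismatches(records, widths)
```

In this example the second record is one column short, so it is the only one reported.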

 

 

Sampling

The Sampling program is used to generate a random number table, create a random digit dialing table for telephone studies, and to select a random sample from a data file.

 

Random Number Table

When planning to conduct a survey, choosing the sample is just as important as the survey itself. If the sample is incorrectly chosen, any results are likely to be distorted. That is, the characteristics of the sample will not represent the characteristics of the population.

One of the best ways to choose a sample is to use a random sampling technique. If the sample is randomly chosen from the population, it is likely to represent the population. That is, characteristics of the sample are likely to be found in similar proportions in the population.

The classical method of selecting the sample is to give each case in the population a number and then randomly select numbers until the sample size is achieved. One function of this program is to print a random number table for this purpose.

 

 

You should first select whether the numbers should be selected with or without replacement. When replacement is used, a number may be selected more than once (selection does not eliminate it from being available for future selection). When random numbers are selected without replacement, the selection of a number eliminates it from the pool of available numbers. The algorithm used for selection without replacement will display the random numbers in sequential order.

Enter the number of random numbers you want to be printed. This relates to the sample size determined with the Statistics Calculator. Be sure to add a sufficient number to the ideal sample size to accommodate a pilot test and replacement of nonresponders (if part of your study design).

Enter the smallest allowable random number and the largest allowable random number. Typically, the lowest value would be one and the highest value would be equal to the number of cases in the population.

Enter the name of the StatPac codebook and data file to store the random numbers and click OK. A StatPac codebook and data file will be created that contains one variable called "RANDOM".  Finally, the random numbers will be displayed in a compressed format in the Notepad. You do not need to save them with Notepad since they are already stored in a StatPac data file.
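The two selection modes can be sketched in Python as follows. This is only an illustration using Python's standard random module, not StatPac's own algorithm; the function name and parameters are hypothetical. The without-replacement result is sorted to mirror the sequential display described above.

```python
import random

def random_number_table(count, low, high, replacement, seed=None):
    """Return `count` random integers between low and high (inclusive)."""
    rng = random.Random(seed)
    if replacement:
        # A number may be drawn more than once.
        return [rng.randint(low, high) for _ in range(count)]
    population = range(low, high + 1)
    if count > len(population):
        raise ValueError("cannot draw more unique numbers than the range holds")
    # Each number can be drawn only once; shown in sequential order.
    return sorted(rng.sample(population, count))

without = random_number_table(10, 1, 100, replacement=False, seed=1)
with_rep = random_number_table(50, 1, 5, replacement=True, seed=1)
```

Requesting more unique numbers than the range contains is impossible without replacement, which is why the sketch raises an error in that case.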

Random Digit Dialing Table

Telephone surveys sometimes use random digit dialing to secure the sample. While this method will result in many non-working or non-voice numbers, it will produce a random sample of people who have telephones. Since local prefix codes are set (i.e., predefined by the phone company), only the last four digits of a phone number can be randomly selected.  The random number method of creating a telephone file allows you to specify a series of local prefix codes and the number of random telephone numbers you want created for each prefix code.

There is an important consideration to keep in mind when creating a random digit file.  Many of the random numbers will not be useful.  For example, a number may be non-existent, a business office, or a fax or computer line.  There are several algorithms for maximizing the number of home phone numbers; however, these techniques have generally produced poor results and are not included in StatPac.  Therefore, it is usually a good idea to select more phone numbers than you actually need.

The random number utility program allows you to specify any number of prefixes and to specify how many numbers you want from each prefix.  For local surveys, the prefix will be three digits (the local exchange); for long distance surveys, the prefix will be seven digits (i.e., 1 + three digits for the area code + three digits for the local exchange).

 

 

In the Local Exchange examples on the screen display, 50 numbers would be created with a 929 prefix and 35 numbers would be created with a 987 prefix. For the Long Distance examples, 25 numbers would be created that begin with 1-612-925 and 50 numbers would be created that begin with 1-807-927.

After you have finished typing the prefixes and quantities, click OK to create the phone number file. A StatPac codebook and data file will be created that contains one variable "TELEPHONE_NUMBER".  Finally, the random numbers will be displayed in a compressed format in the Notepad. You do not need to save them with Notepad since they are already stored in a StatPac data file.

The actual technique used to create the file is called random number selection without replacement.  This means that as a phone number is selected, it will be eliminated from the pool of available numbers for the next selection.  This eliminates the possibility of selecting the same number (with the same prefix) twice.
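As a rough illustration of selection without replacement within each prefix, consider the Python sketch below. It is not the program's actual code; the function name and the hyphenated output format are assumptions. The prefixes and quantities are the ones used in the screen examples described above.

```python
import random

def random_digit_dialing(requests, seed=None):
    """requests is a list of (prefix, quantity) pairs.
    Returns phone numbers with unique random four-digit endings per prefix."""
    rng = random.Random(seed)
    numbers = []
    for prefix, quantity in requests:
        # Sampling from 0000-9999 without replacement guarantees that no
        # number is repeated within the same prefix.
        suffixes = rng.sample(range(10000), quantity)
        numbers.extend(f"{prefix}-{suffix:04d}" for suffix in suffixes)
    return numbers

phone_numbers = random_digit_dialing(
    [("929", 50), ("987", 35), ("1-612-925", 25), ("1-807-927", 50)],
    seed=1,
)
```

Because only the last four digits vary, at most 10,000 numbers can be drawn for any one prefix.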

Depending on the number of prefixes and the quantities from each prefix, the actual creation of the file may take a little while.  Please be patient; the program will inform you when the sample selection has been completed.

Select Random Records from Data File

With this utility, you can select a specified number of random records from a data file and write them to a new data file. If you have a very large database and a long procedure file, you might use this utility to create a shorter data file, and perform a test run of the procedure file on it.

Enter the name of the existing data file, the name of the new data file, and the number of records to be selected and written to the new data file.

 

 

 

Compare Data Files

Many data entry operators use a double entry method of data verification.  Data is entered into one data file and the same data is re-entered into another data file.  The two data files are then compared for differences.

The purpose of this utility program is to identify possible errors in the data; it does not have any editing features.

 

 

Enter the name of the StatPac codebook and the names of the two data files to be compared.  The data files should contain the same number of records in the same order.

Upon completion, the total number of errors will be reported. If differences are found, the record numbers and the variables that differ will be shown in the Notepad. Use the Notepad to print the error listing.
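The comparison can be pictured with the following Python sketch. It is not the utility's implementation; it assumes fixed-width records laid out in codebook order, and the function name, variable names, and sample records are hypothetical.

```python
def compare_data_files(records_a, records_b, variables):
    """variables is a list of (name, width) pairs in record order.
    Returns (record number, [names of differing variables]) for each
    record pair that does not match."""
    differences = []
    for number, (a, b) in enumerate(zip(records_a, records_b), start=1):
        column = 0
        differing = []
        for name, width in variables:
            if a[column:column + width] != b[column:column + width]:
                differing.append(name)
            column += width
        if differing:
            differences.append((number, differing))
    return differences

# Hypothetical codebook layout and two passes of data entry.
layout = [("ID", 3), ("AGE", 2), ("SEX", 1)]
first_entry = ["00125M", "00231F"]
second_entry = ["00125M", "00237F"]
report = compare_data_files(first_entry, second_entry, layout)
```

Here the second record differs in the AGE field (31 versus 37), so it is the only difference reported.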

 

Conversions

StatPac supports only two data types, alpha and numeric. This can make it difficult to work with dates and currency variables. These utilities simplify the task of working with date and currency variables.

The conversion utilities read an existing codebook and data file, and create a new codebook and data file containing one or more new converted variables. The original date or currency variable is not modified and will remain “as is” in the codebook and data. Instead, a new variable (the converted field) is created and added to the end of the codebook and data.

The conversion utilities also offer a way to change dichotomous multiple response variables to the multiple response format required by StatPac.

Date Conversions

The most common functions with dates are sorting and selecting. Typically, a user would create an alpha variable for a date variable because it contains non-numeric characters such as slashes or dashes. Regardless of the format, sorting by date or selecting the records between two dates can be difficult unless the date can be readily converted to a numeric eight-column (N8) variable in the format YYYYMMDD.

 

 

The first function will take one or more date variables in any format and create new N8 variable(s) in YYYYMMDD format. The new N8 variable(s) can be used with the Sort command to sort a file by date. It can also be used with the Select command to select a range of dates.

The second function will calculate the number of days between two dates. The two dates can be in any date format and the new variable (number of days) will be in N5 format. The absolute value of the difference between the two dates will be calculated and added to the end of the new codebook and data file.

The third function will create an English text version of a date in “D Mon, YYYY” format (e.g., 5 Oct 2005). The purpose is to make it possible for the user to subsequently use the List command to create an easily readable listing of the data.
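The three conversions can be illustrated with Python's standard datetime module. This is a sketch only, not the utility's code; the function names are hypothetical, and the input layout is passed as a strptime pattern, which is an assumption about how a date format might be described. The text output follows the "5 Oct 2005" example.

```python
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def to_yyyymmdd(text, date_format):
    """Convert a date string to an eight-digit number in YYYYMMDD form."""
    d = datetime.strptime(text, date_format)
    return d.year * 10000 + d.month * 100 + d.day

def days_between(text_a, text_b, date_format):
    """Absolute number of days between two dates in the same layout."""
    a = datetime.strptime(text_a, date_format)
    b = datetime.strptime(text_b, date_format)
    return abs((b - a).days)

def english_date(text, date_format):
    """Readable text version of a date, such as 5 Oct 2005."""
    d = datetime.strptime(text, date_format)
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"

sortable = to_yyyymmdd("10/05/2005", "%m/%d/%Y")
elapsed = days_between("10/05/2005", "09/25/2005", "%m/%d/%Y")
readable = english_date("10/05/2005", "%m/%d/%Y")
```

Because the YYYYMMDD value increases with the calendar date, sorting or selecting on it behaves the same as sorting or selecting by date.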

Currency Conversion

The currency conversion utility is useful for adding or removing the $ or £ symbols, interpreting a K or M suffix, and removing commas from currency fields.

When conducting internet surveys (where the respondent is entering their own response) currency fields can create problems. You can require numeric input but that is often frustrating for respondents who want to enter something like 50K or 10M or $25,000. If you believe respondents will want to enter anything other than a number, you can specify the field as alpha in the codebook (which will accept any input from the respondent). After the survey is closed, use this utility to convert the data to a numeric field.

The CurrencySymbol setting in the defaults (StatPac.ini) file can be set to your country’s currency symbol. When converting the alpha field to a number, commas will be removed, the letter K will multiply the value by a thousand, and the letter M will multiply the value by a million.
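The conversion rules described here can be sketched in Python as follows. This is an illustration, not the utility's code; the function name is hypothetical, and returning None for text that cannot be interpreted is an assumption rather than documented StatPac behavior.

```python
def currency_to_number(text, symbol="$"):
    """Convert respondent input such as 50K, 10M or $25,000 to a number.
    Returns None when the text cannot be interpreted."""
    cleaned = text.strip().replace(symbol, "").replace(",", "").strip().upper()
    multiplier = 1
    if cleaned.endswith("K"):
        multiplier, cleaned = 1_000, cleaned[:-1]
    elif cleaned.endswith("M"):
        multiplier, cleaned = 1_000_000, cleaned[:-1]
    try:
        return float(cleaned) * multiplier
    except ValueError:
        return None

values = [currency_to_number(t) for t in ("50K", "10M", "$25,000", "unsure")]
```

The three currency entries convert to 50000, 10000000 and 25000, while the last entry is not a number and is left unconverted.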

 

 

Dichotomous Multiple Response Conversion

The dichotomous multiple response conversion utility is useful when you have imported data from an external source that coded multiple response variables in a dichotomous format.

For example, data in the external file might be coded as ones and blanks, where a one means the respondent selected the attribute and blank means they didn't.

Assume the question was "What are your favorite colors?" Imported data might look like this:

 

 

             V1        V2        V3        V4
Respondent   Orange    Blue      Yellow    Red
1            (blank)   1         (blank)   1
2            1         (blank)   (blank)   1
3            (blank)   1         1         (blank)

 

After importing, you could write a procedure to convert the data to the multiple response format used by StatPac. It would look like this:

 

Labels V1=What are your favorite colors?

Labels V2=What are your favorite colors?

Labels V3=What are your favorite colors?

Labels V4=What are your favorite colors?

Labels V1-V4 (1=Orange)(2=Blue)(3=Yellow)(4=Red)

Recode V2 (1=2)

Recode V3 (1=3)

Recode V4 (1=4)

Frequencies V1-V4

Options MR=Y

..

 

This will work fine although it is cumbersome. When there are ten or more variables in the multiple response group, it becomes more difficult because the imported variables are likely coded as N1, while the StatPac variables need to be coded as N2.

Two methods are incorporated into StatPac to deal with imported data that use dichotomous multiple response.

The first method is in the frequencies program itself. The MX=Code option can be added to the frequencies procedure to tell StatPac that the variables are dichotomous. Here, "Code" is the single-character value that indicates the item is selected. In the above example, the data was coded as ones and blanks, so MX would be set to 1. If the data had been coded as Y and N, then MX would be set to Y.

 

Frequencies V1-V4

Options MR=Y MX=1

..

 

Using this method does not actually change the data file. StatPac just reads the data differently for the frequencies procedure. An exclamation mark cannot be used to permanently set the MX option. It must be explicitly specified in each procedure where you want to use it.

The other method is to actually convert the data file to the format used by StatPac for multiple response. If you plan to do banners or other procedures that utilize the dichotomous multiple response variables, then it is best to permanently alter the data. After conversion, the above data set would look like this:

 

 

             V1        V2        V3        V4
Respondent   Orange    Blue      Yellow    Red
1            (blank)   2         (blank)   4
2            1         (blank)   (blank)   4
3            (blank)   2         3         (blank)

 

The conversion utility lets you select several sets of variables that are dichotomous multiple response; however, they are added one set at a time. The first screen lets you select the codebook and data file, and specify a name for the new (converted) codebook and data file. After selecting the codebook, the variable names will appear so they can be selected.

 

 

After selecting the variables that make up the first multiple response group, click the plus button to add them to the conversion list.

The second screen lets you set the code and labeling for the selected group of variables.

 

 

The code is the dichotomous value that indicates the item is selected.

Since the imported data doesn't have a single variable name for the group of variables, StatPac names them MR_Group_A, MR_Group_B, etc. After the conversion, the converted variables in the group will be named using the _x convention (e.g., MR_Group_A_1, MR_Group_A_2, MR_Group_A_3, and MR_Group_A_4). Thus, you might want to change the variable name to something more meaningful. For example, if you changed the name to Color, the converted variables would be named Color_1, Color_2, Color_3, and Color_4.

Similarly, the variable label might be changed to the actual question. All the converted variables will use that variable label. You could change "Multiple Response Group A: V1-V4" to "What are your favorite colors?"

After you are satisfied with the conversion labeling, click the Convert button. This will return you to the first screen where you can select an additional set of multiple response variables and click the plus button to add them to the conversion.

After you have finished selecting all the groups of multiple response variables, click OK to perform the conversion. The new codebook and new data file will then contain the multiple response variables in StatPac format.
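The effect of the conversion on the favorite colors example can be sketched in Python. This only illustrates the recoding rule (a selected item in position n becomes the value n) and is not the utility's implementation; the function name and the list-of-strings record layout are hypothetical.

```python
def convert_dichotomous(record, code="1"):
    """record holds the dichotomous values for one multiple response group.
    A cell equal to `code` becomes its position number (1, 2, 3, ...);
    every other cell becomes blank."""
    return [
        str(position) if cell.strip() == code else ""
        for position, cell in enumerate(record, start=1)
    ]

# The three respondents from the example: Orange, Blue, Yellow, Red.
imported = [
    ["", "1", "", "1"],
    ["1", "", "", "1"],
    ["", "1", "1", ""],
]
converted = [convert_dichotomous(row) for row in imported]
```

The converted rows match the converted data set shown earlier: respondent 1 becomes blank, 2, blank, 4, and so on. With data coded as Y and N, the same function would be called with code="Y".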