BioMedStat

 

Version 1.0

 

User Guide

 

 

© 2004-2005 Peterson Software Lab

 

Baylor College of Medicine

 

 

(Invention Disclosure: "BioMedStat: Computer Program for Statistical Analysis of Biomedical Data", NIH Disclosure No. (EIR#):0481201-05-0013, Baylor Reference No./Disclosure No.: BLG#05-088.  For commercial licensing, see the License Agreement.)

 

 

Table of Contents

Input File Format

Tab-delimited Text Files with Variable Names

Tab-delimited Text Files with Variable and Record Names

Example 1 - Opening a Tab-Delimited Text File

Example 2 - Summary statistics of several variables

Example 3 - Recode a continuous variable into quartiles, and then recode into 4 new indicator variables

Example 4 - Parametric 2-sample paired T-test

Example 5 - Non-parametric 2-sample paired Wilcoxon signed rank test

Example 6 - Parametric independent 2-sample t-test

Example 7 - Non-parametric independent 2-sample Mann-Whitney U Test

Example 8 - Parametric independent k-sample Analysis of Variance

Example 9 - Non-parametric independent k-sample Kruskal-Wallis Test

Example 10 - Pearson product moment and Spearman rank correlation

Example 11 - Covariance analysis

Example 12 - Chi-square 2-way contingency table analysis

Example 13 - Linear regression

Example 14 - Logistic regression (unconditional)

 

Input File Format

 

Tab-delimited Text Files with Variable Names

 

Tab-delimited text files consist of data values separated (i.e., delimited) by tab characters.   Tab-delimited text files used for data analysis typically have variable names in the first row of the file.  The simplest way to generate a tab-delimited text file is to specify “tab-delimited text” as the file save option in Microsoft Excel.  The example below shows the setup necessary for saving data and variable names to a tab-delimited text file in Excel.

 

   

 

To save the data listed above, select File, then Save As, then select Tab delimited (txt), as in:

 

 

Next, in the Save As window, specify the file name as “protein_data”:

 

 

 

You will notice two popup windows, the first of which states that multiple sheets cannot be saved in the file, so click OK:

 

 

 

 

The second popup states that you can lose special features of Excel when saving data into a tab-delimited text file.   For example, bold fonts and colors cannot be saved with the data.   Click Yes:

 

 

 

Next, open the “protein_data.txt” file just saved, and the following will appear:

 

 

 

 

Tab-delimited Text Files with Variable and Record Names

 

The following illustrates a file setup with both variable and record names.   For this example, the variable names are the sample (patient) identifiers and the record names in the last column are the genes or proteins.   This setup is commonly used for DNA microarray data; however, BioMedStat was designed for statistical analysis of clinical data in which records represent patients and columns represent variables or measurements made on the experimental units (patients).  (If you want to analyze microarray data, use the ChipST2C program.)   The example is shown below:

 

 

To continue, after the data above were saved as a tab-delimited text file and opened, the following format will be observed:

 

 

 

 

 

Example 1 - Opening a Tab-Delimited Text File

When BioMedStat is installed, the program icon is placed on the Desktop as a shortcut link to the BioMedStat program.   The desktop shortcut appears as follows:

 

 

(Note that in some cases, BioMedStat may be installed without a desktop shortcut.  In that case, start the program by clicking on Start, then Programs, then BioMedStat.)

 

To start BioMedStat, double-click on the desktop icon.   You will then see a splash screen:

 

 

 

Next click on the Start button, and the BioMedStat application will then be visible, as shown below:

 

 

 

To open a file, select File, then Open, then Text data, as shown below:

 

 

 

and the Input format popup window will appear:

 

Since tab-delimited is the default text file format, it does not need to be specified; however, check the “Variable names in first row” option and click OK.   In the Open popup window, you should see the text files that were installed in the c:\Program Files\BioMedStat\BioMedStat\ directory, as shown below:

 

 

Select the Hosmer & Lemeshow low birth weight file, and then click on Open (or double-click the filename).  While the file is read into BioMedStat, the status text field and progress bar on the log tab indicate the amount of data read and copied into the data viewing spreadsheet.  When reading is complete, the data viewing tab shows the data that were read in.  (Note: the spreadsheet in the Data tab is used only for viewing data, not for editing, i.e., changing, copying, pasting, etc.)
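
For readers who want to check a file outside BioMedStat before opening it, the layout described in the Input File Format section (values separated by tabs, variable names in the first row) can be parsed with a few lines of Python. This is only a minimal sketch: the function name read_tab_delimited and the sample names and values are hypothetical and are not part of BioMedStat or of the installed example files.

```python
import csv
import io

# Hypothetical file contents: variable names in the first row, one record
# per following row, fields separated by a single tab character.
SAMPLE_TEXT = "id\tage\tweight\n1\t19\t182\n2\t33\t155\n3\t20\t105\n"


def read_tab_delimited(handle):
    """Return (variable_names, records) read from a tab-delimited text stream."""
    reader = csv.reader(handle, delimiter="\t")
    variable_names = next(reader)  # the first row holds the variable names
    records = []
    for row in reader:
        # A row with a different field count usually means a stray or missing tab.
        if len(row) != len(variable_names):
            raise ValueError(
                f"row has {len(row)} fields, expected {len(variable_names)}"
            )
        records.append(dict(zip(variable_names, row)))
    return variable_names, records


names, records = read_tab_delimited(io.StringIO(SAMPLE_TEXT))
print(names)
print(len(records), "records read")
```

To read a real file instead of the in-memory sample, pass an open file object, for example open("protein_data.txt", newline=""), in place of the io.StringIO object.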

 

 

 

 

Example 2 - Summary statistics of several variables

 

Data used for this example were published in the book Hosmer, D.W. and Lemeshow, S. Applied Logistic Regression, New York, Wiley (1989) and are available from the University of Massachusetts (Amherst) Statistical Software Information Internet resources at URL http://people.umass.edu/statdata/statdata/.   These data are copyrighted by John Wiley & Sons Inc. and must be acknowledged and used accordingly. Data were collected at Baystate Medical Center, Springfield, Massachusetts during 1986.  

 

The data set consists of 189 observations for 11 variables (risk factors) associated with giving birth to a low birth weight baby (weighing less than 2500 grams).  Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies.  Four variables which were thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.

 

Columns   Variable                                                  Abbreviation

2-4       Identification Code                                       ID
10        Low Birth Weight (0 = Birth Weight >= 2500g,              LOW
          1 = Birth Weight < 2500g)
17-18     Age of the Mother in Years                                AGE
23-25     Weight in Pounds at the Last Menstrual Period             LWT
32        Race (1 = White, 2 = Black, 3 = Other)                    RACE
40        Smoking Status During Pregnancy (1 = Yes, 0 = No)         SMOKE
48        History of Premature Labor (0 = None, 1 = One, etc.)      PTL
55        History of Hypertension (1 = Yes, 0 = No)                 HT
61        Presence of Uterine Irritability (1 = Yes, 0 = No)        UI
67        Number of Physician Visits During the First Trimester     FTV
          (0 = None, 1 = One, 2 = Two, etc.)
73-76     Birth Weight in Grams                                     BWT

 

Background information for the data:

       

  • Low birth weight is an outcome that has been of concern to physicians for years.
  • Infant mortality rates and birth defect rates are very high for low birth weight babies.
  • A woman's behavior during pregnancy (including diet, smoking habits, and receiving prenatal care) can greatly alter the chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight.

       

The risk factors described above have been shown to be associated with low birth weight in the obstetrical literature. The goal of the Hosmer & Lemeshow study was to ascertain if these variables were important in the population being served by the medical center where the data were collected.

 

 

 

 

Open the tab-delimited text file for the Hosmer & Lemeshow low birth weight data (see Example 1).   Select the Summary statistics command of the Analyze pull-down menu, shown as

 

 

Notice that the Variables tab opens, shown as follows:

 

 

Select all but the first variable (ID number) in the following manner:

 

 

Then click on the  button to add the variables to the list of selected variables:

 

 

 

and finally, click on the  button.  The Treeview on the left will show a number of icons in the form:

 

   

 

To view the summary statistics for each variable, click on the  icon, and the text output containing summary statistics for the specified variables will become visible:

 

 

To observe the frequency histogram of age, click on the  icon and the following histogram will appear:

 

 

 

 

 

 

 

 

 

 

Example 3 - Recode a continuous variable into quartiles, and then recode into 4 new indicator variables

 

Open the Hosmer & Lemeshow low birth weight data (see Example 1), select the Variables tab and then select (highlight) the “age” variable as:

 

 

Next, at the bottom of the Variables tab are several buttons for transforming the values of variables into new values:

 

 

Now that the “age” variable is selected in the variable list, click on the  button, and a new (ordinally ranked) categorical variable called “age_(quartiles)” will be generated and shown in the variable list.  At run-time, the cutpoints for quartiles are computed (to see their values, run the summary statistics option on the age variable) and the resulting values for the new variable are 1, 2, 3, 4, representing the age quartile to which each record (patient) is assigned.   Select the new variable (“age_(quartiles)”) as follows

 

 

 

and then click on the  button.  Upon completion of the transformation, there will be four new indicator or “dummy” variables added to the list, appearing as:

 

 

The four indicator variables have values of 0 or 1 depending on whether or not a patient’s age falls within the given quartile represented by the variable.  If you run summary statistics on the age_(quartiles) variable, the following histogram will be generated:

 

 

which clearly shows that the variable takes on values of 1, 2, 3, or 4 at various frequencies of occurrence.
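
The two recoding steps in this example can be sketched in Python as follows. This is an illustration under stated assumptions, not BioMedStat's own code: the guide does not say which rule BioMedStat uses to compute quartile cutpoints, so the sketch uses the "inclusive" method of Python's statistics.quantiles, and the age values and the function names recode_quartiles and indicator_variables are hypothetical.

```python
import statistics


def recode_quartiles(values):
    """Map each value to 1, 2, 3, or 4 according to the quartile it falls in."""
    # Three cutpoints split the data into four groups.
    # Assumption: the "inclusive" cutpoint rule; BioMedStat's rule may differ.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    codes = []
    for v in values:
        if v <= q1:
            codes.append(1)
        elif v <= q2:
            codes.append(2)
        elif v <= q3:
            codes.append(3)
        else:
            codes.append(4)
    return codes


def indicator_variables(codes, levels=(1, 2, 3, 4)):
    """Build one 0/1 indicator ("dummy") variable per quartile level."""
    return {level: [1 if c == level else 0 for c in codes] for level in levels}


# Hypothetical ages, not taken from the low birth weight data set.
ages = [19, 33, 20, 21, 18, 21, 22, 17, 29, 26, 19, 24]
age_quartiles = recode_quartiles(ages)
dummies = indicator_variables(age_quartiles)

print(age_quartiles)
for level, column in dummies.items():
    print(level, column)
```

Each record has a value of 1 in exactly one of the four indicator variables and 0 in the other three, which mirrors the description of the four new variables above.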

 

 

Example 4 - Parametric 2-sample paired T-test

 

Let us consider the example listed on page 77 of Siegel (Siegel, S.,  Non-parametric Statistics for the Behavioral Sciences, New York, McGraw-Hill, 1956) involving matched data on social perceptiveness scores from identical twin pairs.   The hypothesized experiment for this example tests whether or not social perceptiveness scores differ between twins after one of the twins attends a single term of nursery school and the other stays home.   The data are listed below:

 

 

 

 

Identical    Social perceptiveness score    Social perceptiveness score
twin pair    of twin in nursery school      of twin at home

a            82                             63
b            69                             42
c            73                             74
d            43                             37
e            58                             51
f            56                             43
g            76                             80
h            65                             82

 

 

Let’s use a “paired T-test” to determine if the average social perceptiveness scores are the same between the twin pairs.   Start BioMedStat, and first specify the Open command of the File pull-down menu, then Text, and then specify variable names in first row as shown below:

 

 

 

 

Next, select the file Siegel_table_5_6.txt, shown as:

 

 

 

and you will see the data in the viewing spreadsheet as follows:

 

 

 

 

 

 

To begin the analysis, select the “T-tests (paired samples)” command in the 2-sample submenu of the Analyze pull-down menu as follows:

 

 

 

Specify the nursery variable as Variable 1, and home variable as Variable 2 (shown below) and then click on Run. 

 

 

 

And the resulting icons and table (after clicking “nursery” icon) will appear as:

 

 

 

The test statistic of 1.2789 is compared with the tabled two-tailed critical value $t_{0.05;\,7} = 2.365$.   Because 1.2789 is less than 2.365, the decision is to fail to reject the null hypothesis that there is no difference in the average social perceptiveness score between matched pairs of twins.   The relevant mathematical formulae are given as:

 

 

T-Test (Paired Samples)

                               nursery     home
Sample Size (n_1, n_2):        8           8
Average [avg(x), avg(y)]:      65.2500     59.0000
Variance [var(x), var(y)]:     157.6429    329.1429

                               Value       Formula
Numerator: avg(d_i)            6.2500      $\bar{d}$, where $d_i = x_i - y_i$
s.d.(d_i)                      13.8229     $s_d$
Denominator                    4.8871      $s_{\bar{d}} = s_d / \sqrt{n}$
Test Statistic*                1.2789      $t = \bar{d} / (s_d / \sqrt{n})$
d.f.                           7           $\mathrm{d.f.} = n - 1$
P-value                        0.2456
Mean permutation t             -0.02
Permutation p-value            0.2423
Iterations                     9999

*Daniel, W.W.  Paired Comparisons (Sect. 6.4).  In: Biostatistics:  A Foundation for Analysis in the Health Sciences (5th Edition).   New York, Wiley (1991).
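
The quantities reported for the paired T-test can be reproduced from the twin data with a short Python sketch. The t statistic follows the formulae above. The permutation part is an assumption: the guide does not describe how BioMedStat draws its 9999 permutations, so this sketch instead enumerates all 2^8 = 256 possible sign flips of the paired differences, and its p-value therefore need not match the reported permutation p-value exactly. The function name paired_t is hypothetical.

```python
import itertools
import math
import statistics

# Social perceptiveness scores for twin pairs a-h (data from the table above).
nursery = [82, 69, 73, 43, 58, 56, 76, 65]
home = [63, 42, 74, 37, 51, 43, 80, 82]


def paired_t(differences):
    """Return (t statistic, degrees of freedom) for a list of paired differences."""
    n = len(differences)
    mean_d = statistics.mean(differences)
    sd_d = statistics.stdev(differences)  # sample s.d., n - 1 in the denominator
    standard_error = sd_d / math.sqrt(n)
    return mean_d / standard_error, n - 1


d = [x - y for x, y in zip(nursery, home)]
t_obs, df = paired_t(d)

# Tabled two-tailed critical value for alpha = 0.05 and 7 d.f. (quoted in the text).
T_CRITICAL = 2.365
reject_null = abs(t_obs) > T_CRITICAL

# Exact sign-flip permutation test: under the null hypothesis each difference
# is equally likely to be positive or negative, so every sign pattern is tried.
count_extreme = 0
total = 0
for signs in itertools.product((1, -1), repeat=len(d)):
    flipped = [s * v for s, v in zip(signs, d)]
    t_perm, _ = paired_t(flipped)
    total += 1
    # Small tolerance so that ties with the observed statistic are counted.
    if abs(t_perm) >= abs(t_obs) - 1e-9:
        count_extreme += 1
p_permutation = count_extreme / total

print(f"t = {t_obs:.4f}, d.f. = {df}, reject null: {reject_null}")
print(f"sign-flip permutation p-value = {p_permutation:.4f} ({total} patterns)")
```

Full enumeration is practical here only because there are eight pairs; for larger samples a random subset of sign patterns, as in the 9999 iterations reported above, is the usual choice.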

 

 

 

The SPSS (12) benchmark results are as follows:

 

 

Paired Samples Statistics

                     Mean      N    Std. Deviation    Std. Error Mean
Pair 1   nursery     65.250    8    12.5556           4.4391
         home        59.000    8    18.1423           6.4143

Paired Samples Test

Pair 1: nursery - home

Paired Differences
  Mean                                                  6.2500
  Std. Deviation                                        13.8229
  Std. Error Mean                                       4.8871
  95% Confidence Interval of the Difference, Lower      -5.3062
  95% Confidence Interval of the Difference, Upper      17.8062
t                                                       1.279
df                                                      7
Sig. (2-tailed)                                         .242
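
As a rough check on the benchmark, the 95% confidence interval of the difference can be recomputed from the mean and standard deviation of the paired differences. The sketch below is an assumption-laden illustration: it uses the tabled critical value 2.365 quoted earlier in this example, which is rounded to three decimals, so the limits agree with the SPSS values only to about two decimal places.

```python
import math

# Values reported above for the paired differences (nursery - home).
mean_diff = 6.2500
sd_diff = 13.8229
n = 8

# Tabled two-tailed critical value of t for alpha = 0.05 and n - 1 = 7 d.f.
t_critical = 2.365

standard_error = sd_diff / math.sqrt(n)  # standard error of the mean difference
margin = t_critical * standard_error
lower = mean_diff - margin
upper = mean_diff + margin

print(f"standard error = {standard_error:.4f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```

Because the interval contains zero, it leads to the same conclusion as the test statistic: the data do not show a difference between the twins at the 0.05 level.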

 

 

 

 

 

Example 5 - Non-parametric 2-sample paired Wilcoxon signed rank test

 

This example also uses twin data: aggressiveness scores among 12 sets of identical twins, given on page 283 of Conover, W.J. The Wilcoxon Signed Rank Test. In: Practical Non-Parametric Statistics (2nd Edition). New York, Wiley (1980).   Start BioMedStat, and first specify the Open command of the File pull-down menu, then Text, and then specify variable names in first row as shown below:

 

 

 

 

Next, select the file “conover_page_283_wilcoxin.txt”, shown as:

 

 

 

 

and you will see the data in the viewing spreadsheet as follows:

 

 

 

<