BioMedStat
Version 1.0
User Guide
© 2004-2005 Peterson Software Lab
(Invention Disclosure: "BioMedStat: Computer Program for Statistical Analysis of Biomedical Data", NIH Disclosure No. (EIR#): 0481201-05-0013, Baylor Reference No./Disclosure No.: BLG#05-088. For commercial licensing, see the License Agreement.)
Table of Contents
Tab-delimited Text Files with Variable Names
Tab-delimited Text Files with Variable and Record Names
Example 1 - Opening a Tab-Delimited Text File
Example 2 - Summary statistics of several variables
Example 4 - Parametric 2-sample paired T-test
Example 5 - Non-parametric 2-sample paired Wilcoxon signed rank test
Example 6 - Parametric independent 2-sample t-test
Example 7 - Non-parametric independent 2-sample Mann-Whitney U Test
Example 8 - Parametric independent k-sample Analysis of Variance
Example 9 - Non-parametric independent k-sample Kruskal-Wallis Test
Example 10 - Pearson product moment and Spearman rank correlation
Example 11 - Covariance analysis
Example 12 - Chi-square 2-way contingency table analysis
Example 13 - Logistic regression (unconditional)
Tab-delimited text files consist of data values separated (i.e., delimited) by a tab character. Tab-delimited text files used for data analysis typically have variable names in the first row of the file. The simplest way to generate a tab-delimited text file is to specify “tab-delimited text” as the file save option in Microsoft Excel. The example below shows the setup necessary for saving data and variable names to a tab-delimited text file in Excel.
To save the data listed above, select File, then Save As, then select Tab delimited (txt), as in:

Next, in the Save As window, specify the file name as “protein_data”:

You will notice two popup windows, the first of which states that multiple sheets cannot be saved in the file, so click OK:

The second states that some Excel features can be lost when saving data into a tab-delimited text file (for example, bold fonts and colors cannot be saved with the data), so click Yes:

Next, open the “protein_data.txt” file just saved, and the following will appear:

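For readers who prepare data outside Excel, the same layout (variable names in the first row, values separated by tab characters) can be produced and checked with a short script. The following is a minimal sketch using Python's standard `csv` module; the variable names and values are made up for illustration and are not the contents of the “protein_data” file.

```python
import csv
import io

# Hypothetical variable names and values, used only for illustration.
header = ["patient", "protein_a", "protein_b"]
rows = [
    ["p1", "1.25", "0.80"],
    ["p2", "2.10", "1.05"],
]


def write_tab_delimited(handle, header, rows):
    """Write the variable names as the first row, then one record per line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


def read_tab_delimited(handle):
    """Return (variable_names, records) from a tab-delimited text stream."""
    reader = csv.reader(handle, delimiter="\t")
    names = next(reader)            # first row holds the variable names
    records = [row for row in reader]
    return names, records


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_tab_delimited(buffer, header, rows)
text = buffer.getvalue()
names, records = read_tab_delimited(io.StringIO(text))
print(text)
```

To write to disk instead, pass a handle from `open("protein_data.txt", "w", newline="")` in place of the in-memory buffer.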
The following illustrates a file setup with both variable and record names. For this example, the variable names are the sample (patient) identifiers and the record names in the last column are the genes or proteins. This setup is commonly used for DNA microarray data. BioMedStat, however, was designed for statistical analysis of clinical data in which records represent patients and columns represent variables or measurements made on the experimental units (patients). (If you want to analyze microarray data, use the ChipST2C program.) The example is shown below:

After the data above are saved as a tab-delimited text file and opened, the following format will be observed:

When BioMedStat is installed, an icon is placed on the Desktop with a shortcut link to the BioMedStat program. The desktop shortcut appears as follows:

(Note that in some cases BioMedStat may be installed without a desktop shortcut, in which case the program is started by clicking Start, then Programs, then BioMedStat.)
To start BioMedStat, double-click on the desktop icon. You will then see a splash screen:
Next click on the Start button, and the BioMedStat application will then be visible, as shown below:

To open a file, select File, Open, Text data, as shown below

and the Input format popup window will appear:

Since tab-delimited is the default text file format, it does not need to be specified. However, check the “Variable names in first row” option and click OK. In the Open popup window, you should see the text files that were installed in the c:\Program Files\BioMedStat\BioMedStat\ directory, as shown below:

Select the Hosmer & Lemeshow low birth weight file, and then click on Open (or double-click the filename). While the file is read into BioMedStat, the status text field and progress bar on the log tab indicate the amount of data read and copied into the data viewing spreadsheet. When reading is completed, the data viewing tab will show the data that were read in. (Note: the spreadsheet in the Data tab is used only for viewing data and not for editing, i.e., changing, copying, pasting, etc.)

Data used for this example were published in Hosmer, D.W. and Lemeshow, S., Applied Logistic Regression, New York, Wiley (1989), and are available from the University of Massachusetts (Amherst) Statistical Software Information Internet resources at URL http://people.umass.edu/statdata/statdata/. These data are copyrighted by John Wiley & Sons Inc. and must be acknowledged and used accordingly.
Data were collected at
The data set consists of 189 observations on 11 variables related to the risk of giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of whom had low birth weight babies and 130 of whom had normal birth weight babies. Four variables thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.
| Columns | Variable | Abbreviation |
|---|---|---|
| 2-4 | Identification Code | ID |
| 10 | Low Birth Weight (0 = Birth Weight >= 2500g, 1 = Birth Weight < 2500g) | LOW |
| 17-18 | Age of the Mother in Years | AGE |
| 23-25 | Weight in Pounds at the Last Menstrual Period | LWT |
| 32 | Race (1 = White, 2 = Black, 3 = Other) | RACE |
| 40 | Smoking Status During Pregnancy (1 = Yes, 0 = No) | SMOKE |
| 48 | History of Premature Labor (0 = None, 1 = One, etc.) | PTL |
| 55 | History of Hypertension (1 = Yes, 0 = No) | HT |
| 61 | Presence of Uterine Irritability (1 = Yes, 0 = No) | UI |
| 67 | Number of Physician Visits During the First Trimester (0 = None, 1 = One, 2 = Two, etc.) | FTV |
| 73-76 | Birth Weight in Grams | BWT |
Background information for the data:
The risk factors described above have been shown to be associated with low birth weight in the obstetrical literature. The goal of the Hosmer & Lemeshow study was to ascertain if these variables were important in the population being served by the medical center where the data were collected.
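The “Columns” entries in the variable listing suggest that the raw data are stored in a fixed-width layout. The sketch below assumes those entries are 1-based, inclusive character positions and shows how one record could be split into named variables. The sample record is made up for the demonstration, and `build_line` is a hypothetical helper that exists only to produce a test line.

```python
# Column positions (1-based, inclusive) copied from the variable listing.
LAYOUT = {
    "ID": (2, 4),
    "LOW": (10, 10),
    "AGE": (17, 18),
    "LWT": (23, 25),
    "RACE": (32, 32),
    "SMOKE": (40, 40),
    "PTL": (48, 48),
    "HT": (55, 55),
    "UI": (61, 61),
    "FTV": (67, 67),
    "BWT": (73, 76),
}


def parse_record(line):
    """Slice one fixed-width line into integer-valued variables."""
    return {
        name: int(line[start - 1:end].strip())
        for name, (start, end) in LAYOUT.items()
    }


def build_line(values):
    """Hypothetical helper: lay out a record as a 76-character line."""
    chars = [" "] * 76
    for name, (start, end) in LAYOUT.items():
        width = end - start + 1
        # Right-justify each value within its field.
        chars[start - 1:end] = str(values[name]).rjust(width)
    return "".join(chars)


# Made-up record, not taken from the Hosmer & Lemeshow data.
sample = {"ID": 1, "LOW": 1, "AGE": 25, "LWT": 120, "RACE": 1, "SMOKE": 0,
          "PTL": 0, "HT": 0, "UI": 0, "FTV": 1, "BWT": 2400}
line = build_line(sample)
record = parse_record(line)

# LOW should be 1 exactly when the birth weight is below 2500 grams.
low_is_consistent = (record["BWT"] < 2500) == (record["LOW"] == 1)
print(record, low_is_consistent)
```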
Open the tab-delimited text file for the Hosmer & Lemeshow low birth weight data (see Example 1). Select the Summary statistics command of the Analyze pull-down menu, shown as follows:

Notice that the Variables tab opens, shown as follows:

Select all but the first variable (ID number) in the following manner:

Then click on the [icon] button to add the variables to the list of selected variables:
and finally, click on the [icon] button.
The Treeview on the left will show a number of icons in the form:
To view the summary statistics for each variable, click on the [icon] icon, and the text output containing summary statistics for the specified variables will become visible:

To observe the frequency histogram of age, click on the [icon] icon and the following histogram will appear:

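As a cross-check outside the program, summary statistics and a frequency histogram for a single variable can be sketched with Python's standard library. The exact set of statistics and the bin widths that BioMedStat uses are not listed in this guide, so the choices below (and the made-up ages) are assumptions for illustration only.

```python
import statistics
from collections import Counter

# Made-up ages for illustration; not taken from the low birth weight data.
ages = [19, 33, 20, 21, 18, 21, 22, 17, 29, 26]

BIN_START = 15   # assumed lower edge of the first histogram bin
BIN_WIDTH = 5    # assumed bin width


def summarize(values):
    """Common summary statistics (sample variance and s.d. use n - 1)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def frequency_histogram(values, start, width):
    """Count values per bin, keyed by the lower edge of each bin."""
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return dict(sorted(counts.items()))


summary = summarize(ages)
bins = frequency_histogram(ages, BIN_START, BIN_WIDTH)

for name, value in summary.items():
    print(f"{name:>8}: {value}")
for lower, count in bins.items():
    # Text histogram: one asterisk per observation in the bin.
    print(f"{lower:>3}-{lower + BIN_WIDTH - 1:<3} {'*' * count}")
```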
Open the Hosmer & Lemeshow low birth weight data (see Example 1), select the Variables tab and then select (highlight) the “age” variable as:

Next, at the bottom of the Variables tab are several buttons for transforming the values of variables into new values:

Now that the “age” variable is selected in the variable list, click on the [icon] button, and a new (ordinally ranked) categorical variable called “age_(quartiles)” will be generated and shown in the variable list. At run-time, the cutpoints for quartiles are computed (to see their values, run the summary statistics option on the age variable), and the resulting values for the new variable are 1, 2, 3, and 4, representing the age quartile to which each record (patient) is assigned. Select the new variable (“age_(quartiles)”) in the following fashion:

and then click on the [icon] button. Upon completion of the transformation, four new indicator or “dummy” variables will be added to the list, appearing as:

The four indicator variables have values of 0 or 1 depending on whether or not a patient’s age falls within the given quartile represented by the variable. If you run summary statistics on the age_(quartiles) variable, the following histogram will be generated:

which clearly shows that the variable takes on values of 1, 2, 3, or 4 at various frequencies of occurrence.
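The two transformations used in this example (coding a continuous variable into quartiles, then expanding the quartile code into indicator variables) can be sketched as follows. This is an illustration under stated assumptions: the method BioMedStat uses to compute quartile cutpoints is not described here, so the sketch relies on the default method of Python's `statistics.quantiles`; the ages are made up; and the indicator names `age_q1` to `age_q4` are hypothetical.

```python
import bisect
import statistics

# Made-up ages for illustration only.
ages = [14, 19, 21, 23, 25, 28, 30, 45]


def quartile_codes(values):
    """Return (cutpoints, codes), where codes are 1..4 by quartile."""
    # Three cutpoints split the data into four groups (assumed method).
    cutpoints = statistics.quantiles(values, n=4)
    # A value at or below the first cutpoint gets code 1, and so on.
    codes = [bisect.bisect_left(cutpoints, v) + 1 for v in values]
    return cutpoints, codes


def indicator_variables(codes, levels=(1, 2, 3, 4), prefix="age_q"):
    """Build one 0/1 indicator ("dummy") variable per quartile level."""
    return {
        f"{prefix}{k}": [1 if c == k else 0 for c in codes]
        for k in levels
    }


cutpoints, codes = quartile_codes(ages)
dummies = indicator_variables(codes)

print("cutpoints:", cutpoints)
print("quartile codes:", codes)
for name, column in dummies.items():
    print(name, column)
```

Each record has exactly one indicator equal to 1, which is why regression models usually enter only three of the four indicators and treat the omitted quartile as the reference group.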
Let us consider the example listed on page 77 of Siegel (Siegel, S., Non-parametric Statistics for the Behavioral Sciences, New York, McGraw-Hill, 1956) involving matched data on social perceptiveness scores from identical twin pairs. The hypothesized experiment for this example tests whether social perceptiveness scores differ between twins after one twin attends a single term of nursery school and the other stays at home. The data are listed below:
| Identical twin pair | Social perceptiveness score of twin in nursery school | Social perceptiveness score of twin at home |
|---|---|---|
| a | 82 | 63 |
| b | 69 | 42 |
| c | 73 | 74 |
| d | 43 | 37 |
| e | 58 | 51 |
| f | 56 | 43 |
| g | 76 | 80 |
| h | 65 | 82 |
Let’s use a “paired T-test” to determine if the average social perceptiveness scores are the same between the twin pairs. Start BioMedStat, and first specify the Open command of the File pull-down menu, then Text, and then specify variable names in first row as shown below:

Next, select the file Siegel_table_5_6.txt, shown as:

and you will see the data in the viewing spreadsheet as follows:

To begin the analysis, select the “T-tests (paired samples)” command from the 2-sample submenu of the Analyze pull-down menu as follows:

Specify the nursery variable as Variable 1, and home variable as Variable 2 (shown below) and then click on Run.

The resulting icons and table (after clicking the “nursery” icon) will appear as:

The test statistic of 1.2789 is compared with the tabled two-tailed critical value t0.05; 7 = 2.365. Because 1.2789 is less than 2.365, the decision is to fail to reject the null hypothesis that there is no difference in average social perceptiveness score between matched pairs of twins. The relevant mathematical formulae are given as:
T-Test (Paired Samples)

| | nursery | home |
|---|---|---|
| Sample Size (n_1, n_2): | 8 | 8 |
| Average [avg(x), avg(y)]: | 65.2500 | 59.0000 |
| Variance [var(x), var(y)]: | 157.6429 | 329.1429 |
| Numerator: avg(d_i) | 6.2500 | |
| s.d.(d_i) | 13.8229 | |
| Denominator | 4.8871 | |
| Test Statistic* | 1.2789 | |
| d.f. | 7 | |
| P-value | 0.2456 | |
| Mean permutation t | -0.02 | |
| Permutation p-value | 0.2423 | |
| Iterations | 9999 | |

*Daniel, W.W. Paired Comparisons (Sect. 6.4). In: Biostatistics: A Foundation for Analysis in the Health Sciences (5th Edition).
For comparison, the SPSS (12) benchmark results are as follows:
Paired Samples Statistics

| | | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Pair 1 | nursery | 65.250 | 8 | 12.5556 | 4.4391 |
| | home | 59.000 | 8 | 18.1423 | 6.4143 |

Paired Samples Test

| | | Paired Differences: Mean | Paired Differences: Std. Deviation | Paired Differences: Std. Error Mean | 95% Confidence Interval of the Difference: Lower | 95% Confidence Interval of the Difference: Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|---|
| Pair 1 | nursery - home | 6.2500 | 13.8229 | 4.8871 | -5.3062 | 17.8062 | 1.279 | 7 | .242 |
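The quantities reported in the BioMedStat and SPSS tables can be reproduced by hand from the twin data. The paired t statistic is the mean of the within-pair differences divided by its standard error, t = avg(d_i) / (s.d.(d_i) / sqrt(n)), with n - 1 degrees of freedom. The following is a minimal sketch in Python; it is an independent check, not BioMedStat's own code, and it compares the statistic with the tabled critical value quoted in the text rather than computing a p-value.

```python
import math
import statistics

# Social perceptiveness scores for twin pairs a-h (Siegel, page 77).
nursery = [82, 69, 73, 43, 58, 56, 76, 65]
home = [63, 42, 74, 37, 51, 43, 80, 82]

T_CRITICAL = 2.365  # tabled two-tailed critical value for 7 d.f., from the text


def paired_t(x, y):
    """Paired-samples t statistic computed from within-pair differences."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = statistics.mean(d)        # numerator: avg(d_i)
    sd_d = statistics.stdev(d)         # s.d.(d_i), using n - 1
    se = sd_d / math.sqrt(n)           # denominator: standard error
    return {"n": n, "mean_d": mean_d, "sd_d": sd_d,
            "se": se, "t": mean_d / se, "df": n - 1}


result = paired_t(nursery, home)
reject_null = abs(result["t"]) > T_CRITICAL

for name, value in result.items():
    print(f"{name:>7}: {value:.4f}")
print("reject null hypothesis:", reject_null)
```

The printed numerator, standard deviation, denominator, and test statistic should agree with the tabled values 6.2500, 13.8229, 4.8871, and 1.2789 to the precision shown.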
This example also uses twin data: aggressiveness scores for 12 sets of identical twins, given on page 283 of Conover, W.J., The Wilcoxon Signed Rank Test. In: Practical Non-Parametric Statistics (2nd Edition).

Next, select the file “conover_page_283_wilcoxin.txt”, shown as:

and you will see the data in the viewing spreadsheet as follows:
