Data files uploaded to the Bayesian Network Webserver (BNW) should be tab-delimited text files. The first row gives the names of the variables, and each remaining row gives the values of those variables for one sample or individual.
Variable names should not contain any whitespace characters.
BNW automatically determines whether each variable contains continuous or discrete data by applying the following rules, in order:
1. If a variable contains 3 or fewer different values, the variable is considered to be discrete.
2. If a variable contains more than 20 different values, the variable is considered to be continuous.
3. If the ratio of the number of different values for a variable to the number of cases in the data set is 1/3 or larger, the variable is considered to be continuous.
4. If none of the first three rules applies, the data set is inspected to determine if any of the values for the variable contain a period (.). If at least one value contains a period, the variable is considered to be continuous; otherwise, the variable is considered to be discrete.
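The four rules above can be sketched in Python as follows. This is only an illustration, not BNW's actual implementation: the function name classify_variable is hypothetical, and the sketch assumes that values are compared as raw text (so '3' and '3.0' count as different values) and that the number of cases equals the number of values supplied.

```python
def classify_variable(values):
    """Return 'discrete' or 'continuous' for one column of raw string values.

    Hypothetical helper that mirrors the four documented rules, in order.
    Assumes values are compared as text and len(values) is the number of cases.
    """
    n_distinct = len(set(values))
    # Rule 1: 3 or fewer different values -> discrete.
    if n_distinct <= 3:
        return "discrete"
    # Rule 2: more than 20 different values -> continuous.
    if n_distinct > 20:
        return "continuous"
    # Rule 3: distinct/cases >= 1/3 -> continuous
    # (written with integers to avoid floating-point rounding).
    if 3 * n_distinct >= len(values):
        return "continuous"
    # Rule 4: any value containing a period -> continuous, else discrete.
    if any("." in v for v in values):
        return "continuous"
    return "discrete"


# Four distinct values in 20 cases: rules 1-3 do not apply, so rule 4 decides.
without_period = ["1", "2", "3", "4"] * 5
with_period = ["1.0", "2", "3", "4"] * 5
print(classify_variable(without_period))
print(classify_variable(with_period))
```

In this sketch, the only difference between the two columns is the period in '1.0', which is what moves the second column from discrete to continuous under rule 4.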
Users can check whether BNW has correctly loaded an input data file and classified its variables by clicking on "View uploaded variables and data" on the left-hand menu after uploading a dataset. We believe that BNW should correctly classify variables in most cases, but users may occasionally need to add a period to, or remove a period from, the values of some variables.
An example input data file with 5 variables is given below. The file contains 2 discrete variables (Disc1 and Disc2), which are given in the first two columns, and 3 continuous variables (Cont1, Cont2, and Cont3). Disc1 has two states (1 and 2), and Disc2 has two states (A and B). Although the values of Cont2 are integers, we wish to treat this variable as continuous, not discrete. Therefore, the value of Cont2 for the first sample is given as '3.0' instead of '3' so that one of the values of Cont2 contains a '.', helping to ensure that Cont2 is interpreted as a continuous variable.
Disc1   Disc2   Cont1   Cont2   Cont3 
2   A   3.25   3.0   0.97 
2   B   2.46   2   0.93 
1   A   4.21   33   0.43 
2   B   3.76   8   0.88 
2   A   3.69   4   0.91 
1   B   4.27   13   0.38 
1   A   4.12   9   0.45 
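Before uploading, a data file in this format can be checked locally. The sketch below is one possible check, not part of BNW: the function name read_bnw_data is hypothetical, and the sketch assumes that every data row must have the same number of fields as the header row.

```python
import csv
import io


def read_bnw_data(text):
    """Parse tab-delimited text into (names, samples) and check the basic format.

    Hypothetical helper; the equal-field-count check is an assumption of this sketch.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = [row for row in reader if row]  # ignore completely blank lines
    if not rows:
        raise ValueError("file is empty")
    names, samples = rows[0], rows[1:]
    # Variable names must not contain whitespace characters.
    for name in names:
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid variable name: {name!r}")
    # Each sample should supply one value per variable.
    for k, row in enumerate(samples, start=1):
        if len(row) != len(names):
            raise ValueError(
                f"data row {k}: expected {len(names)} fields, got {len(row)}"
            )
    return names, samples


# First two samples of the example file, with explicit tab separators.
example = (
    "Disc1\tDisc2\tCont1\tCont2\tCont3\n"
    "2\tA\t3.25\t3.0\t0.97\n"
    "2\tB\t2.46\t2\t0.93\n"
)
names, samples = read_bnw_data(example)
print(names)
print(samples[0])
```

The values are kept as strings on purpose, so that a distinction such as '3.0' versus '3' is preserved.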
Structure file format
If the structure of the network model for a dataset is already known, users can upload this structure by selecting "Upload structure" on the BNW home page. The structure file should be tab-delimited, with the variable names in the first row. The remainder of the file should be an n x n matrix of 0's and 1's, where n is the number of variables in the network. A '1' in row i and column j of this matrix indicates that there is a directed edge from variable i to variable j. A '0' indicates that there is no directed edge from variable i to variable j.
An example of a structure data file is shown below. The following edges would be included in the network:
1. Disc1 > Cont1
2. Disc2 > Cont2
3. Cont1 > Cont3
4. Cont2 > Cont3
Disc1   Disc2   Cont1   Cont2   Cont3 
0   0   1   0   0 
0   0   0   1   0 
0   0   0   0   1 
0   0   0   0   1 
0   0   0   0   0 
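A structure file like the one above can be generated from a list of edges. The following is a minimal sketch under stated assumptions: the function name structure_file_text is hypothetical, and each edge is assumed to be a (parent, child) pair of variable names that appear in the header.

```python
def structure_file_text(names, edges):
    """Build tab-delimited structure file text from (parent, child) edges.

    Hypothetical helper: row i, column j is 1 when there is an edge
    from variable i to variable j, and 0 otherwise.
    """
    index = {name: k for k, name in enumerate(names)}
    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for parent, child in edges:
        matrix[index[parent]][index[child]] = 1
    lines = ["\t".join(names)]
    lines += ["\t".join(str(v) for v in row) for row in matrix]
    return "\n".join(lines) + "\n"


names = ["Disc1", "Disc2", "Cont1", "Cont2", "Cont3"]
edges = [
    ("Disc1", "Cont1"),
    ("Disc2", "Cont2"),
    ("Cont1", "Cont3"),
    ("Cont2", "Cont3"),
]
text = structure_file_text(names, edges)
print(text)
```

Passing a variable name that is not in the header raises a KeyError in this sketch, which catches misspelled names before the file is uploaded.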
