BNW has recently been updated to add features and improve the user experience. Four major recent changes have been:
The addition of a new visualization of the network structure: This network structure is presented to users immediately after structure learning to allow for a simpler introduction to the relationships within the network model. The new network image can be shown with or without the edge weights determined by model averaging and can be exported as a PNG or SVG file. The network visualization previously used in BNW, which is suited to using the network to make predictions, can be accessed by clicking the Use network to make predictions button on the left side of the page.
The addition of several options for modifying input files and network structures: Users of BNW now have the option of changing structure learning settings, making specific changes to network structures, and removing variables from an uploaded data set without having to re-upload input data files.
The addition of several options for learning more about the network parameters and distributions: First, a View parameters button has been added that allows users to inspect the network parameters. It shows either the parameters of the network model learned from the original data set uploaded by the user or the predicted parameters given the evidence or intervention entered by the user. The fraction of cases for each possible state is provided for discrete nodes, while continuous nodes are described by the means and standard deviations of Gaussian distributions.
If users have made predictions using either the evidence or intervention modes, a link to a file containing the parameters of the network considering the entered evidence or intervention is also provided. The file lists the predicted fraction of states for discrete nodes and predicted mean and standard deviation of the Gaussian distribution for continuous nodes. Nodes for which evidence or intervention has been entered are also noted in the file.
Second, clicking View violin plots of distributions shows a figure created using plotly that compares fits to uploaded data and the distributions from the network parameters. If evidence or interventions have been added to the network, an additional plot compares the fits to the data, as originally uploaded, with the distributions in the network model considering the evidence or intervention.
BNW is now able to perform cross-validation and make predictions on a test data set: Clicking the Cross validation and predictions button on the network page presents three options for testing the predictions of the network: leave-one-out cross validation, k-fold cross validation, and uploading a test data set to make predictions using the network model.
a) Leave-one-out cross validation: Users should enter the name of a network variable that they want to investigate. These calculations can take up to 3 minutes for large data sets.
After the calculation is complete, the cross-validation output contains the following information:
For discrete variables, the predicted likelihood of each state for the left-out case, given the values of its parent variables in the network, is provided.
For continuous variables, the predicted mean and standard deviation of the left-out variable, given the values of its parent variables in the network, are provided.
b) k-fold cross validation: Users should enter the name of the variable that should be predicted as well as the number of folds into which the data set should be split.
The output file contains information similar to that produced by leave-one-out cross-validation.
c) Predictions on a test data set: Users can upload a file containing a test data set. The format of the file should follow the format of the input data file with one important exception: The first line of the file should contain the name of the variable that should be predicted.
The test data file can contain missing data. If data for a variable of a given sample or case is not known, an "NA" can be entered in the input file. Data can be missing for the variable that is to be predicted or for the variables that are to be used as predictors.
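The cross-validation options above can be illustrated with a minimal Python sketch. It is not BNW's implementation: the function names, the example data, and the pseudocount of 1 are assumptions made for illustration, and it handles only a discrete target with discrete parents. For each held-out case it reports the predicted likelihood of each state given the values of the parent variables, and leave-one-out cross validation is treated as the special case in which the number of folds equals the number of cases.

```python
from collections import Counter, defaultdict

def k_fold_indices(n_cases, k):
    """Split case indices 0..n_cases-1 into k nearly equal folds."""
    return [list(range(i, n_cases, k)) for i in range(k)]

def fit_predictor(train, target, parents, states, pseudocount=1.0):
    """Count target states for each combination of parent values in the
    training cases and return a function giving P(state | parents)."""
    counts = defaultdict(Counter)
    for case in train:
        key = tuple(case[p] for p in parents)
        counts[key][case[target]] += 1

    def predict(case):
        key = tuple(case[p] for p in parents)
        c = counts[key]
        # The pseudocount (an assumption) keeps unseen states above zero.
        total = sum(c.values()) + pseudocount * len(states)
        return {s: (c[s] + pseudocount) / total for s in states}

    return predict

def cross_validate(cases, target, parents, k):
    """Return, for every case, the state likelihoods predicted by a
    model fitted on the folds that do not contain that case."""
    states = sorted({case[target] for case in cases})
    results = [None] * len(cases)
    for fold in k_fold_indices(len(cases), k):
        held_out = set(fold)
        train = [c for i, c in enumerate(cases) if i not in held_out]
        predict = fit_predictor(train, target, parents, states)
        for i in fold:
            results[i] = predict(cases[i])
    return results

# Hypothetical data: genotype is the only parent of phenotype.
cases = [
    {"genotype": "B", "phenotype": "high"},
    {"genotype": "B", "phenotype": "high"},
    {"genotype": "B", "phenotype": "low"},
    {"genotype": "D", "phenotype": "low"},
    {"genotype": "D", "phenotype": "low"},
    {"genotype": "D", "phenotype": "high"},
]

loo = cross_validate(cases, "phenotype", ["genotype"], k=len(cases))  # leave-one-out
two_fold = cross_validate(cases, "phenotype", ["genotype"], k=2)      # 2-fold
```

In this sketch the cost grows with the number of folds because the model is refitted once per fold, which is consistent with leave-one-out calculations taking longer on large data sets.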
Other changes and new features that have been added to BNW since its introduction in Bioinformatics include:
1) Parameter learning settings have been modified. Specifically, network parameters (e.g., the distributions of states for discrete variables and means and standard deviations of Gaussian distributions) are now learned using Dirichlet prior distributions. In our testing, this has had a minimal impact on the parameters of most networks, but has helped reduce the impact of cases with rare combinations of states on the predicted distributions of some networks. In the original version of BNW, no priors were used.
2) Increased flexibility in formatting of uploaded data files. One major change is that BNW now allows users to upload data files in which discrete variables have alphabetic values. For example, a file containing a node for the genotype of BXD mice can now have values of B and D; the values do not have to be recoded as integers. Input data files can also contain missing data, entered as "NA". Cases with missing data are ignored during structure and parameter learning of the network. A full description of the proper format for input files in BNW considering these changes can be found here.
3) Added a "View uploaded variables and data" button to allow users to ensure that uploaded data sets are loaded and parsed correctly. After users upload a data set, clicking this button lets them view either the uploaded data file directly or a variable description file that shows how BNW has parsed the data. The variable description file lists the number of variables and cases (e.g., individuals or samples) in the data file. It also indicates whether each variable is discrete or continuous and the criteria that were used to make this determination. For discrete variables, the possible states of the variable are provided. For continuous variables, the mean and standard deviation of the variable are provided.
4) Added a "Use a network ID to return to a network" button on the BNW home page. This button allows users to enter a network ID to return to a previously generated network model. The network ID can also be used to share the network model with collaborators. The network ID is a three-character string that is listed in the left menu of a BNW network page.
5) In a change that is invisible to users, Octave is used instead of Matlab for parameter learning using the BayesNet Toolbox.
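The effect of the Dirichlet priors described in item 1 can be illustrated with a short sketch. It is written in Python for illustration only, not in Octave with the BayesNet Toolbox, and the function name, the symmetric prior, and the prior strength alpha = 1 are assumptions; the prior settings actually used by BNW are not stated here. The sketch estimates the state fractions of a single discrete variable, ignores "NA" entries, and shows how a prior keeps a rarely or never observed state from receiving a fraction of exactly zero.

```python
from collections import Counter

def state_fractions(values, states, alpha=0.0):
    """Posterior-mean state fractions under a symmetric Dirichlet(alpha) prior.

    alpha = 0 reproduces the plain observed fractions (no prior).
    Entries equal to "NA" are treated as missing and ignored.
    """
    observed = [v for v in values if v != "NA"]
    counts = Counter(observed)
    total = len(observed) + alpha * len(states)
    if total == 0:
        raise ValueError("no observed cases and no prior")
    return {s: (counts[s] + alpha) / total for s in states}

# Hypothetical genotype column in which state D was never observed.
values = ["B", "B", "B", "NA", "B"]
no_prior = state_fractions(values, ["B", "D"])               # D gets 0
with_prior = state_fractions(values, ["B", "D"], alpha=1.0)  # D gets 1/6
```

With many observed cases the pseudocounts contribute little, which matches the observation that the priors have a minimal impact on the parameters of most networks.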