Statistics Dialog
Applicability: Cranium, Synapse (core versions 0315+)

Synapse and Cranium use various All Data dialogs to record data values having non-active status values. Often, the designation of a datum as passive or rejected is based on comparing its value with those of other data. The Statistics Dialog provides you with several summary statistics that often help you decide upon the status of a new datum.

The Statistics Dialog is activated by using the Statistics button found on All Data dialogs such as the Datum-Reference All Data Dialog and the Datum-Units-Reference All Data Dialog.

Dialog Controls

Pressing the All Data dialog's Statistics button activates the Statistics dialog. The dialog collects the active and passive data values, calculates summary statistics, and then displays these values.

  1. Data Type Control: displays a general description of the type of data being analyzed.
  2. Count Control: displays the number of data values being analyzed. The dialog automatically collects all active and passive data values.
  3. Minimum Control: displays the minimum value of the compiled data.
  4. Average Control: displays the unweighted average of the compiled data.
  5. Maximum Control: displays the maximum value of the compiled data.
  6. Std Dev Control: displays the standard deviation of the compiled data.
  7. Range Control: displays the range, i.e., the maximum value minus the minimum value, of the compiled data.
  8. Level Control: displays the confidence level used to calculate the lower and upper confidence limits.
  9. Lower Control: displays the lower confidence limit.
  10. Upper Control: displays the upper confidence limit.
  11. Outliers Control: displays the indices of those data whose value is below the lower confidence limit or above the upper confidence limit.
Statistics

The Statistics Dialog uses a t-distribution to calculate the confidence limits around the mean of the compiled data. These limits are calculated according to the following equation:

limit = avg ± factor * stdDev / sqrt(nobs)

In the previous equation, limit is either the lower or upper confidence limit, avg is the average of the compiled values, factor is the t-distribution value, stdDev is the data's standard deviation, and nobs is the number of data values.
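The equation above can be sketched in a few lines of Python. This is only an illustration, not the application's own code: the function names and the data values are hypothetical, the standard deviation is assumed to be the sample standard deviation (n - 1 divisor), outlier indices are assumed to be 1-based, and the t-distribution value is passed in by the caller (here taken from a t-table as the two-sided 95% value for 4 degrees of freedom).

```python
import math
import statistics


def confidence_limits(values, factor):
    """Return (lower, upper) as avg -/+ factor * stdDev / sqrt(nobs).

    `factor` is the t-distribution value for the chosen confidence level,
    supplied by the caller (for example, read from a t-table).
    Assumption: stdDev is the sample standard deviation (n - 1 divisor).
    """
    nobs = len(values)
    if nobs < 2:
        raise ValueError("need at least two data values")
    avg = statistics.mean(values)
    std_dev = statistics.stdev(values)
    half_width = factor * std_dev / math.sqrt(nobs)
    return avg - half_width, avg + half_width


def outlier_indices(values, lower, upper):
    """Return the 1-based indices of values outside [lower, upper]."""
    return [i for i, v in enumerate(values, start=1) if v < lower or v > upper]


# Hypothetical data values, for illustration only.
values = [1.385, 1.386, 1.387, 1.388, 1.360]

# Assumed factor: two-sided 95% t value for nobs - 1 = 4 degrees of freedom.
lower, upper = confidence_limits(values, factor=2.776)

print("lower limit:", round(lower, 4))
print("upper limit:", round(upper, 4))
print("outliers:", outlier_indices(values, lower, upper))
```

With these illustrative values, only the last datum falls below the lower limit, so it is the only index reported as an outlier.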

Example: Analyzing Data Values
  1. Open the MKS Sample Knowledge Base document. (If you are just experimenting with this functionality, open a "working" document or create a copy of a document.)
  2. Change to the Chemicals chapter and navigate to the page showing n-Heptane. (See documentation on Navigation Overview for details on navigating chapters and pages.)
  3. Scroll down to the Optical Properties Section. Note that the liquid refractive index field's data control is displaying a green triangle in its upper right corner. This indicates that there are additional, non-active data.
  4. Click the right mouse button on the data control. The application will display the data commands menu. Select the Edit All Data command from the menu.
  5. The application activates the Datum-Reference All Data Dialog.
  6. Select a blank row in the table control and press the dialog's Edit button. The application activates the Datum-Reference Edit dialog.
  7. Enter 1.36 for the Datum value and press the dialog's OK button. The new value is added to those in the Datum-Reference All Data Dialog.
  8. Note that the newly entered datum has a status of 'Unknown'. Select the new datum's row and press the dialog's Set Status button. The status menu is displayed. Select 'Passive' from the menu.
  9. Now press the dialog's Statistics button. The application activates the Statistics dialog which displays an initial set of summary statistics.

    Note that the dialog uses the mean value and the confidence level to calculate a lower and an upper confidence limit. These limits are outside the minimum and maximum values of the compiled data. Thus, no data values are marked as outliers.

  10. Change the Confidence Limits Level to 95.0%. The dialog will calculate new lower and upper confidence limits.

    Using these new limits, the dialog determined that datum number 5, the new value we just entered, is smaller than the lower limit. It thus marked this value as an outlier.

  11. Press the Done button to close the Statistics dialog. You can now use the All Data dialog's Set Status button to mark the outlier datum as 'Rejected'. Once this is done, you can repeat the process to determine if any other data values should be rejected.
Tip: Think thoroughly about rejecting data

Although the dialog used a statistical analysis to identify an outlier, you should consider additional factors, such as the difficulty of measuring the current property for the current chemical, before you decide to reject a data value.

It is also recommended that you reject one datum at a time. Once you reject a datum, the data set's count, average, and standard deviation will change, which in turn will change the confidence limits.
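To see why rejecting one datum at a time matters, the following minimal sketch compares the summary statistics of a data set before and after a single value is removed. The data values and the helper name are hypothetical, and the sample standard deviation (n - 1 divisor) is an assumption rather than a documented detail of the dialog.

```python
import statistics


def summarize(values):
    """Return (count, average, sample standard deviation) of the values."""
    return len(values), statistics.mean(values), statistics.stdev(values)


# Hypothetical data values, for illustration only.
# The last value plays the role of the datum that gets rejected.
values = [1.385, 1.386, 1.387, 1.388, 1.360]

before = summarize(values)
after = summarize(values[:-1])  # same set with the rejected datum removed

print("before rejection:", before)
print("after rejection: ", after)
```

Because the count, the average, and the standard deviation all shift after the rejection, the confidence limits should be recalculated before any other datum is judged.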

Related Documentation
Topic: Description
Getting Started using Synapse: provides a quick tour of Synapse's capabilities including examples of chemical product design.
Getting Started using Cranium: provides a quick tour of Cranium's capabilities including a discussion of structure editing.
Estimating Chemical Properties: a short video demonstrating how to estimate the physical properties of chemicals using either Synapse or Cranium.
Estimating Mixture Properties: a short video demonstrating how to estimate the physical properties of mixtures using either Synapse or Cranium.