SOLAS




The only software tool you need for analyzing incomplete data.

SOLAS is the software most research statisticians and data analysts choose when working with incomplete data. Companies such as Amgen, Pfizer, Aventis, Roche, Pharmacia and many more use SOLAS to handle their missing data. SOLAS is also extensively used at the FDA. SOLAS complies with FDA and ICH guidelines on Sensitivity Analysis. Many leading pharma and biotech companies have already used SOLAS to impute missing values in datasets included in FDA submissions.

 

 

What is SOLAS™?
SOLAS™ is developed in close collaboration with Prof. Donald B. Rubin, the leading authority on Multiple Imputation.

SOLAS™ 3.0 for Missing Data Analysis offers principled approaches to missing data. It now has its own (optional) scripting language and features a choice of 6 imputation techniques, including 2 Multiple Imputation techniques based on the work of Prof. Donald B. Rubin. Data can be imported from a wide variety of file types including SAS (Unix/Windows), SPSS, S-Plus, Stata and many more. Once the data is imported, the missing data pattern can be displayed and the most appropriate technique chosen. Once imputation is complete, the imputed datasets can be analyzed within SOLAS or exported to a variety of other packages in the correct format. It's that simple!

"Solas is currently the only program that implements multiple imputation noniteratively and with substantial flexibility, even including ad-hoc methods, such as LOCF, as points of comparison for sensitivity analysis."
Prof. Donald B. Rubin, Harvard.

The incorrect analysis of datasets with incomplete data can lead to biased analysis and incorrect inference. SOLAS™ 3.0 provides researchers with a range of imputation approaches in an easy to use, validated software package that includes principled, informed solutions to the problems presented by incomplete datasets.

Why should I use SOLAS™ 3.0?

  • Choice of six imputation techniques, including 2 Multiple Imputation Techniques
  • The only software you will need to perform Missing Data Sensitivity Analysis as required by regulatory guidelines
  • The only commercially available and supported software that offers Multiple Imputation
  • Script Language to facilitate easy running of simulations (Optional)
  • Complete control over the Donor Pool selections
  • Can be applied to longitudinal and single observation datasets
  • Easy to use, Windows-based, validated software package

What are the Imputation Techniques available in SOLAS™?

SOLAS™ 3.0 provides the user with a choice of 6 imputation techniques, two of which are Multiple Imputation techniques. (See the 'What is Multiple Imputation' section).

Multiple Imputation Techniques:

Predictive Model Based Multiple Imputation:

  • Fully configurable ordinary least squares multiple regression algorithm.
  • Imputed values are based on predictive information contained in covariates.
  • Preserves correlations between variables
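
As a rough illustration of the idea behind regression-based multiple imputation, the sketch below follows the textbook Bayesian linear-regression recipe for a single covariate: regression parameters are drawn from their approximate posterior, and each missing value is then drawn around the resulting prediction. It is not SOLAS's own algorithm; the function name `model_based_mi`, its parameters and the sample data are all hypothetical.

```python
import math
import random

def model_based_mi(x, y, m=5, seed=1):
    """Return m completed copies of y; missing entries (None) get model-based draws."""
    rng = random.Random(seed)
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)  # needs more than 2 observed cases
    xbar = sum(p[0] for p in obs) / n
    ybar = sum(p[1] for p in obs) / n
    sxx = sum((p[0] - xbar) ** 2 for p in obs)
    slope = sum((p[0] - xbar) * (p[1] - ybar) for p in obs) / sxx
    rss = sum((p[1] - ybar - slope * (p[0] - xbar)) ** 2 for p in obs)

    datasets = []
    for _ in range(m):
        # Draw the residual variance: RSS divided by a chi-square(n - 2) draw.
        sigma = math.sqrt(rss / rng.gammavariate((n - 2) / 2, 2.0))
        # With a centered covariate, intercept and slope draws are independent.
        a = rng.gauss(ybar, sigma / math.sqrt(n))
        b = rng.gauss(slope, sigma / math.sqrt(sxx))
        datasets.append([
            a + b * (xi - xbar) + rng.gauss(0.0, sigma) if yi is None else yi
            for xi, yi in zip(x, y)
        ])
    return datasets

# Hypothetical data: two of six outcomes are missing.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, 9.8, None]
for completed in model_based_mi(x, y, m=3):
    print(completed)
```

Because both the parameters and the residual are redrawn for every dataset, the imputed values differ from one completed dataset to the next, which is what carries the missing-data uncertainty into later analyses.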

Propensity Score Based Multiple Imputation:
(Non Parametric Approach Based on Propensity Scores and the Approximate Bayesian Bootstrap)

  • Fully configurable logistic regression algorithm.
  • Uses information contained in a set of covariates to predict missingness in the variable to be imputed.
  • Avails of additional variables in donor-value selection model, to preserve relationships between variables.
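
The Approximate Bayesian Bootstrap step named above can be sketched in a few lines. This is a minimal sketch that assumes cases have already been grouped into classes by their estimated propensity score; fitting the logistic regression is left out, and the function name `abb_draw` and the sample values are hypothetical.

```python
import random

def abb_draw(observed, n_missing, rng):
    """One Approximate Bayesian Bootstrap draw within a single propensity class."""
    # Step 1: resample the observed values with replacement to form a donor pool.
    pool = [rng.choice(observed) for _ in observed]
    # Step 2: draw each imputed value with replacement from that donor pool.
    return [rng.choice(pool) for _ in range(n_missing)]

rng = random.Random(42)
observed_in_class = [4.2, 5.0, 5.5, 6.1]  # hypothetical observed values in one class
# Five sets of imputations for two missing cases in the same class.
imputations = [abb_draw(observed_in_class, 2, rng) for _ in range(5)]
print(imputations)
```

Resampling the donor pool before drawing from it is what distinguishes this from a plain hot deck: the extra step adds between-imputation variability.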

Single Imputation Techniques:

Hot Deck Imputation:

The user specifies matching criteria in the form of variables in the dataset, in order to locate 'donors' in the dataset from whose observed data the imputed value is subsequently drawn. Effectively, respondents and non-respondents are sorted into a number of imputation classes according to a user-specified set of auxiliary variables. Missing values are then replaced with values taken from matching respondents (i.e. respondents that are similar with respect to the auxiliary variables).
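
The class-matching idea can be sketched as follows. This is an illustrative version only, assuming exact matching on the auxiliary variables and a random donor within each class; the function `hot_deck_impute`, the field names and the records are hypothetical and do not reflect SOLAS's donor pool options.

```python
import random

def hot_deck_impute(records, target, match_on, seed=0):
    """Fill missing `target` values from a random donor in the same imputation class."""
    rng = random.Random(seed)
    # Build one donor pool per imputation class from the respondents.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            key = tuple(rec[v] for v in match_on)
            donors.setdefault(key, []).append(rec[target])

    completed = []
    for rec in records:
        new = dict(rec)  # leave the input records untouched
        if new[target] is None:
            pool = donors.get(tuple(rec[v] for v in match_on))
            if pool:  # stays missing when the class has no donor
                new[target] = rng.choice(pool)
        completed.append(new)
    return completed

records = [
    {"sex": "F", "site": 1, "score": 7.0},
    {"sex": "F", "site": 1, "score": 9.0},
    {"sex": "F", "site": 1, "score": None},
    {"sex": "M", "site": 1, "score": 3.0},
    {"sex": "M", "site": 1, "score": None},
]
for row in hot_deck_impute(records, "score", ["sex", "site"]):
    print(row)
```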

Predicted Mean Imputation (using Regression):

Imputed values are predicted using an ordinary least squares multiple regression algorithm, or a discriminant model if the data are categorical.

Ordinary Least Squares Method - missing values are imputed with the values predicted from the corresponding covariates by the estimated linear regression model.

Discriminant Method - a model-based method for binary or categorical variables.
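
For the Ordinary Least Squares case, a minimal sketch with a single covariate looks like this. The function names `ols_fit` and `predicted_mean_impute` and the data are hypothetical; a real analysis would normally use several covariates.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def predicted_mean_impute(x, y):
    """Replace each missing y (None) with the regression prediction at its x."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = ols_fit([p[0] for p in obs], [p[1] for p in obs])
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, None, 8.0, 10.0]
print(predicted_mean_impute(x, y))  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

Every imputed value lies exactly on the fitted line, so this single-imputation method adds no residual variability to the completed data.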

Last Value Carried Forward (LVCF):
Sometimes called Last Observation Carried Forward (LOCF)

Imputed values are based on the previously observed value. This method can only be used for longitudinal variables.
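
A minimal sketch of the rule for one subject's visits, with missing visits marked as `None`. The function name `locf` is hypothetical, and leaving leading missing visits unfilled is an assumption of this sketch.

```python
from typing import List, Optional

def locf(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed value forward over later missing visits."""
    filled = []
    last = None  # nothing observed yet, so leading gaps stay missing
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled

print(locf([5.1, None, 4.8, None, None]))  # [5.1, 5.1, 4.8, 4.8, 4.8]
```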

Group Means:

Imputed values are set to the variable's group mean (or mode in the case of categorical data) derived from a grouping variable.
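
For a continuous variable, the rule can be sketched as below. The function name `group_mean_impute` and the data are hypothetical, and a group with no observed values is left missing in this sketch.

```python
from collections import defaultdict
from statistics import mean

def group_mean_impute(values, groups):
    """Replace each missing value (None) with the mean of its group's observed values."""
    observed = defaultdict(list)
    for v, g in zip(values, groups):
        if v is not None:
            observed[g].append(v)
    means = {g: mean(vs) for g, vs in observed.items()}
    # means.get(g) is None when a group has nothing observed to average.
    return [means.get(g) if v is None else v for v, g in zip(values, groups)]

values = [10.0, None, 14.0, 3.0, None, 5.0]
groups = ["A", "A", "A", "B", "B", "B"]
print(group_mean_impute(values, groups))  # [10.0, 12.0, 14.0, 3.0, 4.0, 5.0]
```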

Post Imputation Analysis

Once the user has imputed for the missing values in their datasheet, SOLAS™ allows the user to run a number of analyses on the imputed datasheets. The analysis options in SOLAS™ 3.0 are as follows:

  • Descriptive Statistics
  • t-Test
  • ANOVA
  • Regression
  • Frequency Tables

Analyzing Multiply Imputed Datasets and Combining of Results:

When the user performs Multiple Imputation as their chosen imputation technique, the result is that they are left with M imputed datasets, where they began with just one. The idea behind this is to ensure valid standard errors, confidence intervals and p-values.

Each of these Multiple Imputed datasets needs to be statistically analyzed by the statistical method of choice. This in effect gives the user M intermediate results, which need to be combined into a final result, from which the conclusions are drawn, according to explicit formulae.

SOLAS™ 3.0 now automatically performs the roll-up of these results to give this final single, consistent parameter estimate.
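
For readers who want to see what such a roll-up involves, the sketch below applies the standard combining rules (Rubin's rules) to M point estimates and their squared standard errors. It is an illustration of the published rules, not SOLAS's code; the function name `pool_rubin` and the numbers are hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine M estimates and their variances into one estimate and standard error."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

# Hypothetical results from analyzing three imputed datasets.
estimate, std_error = pool_rubin([10.0, 11.0, 12.0], [1.0, 1.0, 1.0])
print(estimate, std_error)
```

The pooled standard error is larger than any single dataset's standard error whenever the M estimates disagree, which is how the extra uncertainty from the missing data enters the final result.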

Other Important Features of SOLAS™ 3.0

Missing Data Pattern:

The Missing Data Pattern in SOLAS™ 3.0 provides a clear overview of the quantity, positioning, and types of missing values in your dataset. By right clicking on any cell in the matrix, you can identify the variable and observation details. This feature allows you to study the missing data patterns and helps you to choose the most appropriate imputation techniques.

Furthermore, you may also now use the Missing Value Pattern to view the monotone and non-monotone missing values in your dataset.
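
To make the monotone versus non-monotone distinction concrete: a pattern is monotone for a given variable order if, within every case, once one variable is missing all later variables are missing too (the typical dropout pattern in a longitudinal study). A minimal sketch, with hypothetical function names and data:

```python
def pattern(rows):
    """Render each case as a string: 'X' for observed, '.' for missing."""
    return ["".join("." if v is None else "X" for v in row) for row in rows]

def is_monotone(rows):
    """True if no case has an observed value after a missing one (in column order)."""
    for row in rows:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                return False  # observed value follows a gap: non-monotone
    return True

dropout = [[1.2, 3.4, 5.6], [1.0, 2.9, None], [0.8, None, None]]
intermittent = [[1.2, None, 5.6]]
print(pattern(dropout), is_monotone(dropout))
print(pattern(intermittent), is_monotone(intermittent))
```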

Who uses SOLAS™?

This is a partial list of organizations that use SOLAS.

ACADEMIC

  • Athens Univ. of Economics and Business, Boston University, Brown Univ., Case Western Reserve Univ., De Montfort University, Erasmus University, Harvard School of Public Health, Johns Hopkins University, Korea Open University, Kuwait University, Massey University at Albany, MRI Klinikum rechts der Isar der Tech. Univ. Muenchen, National Taiwan University Hospital, Niigata University, NYU, Okayama University, Queensland University of Technology, Scuola Superiore Di Studi Universitairi, Seoul National University, Sheffield Hallam Univ., The University of Edinburgh, The University of Hong Kong, UCLA, Univ Degli Studi di Roma, Univ. of North Carolina at Chapel Hill ... and many more.

GOVERNMENT AND HOSPITALS

  • Univ. Publica de Navarra, CDC (Centers for Disease Control and Prevention), Central Statistics Service, South Africa, Childrens Hospital of Philadelphia, Department of Motor Vehicles, Department of Veteran Affairs, FDA, Mayo Clinic, Medicines Control Agency (MCA), Statistics Canada, Statistics Denmark, US Dept. of Labor ... and many more.

COMMERCIAL

  • 3M Company, A.C. Nielsen, AAI Deutschland Gmbh & Co. KG, ALMIRALL Prodesfarma S.A., Alza Corporation, Amgen Inc., Analysis Group Economics, Applied Logic Associates Inc., Aventis, B. Braun McGaw, Baker Norton Pharmaceuticals, BASF Computer Services UK Ltd, Baxter AG, Beth Israel Medical Group, Biogen, Biosense Webster, Bristol Myers Squibb, Chenex Inc, Clinical Trials and Survey Corp., ClinTrials Research Inc., Dow Chemical Company, Eli Lilly and Co., Emmes Corporation, Environmental Risk Analysis, Exxon Research and Engineering, G.E. Plastics, Genetics Institute, Glaxo Wellcome, Kaiser Permanente, NIH, Novartis Pharma AG, Orion-Farmos Pharmaceuticals, Pfizer GMBH, Pharmacia, Pharmacyclics, Procter and Gamble Co., Purdue Frederick, Quintiles, RDSI Scirex, SmithKline Beecham Pharmaceuticals, Statistics Collaborative Inc. ... and many more.

A case study from Amgen Inc.

Because of the varied nature of clinical trial subjects and treatments, standard approaches to handling missing data, such as Last Value Carried Forward and Completers, are sometimes deemed inappropriate by biostatisticians and regulatory authorities alike.

Faced with this fact, AMGEN Inc., the world's largest biotechnology company, explored the application of several missing data imputation techniques for longitudinal clinical trials. Following discussions with the FDA, it was prospectively agreed by both parties, in order to minimize bias, to adopt a Multiple Imputation technique as the primary method for the clinical trial. The trial in question consisted of the following design features:

  • Phase 3, randomized, double-blind, placebo-controlled parallel dose study
  • Longitudinal trial lasting up to 9 months with monthly visits
  • Approximately 350 subjects in each of the two treatment arms

Application of the Multiple Imputation technique allowed AMGEN's biostatisticians to utilize standard statistical methods in an Intent-to-Treat paradigm.

SOLAS™ Reviews

"Any data analyst will have experienced problems with incomplete data and the biases they can introduce into our results. Although the major statistical packages provide basic missing values analysis (primarily univariate techniques), there is a gap in the statistical software market incorporating more advanced techniques. Statistical Solutions, the designers of SOLAS™ , believe that they have filled the gap. SOLAS™ is a Windows-based application boasting several methods of missing data imputation, including the simpler methods mentioned above, such as imputing the group mean and using regression methods. However, its highlight is multiple imputation, where several values are imputed for each missing value rather than one, overcoming the major drawback of univariate techniques -underestimating the variance. The package also provides basic statistical analysis, such as ANOVA, regression and non-parametric tests. A 300-page manual accompanies the software, together with example data sets. The manual is clearly written and provides many screen-shots and examples to guide the reader. Despite this, researchers who are not fully up to speed on replacing missing values, particularly using the multiple imputation technique, will need to take some time to learn the principles in order to fully appreciate what SOLAS™ can do. Indeed it may be necessary, and I am sure beneficial, for any new user to attend a workshop. As with any new software, time and thought will have to be given in order to become competent in its use and, as SOLAS™ is currently the only software of its kind, this would be more than worthwhile."
Kate North, Paediatric and Perinatal
Epidemiology 1999, 13, 498

"This package will be valuable to any analyst that has to deal with missing data on a daily basis."
John A Wass, Biotechnology Software
and Internet Journal, July /Aug., '99

"The manual contains a number of positive features and is well organised that both the new and experienced analyst are urged to read most of it. There is a completeness about it that is rarely achieved in software manuals."
John A Wass, Biotechnology Software
and Internet Journal, July /Aug., '99

"In general, SOLAS™ is an easy-to-work with program. The display windows are well organized and the desired option is usually easy to find. The importing and exporting capabilities are quite good and an easy to follow reference guide is provided with the program"
Coen A. Bernaards, Structural
Equations Modelling, Vol. 6 No. 3, 1999

"The core of the program, missing value imputation is a solid addition to the statistical software armamentarium and is simple to utilize."
John A Wass, Biotechnology Software
and Internet Journal, July /Aug., '99

What is Multiple Imputation?

The issue of Missing Data is the subject of increasing debate in contemporary statistics. In any given study, missing data can have many causes. For instance, respondents may be unwilling to answer some questions (item non-response) or refuse to participate in a study (unit non-response). In addition, transcription errors and dropouts in follow-up studies and clinical trials can frequently occur.

The incorrect analysis of datasets with incomplete data can lead to biased analysis and incorrect inferences. SOLAS™ provides researchers with a range of single and multiple imputation approaches so that the user can apply the most appropriate approach to their problem. When some data are missing, standard variable-by-variable analysis may be based on divergent sets of cases, and standard multivariate methods are designed only for the analysis of complete cases. The real problem with single imputation is that the single value being imputed cannot itself reflect the uncertainty about the actual value. Therefore, analyses that treat imputed values like observed values will systematically underestimate this uncertainty, leading to standard errors that are too small, p-values that are systematically too significant and confidence intervals that systematically cover less than their nominal coverage.
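
The understated uncertainty is easy to see with the simplest single-imputation method, filling every gap with the observed mean. In this small sketch the data are hypothetical: four observed values and four missing ones.

```python
from statistics import mean, variance

observed = [2.0, 4.0, 6.0, 8.0]
n_missing = 4

# Single imputation: every missing value is replaced by the observed mean.
completed = observed + [mean(observed)] * n_missing

print(variance(observed))   # sample variance of the values actually seen
print(variance(completed))  # smaller: the imputed values add no spread
```

The completed dataset looks twice as large but contains no new information, so a standard error computed from it as if all eight values were observed would be too small.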

Enter Multiple Imputation - First proposed by Rubin in the 1970s, the method imputes several values (M) for each missing value, to represent the uncertainty about which values to impute. Analytical incorporation of the uncertainty due to missing data is generally very complicated. Multiple Imputation is a technique for incorporating this uncertainty about missing data, making use of available software advances in this area.

With Multiple Imputation, the first of the M sets of imputed values is used to form the first completed dataset, and so on. The M completed datasets are analyzed by standard complete-data methods and the results are combined using simple rules to yield single combined estimates, standard errors and p-values that formally incorporate missing data uncertainty. Pooling the results of the analyses performed on the multiply imputed datasets means that the resulting point estimates are averaged over the M completed datasets, and the resulting standard errors and p-values are adjusted according to the variance of the corresponding M point estimates. This variance, called the 'between imputation variance', provides a measure of the extra inferential uncertainty due to missing data.
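
Written out, the combining rules described above are usually stated as follows. The symbols are notation chosen for this sketch and do not appear in the original text: \hat{Q}_m is the estimate from the m-th completed dataset, U_m its squared standard error, \bar{Q} the pooled estimate, B the between imputation variance, T the total variance and \nu the degrees of freedom of the reference t distribution.

```latex
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M}U_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m-\bar{Q}\bigr)^2

T = \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B, \qquad
\nu = (M-1)\left[1+\frac{\bar{U}}{\bigl(1+\frac{1}{M}\bigr)B}\right]^{2}
```

The pooled standard error is \sqrt{T}. When the M estimates agree closely, B is small and T is close to the ordinary complete-data variance \bar{U}.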

Note: Multiple Imputation has been proven in independent research to be able to correct for the systematic inferential failings produced by ignoring missing data and the ad-hoc approaches of single imputation.

With Multiple Imputation, when the statistical model adequately describes the data and the imputations are generated from the predictive distribution of the missing data, given the observed data, the differences between the M imputed values for each missing data entry will properly reflect the extra uncertainty due to the missing data.

Major Advantages of Multiple Imputation:

  • Better statistical validity than ad-hoc approaches
  • Multiple Imputation is statistically efficient in that it uses the entire observed dataset in the statistical analysis, efficiency being the degree to which all information about the parameter of interest, available in the dataset, is used.
  • Multiple Imputation saves money, since for the same statistical power, multiple imputation requires a smaller sample size than listwise deletion
  • Once imputations have been generated by a knowledgeable user, researchers can use them for their own statistical analyses.

Program Features

Multiple Imputation

  • Based on techniques developed by Rubin et al.
  • Choice of Model-based or Propensity Score-based approaches.
  • Applicable to longitudinal/repeated measures, and single observation survey type data.
  • Control over the regression model used for imputing each variable.
  • Automatically combines results of requested analyses on Multiple imputed datasheets.
  • Applies principled approaches to dealing with monotone missing data and non-monotone missing data, avoiding iteration.
Predictive Model-based Multiple Imputation
  • Fully configurable ordinary least squares multiple regression algorithm.
  • Imputed values are based on predictive information contained in covariates.
  • Preserves correlations between variables.
Propensity Score-based Multiple Imputation
  • Fully configurable logistic regression algorithm.
  • Uses information contained in a set of covariates to predict missingness in the variable to be imputed.
  • Avails of additional variables to preserve relationships between variables.
Single Imputation
  • Standard range of traditional imputation techniques, useful for sensitivity analysis.
Hot Decking
  • Imputed values are selected from responders that are similar with respect to a set of auxiliary variables.
Predicted Mean Imputation using Regression
  • Imputed values are predicted using an ordinary least squares multiple regression algorithm.
Last Value Carried Forward
  • Imputed values are based on previously observed value.
Group Means
  • Imputed values are set to the variable's group mean (or mode in the case of categorical data).

Statistical Features

  • Choice of data imputation techniques
  • Descriptive Statistics
  • t-Test
  • ANOVA
  • Frequency Tables
  • Regression
  • Fully interfaced to the complete BMDP Statistical Software Program Library
Graphical Capabilities
  • Missing Data Pattern
  • Customizable plotting facility
  • Plots integrated within all analyses
  • Wide variety of charts and plots including
    • Bar charts
    • Mean comparison charts
    • Box plots
    • Scatterplots
    • Normal probability plots
    • Histograms
Data Management
  • Script language available to facilitate imputation set-up and simulation runs.
  • Spreadsheet-like data entry
  • Easy specification of variable attributes:
    Type, role, grouping, cutpoints, etc.
  • Windows data editing features such as:
    Cut, Copy, Paste, Undo, Select/Unselect variables
  • Easy specification of variable transformation
On-Line Help Features
  • Three Kinds of help
    • Procedural
    • Statistical usage
    • Statistical definition
Data Import/Export
  • SAS (Unix/Windows)
  • SPSS
  • S-Plus
  • SYSTAT
  • Stata
  • BMDP
  • Excel, Lotus 1-2-3, dBase
  • ASCII (optional delimiters)
System Requirements

Pentium processor recommended, 32MB RAM, 14MB Hard Disk Space and Windows 95 or higher

 

Script Language Facility (Optional)


This is a new feature for the SOLAS™ program which allows the saving of imputation set-ups, so that the same set-up can be run on different datasets. When Multiple Imputation is run in SOLAS™ 3.0, the selections made are recorded and the corresponding commands are written to an editing window, which can be modified and saved.

This new facility allows the reproduction of the same results for a particular imputed dataset at a later date, because all choices such as seed values and predictor variables are saved, ensuring exact replication of results. Imputations can be quickly re-run using different predictor variables to see the effect this has on the results. Simulation runs also become easy with this new Script Language facility.
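
The role of a saved seed can be illustrated outside SOLAS. The snippet below is a generic Python analogue, not SOLAS script syntax (which is not shown on this page); the function `impute_with_seed` and the values are hypothetical.

```python
import random

def impute_with_seed(observed, n_missing, seed):
    """Draw imputed values from the observed ones using a fixed random seed."""
    rng = random.Random(seed)  # same seed, same sequence of random draws
    return [rng.choice(observed) for _ in range(n_missing)]

observed = [4.2, 5.0, 5.5, 6.1]
first_run = impute_with_seed(observed, 3, seed=2002)
later_run = impute_with_seed(observed, 3, seed=2002)
print(first_run == later_run)  # True: the saved seed reproduces the imputations
```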

In addition, this ability to save and access imputation set-ups can help to simplify the documentation when submitting to a regulatory agency.

The Script Language facility comes with its own manual that explains the language. This manual comes as standard documentation along with an imputation manual and a systems manual. The Script Language is optional functionality; to use it, you can upgrade to activate the option. Contact us for details.

 

 

 


©2002 Dataxiom Software, Inc. All rights reserved. E-mail comments and questions to support@dataxiom.com.