As part of the JMSL 2020 release, we’ve added several new algorithms. For a full description of what’s changed in this release, please see the Change Log.
- Added new methods for computation of factor score coefficients and factor scores.
- Factor analysis is a core statistical tool in psychometrics, used, for example, in ability testing. One of its goals is to combine highly correlated observable variables into (unobservable) factors in order to reduce the dimensionality of the problem. In this way it helps to minimize the number of variables while retaining as much of the information in the analysis as possible.
ComplexSVD: Computes the Singular Value Decomposition (SVD) of a general complex matrix. SVD can be used in clustering and dimensionality reduction in machine learning and data mining. This can improve performance and reduce time required when applying machine learning to any task with a large number of variables by reducing the number of random variables used in calculations. SVD can also be used in image compression and recovery. By reducing the number of random variables considered, storage space can be reduced, and visualization of data can be improved by focusing on only the most pertinent data from a large set.
NelderMead: Solves unconstrained and box-constrained optimization problems using a direct search polytope method. The Nelder-Mead algorithm is one of the most common methods for optimizing a multi-variable function without using derivative information. Nelder-Mead can be used to improve the performance of deep learning models, particularly in image recognition and classification problems. Other application areas include engineering design (e.g. optimal construction of wing planforms), circuit design, and molecular geometry. Since the Nelder-Mead method does not use derivatives, it can be applied to functions that are not smooth or continuous.
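To illustrate the idea of a derivative-free polytope search, here is a minimal sketch of the textbook Nelder-Mead iteration (reflection, expansion, contraction, shrink) for the unconstrained case only. The class `NelderMeadSketch`, its method names, and its parameters are hypothetical and chosen for this example; this is not the JMSL `NelderMead` class or its API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleFunction;

/** Illustrative textbook Nelder-Mead; NOT the JMSL NelderMead class. */
class NelderMeadSketch {

    /** Minimizes f starting from x0 and returns the best vertex found. */
    static double[] minimize(ToDoubleFunction<double[]> f, double[] x0,
                             double step, double tol, int maxIter) {
        int n = x0.length;
        // Initial simplex: x0 plus one vertex offset along each coordinate axis.
        double[][] s = new double[n + 1][];
        s[0] = x0.clone();
        for (int i = 0; i < n; i++) {
            s[i + 1] = x0.clone();
            s[i + 1][i] += step;
        }
        for (int iter = 0; iter < maxIter; iter++) {
            // Order vertices from best (lowest f) to worst. A production
            // implementation would cache function values instead of re-evaluating.
            Arrays.sort(s, Comparator.comparingDouble(f));
            double fBest = f.applyAsDouble(s[0]);
            double fSecondWorst = f.applyAsDouble(s[n - 1]);
            double fWorst = f.applyAsDouble(s[n]);

            // Stop once the simplex has collapsed to (almost) a single point.
            double size = 0.0;
            for (int i = 1; i <= n; i++)
                for (int j = 0; j < n; j++)
                    size = Math.max(size, Math.abs(s[i][j] - s[0][j]));
            if (size < tol) break;

            // Centroid of all vertices except the worst one.
            double[] c = new double[n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) c[j] += s[i][j] / n;

            double[] xr = along(c, s[n], -1.0);          // reflection
            double fr = f.applyAsDouble(xr);
            if (fr < fBest) {
                double[] xe = along(c, s[n], -2.0);      // expansion
                s[n] = f.applyAsDouble(xe) < fr ? xe : xr;
            } else if (fr < fSecondWorst) {
                s[n] = xr;                               // accept reflection
            } else {
                double[] xc = along(c, s[n], 0.5);       // inside contraction
                if (f.applyAsDouble(xc) < fWorst) {
                    s[n] = xc;
                } else {
                    // Shrink every vertex halfway toward the best vertex.
                    for (int i = 1; i <= n; i++) s[i] = along(s[0], s[i], 0.5);
                }
            }
        }
        Arrays.sort(s, Comparator.comparingDouble(f));
        return s[0];
    }

    /** Returns base + t * (point - base). */
    static double[] along(double[] base, double[] point, double t) {
        double[] out = new double[base.length];
        for (int j = 0; j < out.length; j++)
            out[j] = base[j] + t * (point[j] - base[j]);
        return out;
    }

    public static void main(String[] args) {
        // Smooth test function with its minimum at (1, -2).
        double[] best = minimize(
                x -> (x[0] - 1) * (x[0] - 1) + (x[1] + 2) * (x[1] + 2),
                new double[] {0.0, 0.0}, 1.0, 1e-8, 1000);
        System.out.println(Arrays.toString(best));
    }
}
```

Only function values are compared, never gradients, which is why the method tolerates objectives that are not differentiable. Box constraints, which the JMSL class supports, are left out of this sketch.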
EGARCH: Estimates an exponential GARCH model. GARCH models are used to model and predict market volatility (the variability in asset returns). The exponential GARCH (EGARCH) model includes additional parameters to capture asymmetry in the volatility. Asset returns frequently display higher volatility while prices are dropping than while they are rising.
ExtendedGARCH: Abstract class for extensions to GARCH models with methods for estimation, filtering, and forecasting. ExtendedGARCH provides a framework for other extended GARCH models in addition to the exponential GARCH.
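The asymmetry that an exponential GARCH model captures can be sketched with the conditional-variance recursion of an EGARCH(1,1) model in its common textbook form, assuming standard normal innovations: the log-variance depends on both the size and the sign of the previous standardized shock. The class `EgarchSketch`, the parameter names, and the parameterization are assumptions made for illustration and may differ from those used by the JMSL EGARCH class.

```java
/** Illustrative EGARCH(1,1) variance filter; NOT the JMSL EGARCH class. */
class EgarchSketch {

    // E|z| for a standard normal innovation z.
    static final double E_ABS_Z = Math.sqrt(2.0 / Math.PI);

    /**
     * Filters a series of shocks eps through
     *   log(var[t+1]) = omega + beta * log(var[t])
     *                   + alpha * (|z[t]| - E|z|) + gamma * z[t],
     * where z[t] = eps[t] / sqrt(var[t]).
     * Returns the conditional variances var[0..eps.length].
     */
    static double[] conditionalVariance(double[] eps, double omega,
                                        double alpha, double gamma, double beta) {
        double[] var = new double[eps.length + 1];
        // Start from the unconditional mean of the log-variance.
        double logVar = omega / (1.0 - beta);
        var[0] = Math.exp(logVar);
        for (int t = 0; t < eps.length; t++) {
            double z = eps[t] / Math.sqrt(var[t]);
            logVar = omega + beta * logVar
                    + alpha * (Math.abs(z) - E_ABS_Z) + gamma * z;
            var[t + 1] = Math.exp(logVar);
        }
        return var;
    }

    public static void main(String[] args) {
        // A negative gamma makes negative shocks raise volatility more.
        double[] down = conditionalVariance(new double[] {-0.02}, -0.4, 0.1, -0.08, 0.95);
        double[] up = conditionalVariance(new double[] {0.02}, -0.4, 0.1, -0.08, 0.95);
        System.out.println("variance after -2% shock: " + down[1]);
        System.out.println("variance after +2% shock: " + up[1]);
    }
}
```

With a negative `gamma`, a price drop produces a larger next-period variance than a rise of the same size, which is the pattern in asset returns described above.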
ComplexEigen: Computes the eigenvalues and eigenvectors of a general square complex matrix. Complex eigenvalue problems occur in the study of electromagnetic waves. The discretization of the Maxwell equations leads to an eigensystem whose eigenvalues are the feasible values for the wave frequency.
New and Improved Features
Major changes in the new release include performance improvements, enhancements to linear regression, and some architectural upgrades behind the scenes.
The first area with improved performance is time series outlier detection. Outliers in a data set are unusually large or small observations compared to the rest of the data. Outliers in a time series may occur as single points, as a short series of anomalous values, as a temporary change, or as a permanent level shift. The underlying causes for the different types of outliers are various and while they can be instructive, generally they are not predictable. In most cases, outliers should be filtered out of the data before producing model estimates and forecasts.
JMSL ARMAOutlierIdentification implements the procedure described in Chen and Liu (1993) for automatic detection of outliers in time series. The algorithm detects potential outliers and classifies each as one of the types mentioned above. At a certain stage, the algorithm solves a sequence of least squares regressions. Instead of solving each of these problems from scratch, it is possible to solve the initial problem and then update the solution at each step via Givens rotations. This saves considerable computation time, especially when many outliers are initially detected. After these changes, the method AutoARIMA.compute, which uses ARMAOutlierIdentification, is now 80% to 99% faster, depending on the size of the problem.
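As a rough illustration of updating a least squares solution with Givens rotations, the sketch below keeps the triangular factor R and the transformed right-hand side of a regression and folds in one new observation row at a time, rather than refactoring the whole problem. This is one common form of such an update and is an assumption for illustration; the source does not say which update JMSL performs internally, and the class `GivensUpdateSketch` and its methods are hypothetical.

```java
import java.util.Arrays;

/** Illustrative row-update of a QR least squares solution; NOT JMSL code. */
class GivensUpdateSketch {

    /**
     * Rotates a new observation (x, y) into the upper-triangular factor r
     * and the transformed right-hand side d, both modified in place.
     */
    static void addRow(double[][] r, double[] d, double[] x, double y) {
        int n = d.length;
        double[] row = x.clone();
        for (int k = 0; k < n; k++) {
            if (row[k] == 0.0) continue;          // nothing to eliminate
            // Givens rotation that zeroes row[k] against the diagonal r[k][k].
            double h = Math.hypot(r[k][k], row[k]);
            double c = r[k][k] / h;
            double s = row[k] / h;
            for (int j = k; j < n; j++) {
                double rkj = r[k][j];
                double xj = row[j];
                r[k][j] = c * rkj + s * xj;
                row[j] = -s * rkj + c * xj;
            }
            double dk = d[k];
            d[k] = c * dk + s * y;
            y = -s * dk + c * y;
        }
    }

    /** Solves r * b = d for upper-triangular r by back substitution. */
    static double[] backSolve(double[][] r, double[] d) {
        int n = d.length;
        double[] b = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = d[i];
            for (int j = i + 1; j < n; j++) sum -= r[i][j] * b[j];
            b[i] = sum / r[i][i];
        }
        return b;
    }

    public static void main(String[] args) {
        // Fit y = b0 + b1 * x, adding one observation at a time.
        double[][] r = new double[2][2];
        double[] d = new double[2];
        addRow(r, d, new double[] {1, 0}, 1);
        addRow(r, d, new double[] {1, 1}, 3);
        addRow(r, d, new double[] {1, 2}, 4);
        System.out.println(Arrays.toString(backSolve(r, d)));
    }
}
```

Each update costs on the order of n² operations for n coefficients, independent of how many observations have already been absorbed, which is where the savings over recomputing each regression come from.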
The second area with performance improvements is decision trees. Decision tree algorithms build a model by recursively splitting the data on values of the "best" predictor variable (i.e., the variable that best explains the values of the target variable in that subset of data). The process is repeated in each new subset until a stopping criterion is met. By better exploiting these data partitions at each stage, the decision tree algorithms (C45, ALACART, CHAID, and QUEST) are 10% to 50% faster than before, depending on the size of the data. Class methods that use the decision tree algorithms, such as GradientBoosting.fitModel, are comparably faster as a result.
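The notion of a "best" split can be made concrete with a small sketch that scans every numeric predictor for the threshold that minimizes the weighted Gini impurity of a two-class target. This is a generic illustration under stated assumptions (binary 0/1 labels, Gini as the criterion); it does not reproduce the C45, ALACART, CHAID, or QUEST implementations or the optimizations described above, and the class `SplitSketch` is hypothetical.

```java
import java.util.Arrays;

/** Illustrative single-node split search; NOT a JMSL decision tree class. */
class SplitSketch {

    /** Gini impurity of a node holding n0 class-0 and n1 class-1 cases. */
    static double gini(int n0, int n1) {
        int n = n0 + n1;
        if (n == 0) return 0.0;
        double p = (double) n1 / n;
        return 2.0 * p * (1.0 - p);
    }

    /**
     * Returns {predictor index, threshold, weighted impurity} for the best
     * split of the form x[i][predictor] <= threshold. Labels y must be 0 or 1.
     * The predictor index is -1 if no split improves on the unsplit node.
     */
    static double[] bestSplit(double[][] x, int[] y) {
        int n = y.length;
        int total1 = Arrays.stream(y).sum();
        double[] best = {-1, Double.NaN, gini(n - total1, total1)};
        for (int f = 0; f < x[0].length; f++) {
            final int col = f;
            // Sort case indices by the value of this predictor.
            Integer[] order = new Integer[n];
            for (int i = 0; i < n; i++) order[i] = i;
            Arrays.sort(order, (a, b) -> Double.compare(x[a][col], x[b][col]));
            int left1 = 0; // class-1 cases that fall in the left child so far
            for (int k = 0; k < n - 1; k++) {
                left1 += y[order[k]];
                double lo = x[order[k]][col];
                double hi = x[order[k + 1]][col];
                if (lo == hi) continue; // cannot split between equal values
                int nl = k + 1;
                int nr = n - nl;
                int right1 = total1 - left1;
                double w = (nl * gini(nl - left1, left1)
                        + nr * gini(nr - right1, right1)) / n;
                if (w < best[2]) best = new double[] {col, (lo + hi) / 2.0, w};
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 5}, {2, 1}, {3, 4}, {4, 2}};
        int[] y = {0, 0, 1, 1};
        System.out.println(Arrays.toString(bestSplit(x, y)));
    }
}
```

A full tree builder would apply this search recursively to the left and right subsets until a stopping criterion, such as a minimum node size, is met.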
While making these performance improvements, we also added or updated several classes and methods, including:
- WelchsTTest: a new class for Welch's approximate t-test for two independent normal populations with unequal variances.
- Linear Systems Matrix methods: new methods for matrix-vector, vector-matrix, matrix-matrix and vector-matrix-vector multiplications with general or symmetric matrices.
- LinearRegression.getR returns the upper triangular matrix, R, from the QR-transformed regression problem.
- LinearRegression.getRHS returns the column permutation used in the QR decomposition.
- LinearRegression now includes column pivoting to handle rank deficient problems.
- PredictiveModel.isConstantSeries lets extending classes, such as decision trees, check for a constant response series.
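For readers unfamiliar with Welch's approximate t-test mentioned in the list above, the following sketch computes its two core quantities from the standard formulas: the t statistic and the Welch-Satterthwaite approximate degrees of freedom. The class `WelchSketch` and its methods are hypothetical and do not reflect the API of the JMSL WelchsTTest class; computing a p-value would additionally require the Student t distribution, which is omitted here.

```java
import java.util.Arrays;

/** Illustrative Welch t statistic; NOT the JMSL WelchsTTest class. */
class WelchSketch {

    static double mean(double[] x) {
        return Arrays.stream(x).average().orElse(Double.NaN);
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    /** Returns {t statistic, approximate degrees of freedom}. */
    static double[] welch(double[] a, double[] b) {
        double va = variance(a) / a.length; // squared standard error of mean(a)
        double vb = variance(b) / b.length; // squared standard error of mean(b)
        double t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
        // Welch-Satterthwaite approximation to the degrees of freedom.
        double df = (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
        return new double[] {t, df};
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5};
        double[] b = {2, 4, 6, 8, 10, 12};
        System.out.println(Arrays.toString(welch(a, b)));
    }
}
```

Unlike the pooled two-sample t-test, the two variances are never combined into a single estimate, so the test remains valid when the populations have unequal variances.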
IMSL has been around for almost 50 years, so there are fewer bugs than one might find in less mature libraries; however, together with our customers, we always manage to find and fix a few, which continuously improves the robustness of the IMSL library. Details can be found in the product change logs: 2018.0 and 2018.1.
Another key component of JMSL 2018.0.0 is the improvement of internal JMSL tools and processes to enable more rapid defect patching cycles and Java platform certification going forward. With these updates and additional planned improvements, the development team will provide new product releases on the most widely adopted Java versions first, then respond to requests for platform support from our customers. Additional platforms will be made available as warranted by demand.