PyIMSL™


Transforming Analytic Application Development

Introducing PyIMSL™ Studio

Abstract

In a recent Visual Numerics customer survey, more than 60% of respondents said that they create prototype models before developing code for production applications. The vast majority of respondents also indicated that their organization uses multiple numerical analysis tools for prototyping. Using different tools in the prototyping and production stages of development can cause significant challenges. This paper is intended to provide modelers, production developers and managers of analytic applications with an overview of common challenges when moving prototype models into production applications.

This paper will also introduce PyIMSL Studio, the first and only commercially-available numerical analysis application development environment designed for deploying mathematics and statistics prototype models into production applications.

Analytic Application Development

Like the respondents in the survey mentioned above, most organizations developing analytic applications today follow a process that includes creating a prototype model before developing the production application. The modelers (domain experts) who build prototypes and the implementation teams (developers) who are responsible for taking the prototype models into production have many choices of numerical analysis tools.

For prototyping, popular options for numerical analysis in analytic models include commercial tools such as the ubiquitous Microsoft Excel as well as many free and open source alternatives including FreeMat, GSL, Octave and R. Two common characteristics of these tools are ease of use and breadth of numerical functions. Users of these tools typically want to prove a concept quickly and, although they are domain experts in a specific field, they may or may not be trained software programmers. A tool that provides easy access to mathematical and statistical functions without requiring extensive coding helps these experts create numerical models rapidly.

Many prototype models become a part of production applications that are written in development languages like C, C++, C#/.NET, Java or Fortran. To transform a model into a component of a production application, the modeler usually must rewrite the prototype in a development language or hand off the prototype to an implementation team to do the work. In most cases, the re-write includes using a separate native library for the analytics in the production application. Several testing steps must happen to ensure that the numeric results from the prototype match the numeric results in the production application.

Prototype to Production Gap

With the use of different tools, numerical analysis functions and development languages, there is typically a gap between creating a prototype and deployment of that prototype into a production application.

Bridging the gap through re-design and re-write of prototype models can be time-consuming, costly, and potentially risky because the numerical algorithms used for analysis in the prototype stage will differ from the algorithms used in the production development stage. Production teams spend time finding algorithms equivalent to those used in prototype models and often resort to developing algorithms internally. Even when there are comparable algorithms, different tools can produce different results, so teams spend time and resources determining the root cause of the varying results.
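One reason comparable algorithms can disagree is that floating-point arithmetic is sensitive to how values are accumulated. The short Python sketch below is an illustration only and is not taken from any of the tools named in this paper; the function name naive_sum is hypothetical. It totals the same ten values with two summation methods and obtains two different answers.

```python
import math


def naive_sum(values):
    """Add values left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1] * 10
left_to_right = naive_sum(values)  # rounding error accumulates
compensated = math.fsum(values)    # tracks the low-order bits that are lost

print(left_to_right == compensated)      # False
print(abs(left_to_right - compensated))  # small, but not zero
```

Both methods are reasonable, yet a prototype built on one and a production application built on the other would not match exactly, and someone would have to spend time explaining the difference.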

Introducing PyIMSL Studio

PyIMSL Studio is designed to reduce the prototype to production gap. Combining proven prototyping tools with the Visual Numerics IMSL® C Numerical Library, it is the only commercially-available analytic application development environment designed for deploying mathematics and statistics prototype models into production applications.

At the heart of PyIMSL Studio is the IMSL C Library, a comprehensive set of mathematical and statistical algorithms that programmers can embed into their software applications. Functions in the IMSL C Library include:

Mathematics:
  • Matrix Operations
  • Linear Algebra
  • Eigensystems
  • Interpolation & Approximation
  • Numerical Quadrature
  • Differential Equations
  • Nonlinear Equations
  • Optimization
  • Special Functions
  • Finance & Bond Calculations
  • Transforms

Statistics:
  • Basic Statistics
  • Time Series & Forecasting
  • Nonparametric Tests
  • Correlation & Covariance
  • Data Mining
  • Regression
  • Analysis of Variance
  • Goodness of Fit
  • Distribution Functions
  • Random Number Generation
  • Neural Networks

The items listed in the table represent entire areas of functionality with numerous algorithms within each area. Within the IMSL Libraries, the actual count of available mathematical and statistical algorithms runs into the thousands, giving developers many options to mix and match algorithms as needed to create unique and competitively engineered analytical applications.

Within PyIMSL Studio, these mathematical and statistical functions are available to Python programmers for quick prototyping and to C developers for production application development. Most importantly, the same underlying algorithms are available in each language. Current Visual Numerics customers estimate that by using the same algorithms in prototyping as in production, development time can be reduced significantly through the elimination of extra research, re-writing and testing.

The remainder of this paper will describe the prototyping and production tools available in PyIMSL Studio in more detail.

Prototyping Analytic Applications

The typical creator of analytic models is a domain expert with special knowledge in a particular area and the skills to apply mathematics or statistics to solve problems in that area.

Examples of domain experts in the area of analytics include:

  • Quantitative analysts in the finance area who use mathematical or statistical models to solve problems in areas such as risk management and derivatives pricing
  • Six Sigma Black Belts who use statistical methods to identify and remove the causes of defects and errors in manufacturing and business processes
  • Data miners who use statistical techniques to uncover hidden patterns in data that help governments identify terrorists, businesses identify likely customers, and scientists identify an individual’s risk of developing cancer or other common diseases

Domain experts may or may not have extensive programming skills, so writing an analytic prototype directly in C with IMSL Library functions may not be a viable option. Even with superior programming skills, many developers would not choose the C language as a rapid prototyping language.

To make the IMSL Library functions easily accessible to modelers, Visual Numerics selected an increasingly popular language to include with PyIMSL Studio: Python. Python is a general-purpose, high-level programming language. The Python website describes it as a language that "can be learned in a few days". In the past five years, Python has regularly ranked in the top 10 programming languages on the TIOBE Programming Community Index, which gives an indication of popularity based on the number of skilled engineers worldwide, courses and third-party vendors. As Python has grown in popularity, so has the number of available Python-related development tools.

For PyIMSL Studio, Visual Numerics has integrated and packaged a practical and robust set of Python tools to use for analytic modeling. These tools are tested, documented, and supported in a single installable package by Visual Numerics. This toolset includes:

  • Python, including ctypes
  • NumPy—A set of modules for powerful and efficient data array manipulation. The de-facto standard for array and matrix algebra in Python.
  • Data I/O and transformation components—utilities for data filtering and transformation, including:
    • An ASCII data file reading utility available in Python and in production C code.
    • A missing value identification and substitution utility available in Python and in production C code.
    • PyODBC—A Python module for database access on Windows and Linux.
    • xlrd—A Python module for reading data from Microsoft Excel files.
  • matplotlib/pylab—Python analytical charting components.
  • IPython—A command line interface for interactive development and exploration in Python.
  • Eclipse/Pydev—A full featured Integrated Development Environment (IDE) for Python.
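As an illustration of the data preparation steps named in this list, the following sketch reads comma-separated ASCII data and substitutes column means for missing values. It uses only the Python standard library and is not the PyIMSL Studio utility; the function names, the missing-value tokens and the mean-substitution rule are all assumptions chosen for the example.

```python
import io
import math


def read_ascii_table(stream, missing_tokens=("", "NA", "?")):
    """Parse comma-separated numeric rows; missing entries become NaN."""
    rows = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = [tok.strip() for tok in line.split(",")]
        rows.append([math.nan if tok in missing_tokens else float(tok)
                     for tok in tokens])
    return rows


def fill_missing_with_column_mean(rows):
    """Replace each NaN with the mean of the values present in its column."""
    means = []
    for column in zip(*rows):
        present = [v for v in column if not math.isnan(v)]
        means.append(sum(present) / len(present) if present else math.nan)
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in rows]


# A small in-memory "file" with two missing entries.
sample = io.StringIO("1.0, 2.0\nNA, 4.0\n3.0, \n")
table = read_ascii_table(sample)
cleaned = fill_missing_with_column_mean(table)
print(cleaned)
```

Mean substitution is only one possible policy; the point of the sketch is that identification and substitution are separate steps, so a different rule can be swapped in without changing how the file is read.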

The PyIMSL Studio installation ensures that only compatible components are loaded into the development environment. See Figure 1 – PyIMSL Studio Installation.


Figure 1 - PyIMSL Studio Installation

PyIMSL Wrappers

For analytic modeling, what matters most to modelers is having the mathematical and statistical functionality their analysis requires. Within PyIMSL Studio, this functionality is provided by the PyIMSL wrappers, a collection of Python interfaces to the IMSL C Library algorithms.

The PyIMSL wrappers expose all of the functionality of the IMSL C Libraries in a way that is true to the Python language philosophy. Functions requiring arrays can be called with anything that behaves like an array in Python. Error handling uses standard Python exception handling. Using PyIMSL delivers minimalist and readable code. “Pythonic” is the term used to describe this in the Python community. For the modeler, the PyIMSL wrappers provide a way to access the comprehensive, reliable and highly effective IMSL C Library functions without having to do any programming in C.
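The wrapper pattern described above can be sketched with ctypes, the standard Python module for calling C libraries. The example below is not PyIMSL code: it wraps the C runtime's qsort as a stand-in for a numerical routine, and the names c_sorted and _compare are hypothetical. It accepts any Python sequence or iterable of numbers, copies it into a C array of doubles, calls the C function, and relies on an ordinary Python exception when the input is not numeric.

```python
import ctypes
import sys

# Load the C runtime. It stands in here for a numerical C library.
if sys.platform == "win32":
    _libc = ctypes.cdll.msvcrt
else:
    _libc = ctypes.CDLL(None)

# C signature of the comparison callback that qsort expects.
_CMP = ctypes.CFUNCTYPE(ctypes.c_int,
                        ctypes.POINTER(ctypes.c_double),
                        ctypes.POINTER(ctypes.c_double))

_libc.qsort.restype = None
_libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                        ctypes.c_size_t, _CMP]


def _compare(a, b):
    """Return -1, 0 or 1, as qsort requires."""
    return (a[0] > b[0]) - (a[0] < b[0])


def c_sorted(values):
    """Sort numbers with the C library and return a new Python list."""
    data = list(values)                         # accept any iterable
    buf = (ctypes.c_double * len(data))(*data)  # TypeError if not numeric
    _libc.qsort(buf, len(data), ctypes.sizeof(ctypes.c_double),
                _CMP(_compare))
    return list(buf)


print(c_sorted((3.0, 1, 2.5)))  # a tuple with mixed int and float works

try:
    c_sorted(["not a number"])
except TypeError as exc:
    print("rejected:", exc)
```

The caller never sees a C pointer or a status code. Lists, tuples and generators go in, a list comes out, and bad input surfaces as a TypeError that can be handled with try and except.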

The PyIMSL wrappers documentation describes how to use all of the mathematics and statistics functions in the library and is available for download from the Visual Numerics website.

Because the PyIMSL wrappers deliver a direct interface to each IMSL C Library function, those who will eventually translate the Python prototype code to C for deployment will have no trouble matching the algorithm routines and parameters in C for the production application.

Prototype to Production

Developers tasked with deploying an analytic model into a production application have many important considerations, including:

  • Ease of deployment
  • Consistency and accuracy of results
  • Performance and scalability
  • Platform deployment considerations

In each of these areas, PyIMSL Studio delivers a useful solution.

Ease of deployment

The PyIMSL Studio environment is designed to make it as simple as possible to move from the Python prototyping environment to C. This ease of deployment is accomplished in two ways:

  • The first is in the software itself. There are similar conventions in the IMSL Library algorithms in both Python and C.
  • The second is through helpful tips and techniques provided in the PyIMSL Studio User Guide.

On the software side, the naming and calling conventions are similar for both PyIMSL and the IMSL C Library, making it easy to match variables in production applications to those in the prototype. The direct correlation between Python and C functionality also simplifies the task of comparing Python and C code results during and after the conversion process.
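A results comparison of this kind can be scripted in a few lines. The sketch below is a generic illustration rather than a PyIMSL Studio feature: the function names, the whitespace-separated output format and the tolerance values are assumptions. It parses numbers printed by a production program and reports any that differ from the prototype's values by more than a relative tolerance.

```python
import math


def parse_values(text):
    """Convert whitespace-separated numeric output into a list of floats."""
    return [float(tok) for tok in text.split()]


def find_mismatches(prototype, production, rel_tol=1e-12, abs_tol=0.0):
    """Return (index, prototype value, production value) for each mismatch."""
    if len(prototype) != len(production):
        raise ValueError("result vectors have different lengths")
    mismatches = []
    for i, (p, c) in enumerate(zip(prototype, production)):
        if not math.isclose(p, c, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, p, c))
    return mismatches


# Values computed by the Python prototype.
python_results = [0.5, 1.25, 3.0]

# Text captured from the production program (made-up sample output).
c_output = "0.5 1.25 3.0000000000000004"

print(find_mismatches(python_results, parse_values(c_output)))
```

When both sides call the same underlying algorithm, the tolerance can be kept tight, and any reported mismatch points to a conversion error rather than to a difference between algorithms.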

Given that Python and C are fundamentally different languages, documentation is provided that explains how to create programs that are easier to port from Python to C, describes the actual conversion process, and shows how to deal with common conversion issues.

Consistency and accuracy of results

As described earlier in this paper, delivering consistent results between analytic prototypes and production applications is challenging, if not impossible, when using different algorithms between these two stages of development.

By providing the same numerical algorithms to both modelers and implementation teams, PyIMSL Studio ensures consistent numerical results between prototype models and production applications.

Because the numerical algorithms provided in PyIMSL Studio are those of the IMSL C Library, PyIMSL Studio developers are also assured of getting very accurate results. The libraries have been well-seasoned by years of use by thousands of customers. Since 1970, the IMSL Libraries have been the cornerstone of numerical analysis, predictive analytics and high-performance computing applications in scientific, technical and business environments.

Performance and scalability

Organizations move prototypes into production to obtain better application performance, perhaps to get results faster, crunch larger data sets or to make an application available to a large number of users simultaneously.

Scripting languages like Python are often perceived to have performance limitations and so are usually not considered a viable option for production application development.

For some applications, performance of Python might be adequate. If this is the case, the prototype developed using PyIMSL Studio can simply be deployed into production.

For most applications though, the prototype will need to be re-written in a development language. For these cases, re-writing in C and leveraging the IMSL C Library algorithms will deliver a high-performance solution.

Platform Considerations

Developers taking PyIMSL prototype models to production in C or C++ using the IMSL C Library have an extensive list of computing platforms to choose from for production applications. The library supports many different hardware platforms, operating systems and compilers, protecting users’ investment in existing systems and ensuring the ability to migrate applications to new environments when needed.

Summary

Developers of analytic models and production applications have many alternatives to choose from for numerical analysis. However, when modelers and developers use different tools, a gap is created between the prototype modeling and development stages of application development. In this gap, implementation teams must re-write, research and test applications to transition code and ensure consistent results.

By providing modelers and implementation teams with a common set of tested and supported high-quality development tools as well as the same underlying numerical algorithms, PyIMSL Studio removes the prototype-to-production gap. With PyIMSL Studio, prototype models become part of production applications more quickly and with less re-work, risk and complexity.

For more information or to request an evaluation copy, visit the PyIMSL Studio area of the Visual Numerics website.

 


© Copyright 2010 Rogue Wave Software, Inc. All Rights Reserved