Compiling and using Parallel routines in DataStage

A parallel routine lets you call external functionality written in C or C++ from DataStage.
For example, DataStage does not provide regular expression functionality, so we can implement it in C, build it as a shared object, and use it in DataStage.
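
To make the regular expression case concrete, here is a minimal sketch of what such an external function might look like. It is built on the POSIX regex.h calls (regcomp, regexec, regfree); the function name regexMatch, its parameter types, and its return codes are assumptions made for illustration, not something DataStage prescribes.

```cpp
#include <regex.h>   // POSIX regular expressions: regcomp, regexec, regfree
#include <stddef.h>  // NULL

// Hypothetical routine: returns 1 if input matches pattern, 0 if it does not,
// and -1 if either pointer is null or the pattern fails to compile.
int regexMatch(const char *pattern, const char *input)
{
    if (pattern == NULL || input == NULL) {
        return -1;
    }

    regex_t compiled;
    // REG_EXTENDED selects POSIX extended syntax; REG_NOSUB says we only
    // need a yes/no answer, not the positions of the matched groups.
    if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }

    // regexec returns 0 on a match and REG_NOMATCH otherwise.
    int status = regexec(&compiled, input, 0, NULL, 0);
    regfree(&compiled);

    return status == 0 ? 1 : 0;
}
```

With this sketch, a call such as regexMatch("^[0-9]+$", "12345") reports a match, while the same pattern against "12a45" does not. Compiling the pattern on every call keeps the function stateless, at the cost of some speed on large data volumes.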

Before we start writing routines:

-Some compilers require the source file extension to be “C”, not “c”. The “C” extension indicates a C++ compile, which is required for linking into DataStage.

-Make sure you compile your code with the SAME compiler and options that are defined in the Administrator under APT_COMPILER/APT_COMPILEOPT and APT_LINKER/APT_LINKOPT. These should be the native compiler and options set by the installer.

Steps to use object code (simplest):

  1. Compile the external C++ code with the -c option:

g++ -c myTest.C -o myTest.o

  2. Add a new PX Routine in Designer.
    -Routine Name: This is the name used in the Transformer stage to call your function
    -Object Type: Select Object
    -External subroutine name: This is the actual function name in the C++ code
    -Put the full path of the object file in the routine definition
    -Return Type: Match this data type to the actual return type of your C++ function
    -Arguments: Create any arguments that are required by your external C++ function
  3. Create a job with a Transformer that calls your routine, then compile the job and run it.
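
As an illustration of how the routine definition lines up with the code, here is a sketch of what myTest.C could contain. The file name comes from the compile command in step 1; the function name myTest, its single string argument, and its int return type are assumptions chosen for the example.

```cpp
#include <ctype.h>   // isdigit
#include <stddef.h>  // NULL

// Hypothetical contents of myTest.C: counts the decimal digits in a string.
// Returns 0 for a null pointer so that a missing input value is handled
// instead of being dereferenced.
int myTest(const char *text)
{
    if (text == NULL) {
        return 0;
    }

    int digits = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        // Cast to unsigned char: isdigit is undefined for negative values.
        if (isdigit((unsigned char)*p)) {
            ++digits;
        }
    }
    return digits;
}
```

Under these assumptions, the External subroutine name would be myTest, the Return Type would be an integer type matching int, and the routine would take one string argument. The Routine Name can differ from the function name, since it is only the name used inside the Transformer.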


Create a shared object/library of the code:

Position Independent Object:
g++34 -fpic -c sum_pk.c

g++: GNU C++ compiler available on Unix. g++34 is the version of g++ available on our server.
-c: compiles the code and creates an object file without linking
-fpic: creates an object with position-independent code, which is required for a shared object/library

An object file with the extension .o will be created: sum_pk.o
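
For reference, a source file like sum_pk.c could be as small as the sketch below. Only the file name is taken from the command above; the function name sum_pk, its two integer arguments, and its body are assumptions for illustration.

```cpp
// Hypothetical contents of sum_pk.c: adds two integers and returns the result.
// The function has no global state, so it is safe to call once per row.
int sum_pk(int a, int b)
{
    return a + b;
}
```

In the routine definition, the External subroutine name would then be sum_pk, with an integer return type and two integer arguments.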

a) Shared object:
The shared object is created from the position-independent object file created above.
g++34 -shared -o  sum_pk.o is the shared object file created from sum_pk.o

b) Shared library:
The shared library is also created from the position-independent object file created above.
g++34 -shared -o sum_pk.o is the shared library file created from sum_pk.o

Shared library vs. shared object:

Linking:
  Shared library: linked to the job at runtime, so it must be available at runtime.
  Shared object: linked to the job at compile time.

Naming:
  Shared library: the name should start with “lib” and have “.so” as its extension.
  Shared object: no such constraint.

Location:
  Shared library: should be present in one of the predefined library paths, such as the library path of our DataStage installation.
  Shared object: no such constraint.


Implementing a parallel routine in DataStage:

  • File>New>Routines>Parallel Routine
  • Fill in the required values as follows:

Routine Name: Any name made up of alphanumeric characters only; underscores are not allowed either.
External subroutine name: Name of the C function that we want to invoke
Type: External Function
Object Type: Library if you are using a shared library, or Object if you are using a shared object.
Return Type: Return type of the C function
Library path: Library name with its complete path
If shared library the path should be


Thanks, Please leave a comment if you need more assistance on this topic


Author: Kuntamukkala Ravi

ETL Consultant by Profession, Webmaster by Passion
