Compiling and using Parallel routines in DataStage

A parallel routine lets you call external functionality written in C or C++ from a DataStage parallel job.
E.g. DataStage does not provide regular expression functionality, so we can build that functionality in C as a shared object and use it in DataStage.
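
To make the regular expression example concrete, here is a minimal sketch of what such a routine might look like. It assumes a Unix system where the POSIX <regex.h> header is available; the function name regexMatch and its return codes are hypothetical choices for illustration, not something DataStage prescribes.

```cpp
// Hypothetical routine source, e.g. saved as regexMatch.C
#include <regex.h>   // POSIX regular expressions: regcomp, regexec, regfree

// Returns 1 if 'input' matches the extended regular expression 'pattern',
// 0 if it does not match, and -1 if an argument is null or the pattern
// cannot be compiled.
int regexMatch(char *input, char *pattern)
{
    if (input == 0 || pattern == 0)
        return -1;

    regex_t compiled;
    // REG_NOSUB: we only need a yes/no answer, not the match positions.
    if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int status = regexec(&compiled, input, 0, 0, 0);
    regfree(&compiled);   // always release the compiled pattern

    return status == 0 ? 1 : 0;
}
```

In a Transformer derivation this would be called like any other function, with the routine name you choose in Designer and two string arguments. Compiling the pattern on every call keeps the sketch simple but is not the fastest option for large volumes.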

Before we start writing routines:

-Some compilers require the source file extension to be “C”, not “c”. “C” indicates a C++ compile, which is required for linking into DataStage.

-Make sure you compile your code with the SAME compiler and options that are defined in the Administrator under APT_COMPILER/APT_COMPILEOPT and APT_LINKER/APT_LINKOPT. These should be the native compiler and options set by the installer.

Steps to use object code (the simplest approach):

  1. Compile the external C++ code with the -c option:

g++ -c myTest.C -o myTest.o

  2. Add a new PX Routine in Designer.
    -Routine Name: the name used in the Transformer stage to call your function
    -Object Type: select Object
    -External subroutine name: the actual function name in the C++ code
    -Put the full path of the object file in the routine definition
    -Return Type: match this data type to the actual return type of your C++ function
    -Arguments: create any arguments that are required by your external C++ function
  3. Create a job with a Transformer that calls your routine, then compile and run the job.
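
The contents of myTest.C are not shown above. As a rough sketch of what the file could contain, here is a small function that counts the digits in a string. The name countDigits is hypothetical; it is what you would enter as the External subroutine name, with an integer return type and a single string argument in the routine definition.

```cpp
// myTest.C : hypothetical contents for the file compiled in step 1.
#include <cctype>   // std::isdigit

// Counts the decimal digits in a null-terminated string.
// Returns 0 for a null pointer so that bad input does not crash the job.
int countDigits(char *text)
{
    if (text == 0)
        return 0;

    int count = 0;
    for (char *p = text; *p != '\0'; ++p) {
        // Cast to unsigned char: passing a negative char to isdigit is undefined.
        if (std::isdigit(static_cast<unsigned char>(*p)))
            ++count;
    }
    return count;
}
```

The function is deliberately left without an extern "C" wrapper, on the assumption that the job's generated C++ code links against the C++ symbol, which is why the file is compiled as C++.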

 

Creating a shared object or shared library from the code:

Position Independent Object:
g++34 -fpic -c sum_pk.c

g++: the GNU C++ compiler available on Unix. g++34 is the version of g++ available on our server.
-c: compiles the code and creates an object file without linking
-fpic: generates position-independent code, which is required for a shared object/library

An object file with the extension .o is created: sum_pk.o
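
The source of sum_pk.c is not reproduced here. Judging only by the file name, a minimal sketch of what it might contain is a function that adds two integers; the function name sum_pk and its signature are assumptions made for illustration.

```cpp
// sum_pk.c : hypothetical contents, inferred from the file name only.
// g++ compiles this file as C++ even though the extension is lowercase .c.

// Adds two integers and returns the result.
int sum_pk(int a, int b)
{
    return a + b;
}
```

With a routine like this, the Return Type in the routine definition would be an integer type and the routine would take two integer arguments.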

a) Shared Object:
A shared object is created from the position-independent object file created above.
g++34 -shared -o sum_pk.so sum_pk.o

sum_pk.so is the shared object file created from sum_pk.o

b) Shared library:
A shared library is also created from the position-independent object file created above.
g++34 -shared -o libsum_pk.so sum_pk.o

libsum_pk.so is the shared library file created from sum_pk.o

Shared library vs. shared object:

Linking:
  Shared library: linked to the job at runtime, so it must be available at runtime.
  Shared object: linked to the job at compile time.

Naming:
  Shared library: the name should start with “lib” and have the extension “.so”, e.g. libsum_pk.so
  Shared object: no such constraint.

Location:
  Shared library: should be present in one of the predefined library paths, e.g. /opt/IBM/InformationServer/ASBNode/lib/cpp/ is the library path in our DataStage installation.
  Shared object: no such constraint.

 

Implementing a parallel routine in DataStage:

  • File>New>Routines>Parallel Routine
  • Fill in the required values as follows:

Routine Name: any name made up of alphanumeric characters only. Underscores are not allowed either.
External subroutine name: the name of the C function we want to invoke
Type: External Function
Object Type: Library if you are using a shared library, or Object if you are using a shared object.
Return Type: the return type of the C function
Library path: the library name with its complete path
For a shared library, the path should be
                            /opt/IBM/InformationServer/ASBNode/lib/cpp/

 

Thanks. Please leave a comment if you need more assistance on this topic.

 

Author: Kuntamukkala Ravi

ETL Consultant by Profession, Webmaster by Passion
