Offshore Outsourcing Informatica ETL programming, ETL development and PowerCenter scripts, MS SQL 2000, Oracle database and data warehousing

Informatica ETL outsourcing

ETL stands for extract, transform and load: the processes that enable companies to move data from multiple sources, reformat and cleanse it, and load it into another database, a data mart or a data warehouse for analysis, or into another operational system to support a business process.

Companies know they have valuable data scattered throughout their networks that needs to be moved from one place to another, such as from one business application to another or to a data warehouse for analysis.

The problem is that the data resides in many heterogeneous systems, and therefore in many different formats. For instance, a CRM system may define a customer one way, while a back-end accounting system defines the same customer differently.

To solve the problem, companies use extract, transform and load (ETL) software, which reads data from its source, cleans it up and formats it uniformly, and then writes it to the target repository, where it can be put to use.
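The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Informatica PowerCenter works internally: the inline CSV source, the `customers` table and the cleanup rules are hypothetical, and an in-memory SQLite database stands in for the target repository.

```python
import csv
import io
import sqlite3

# Hypothetical source: a flat-file export from an operational system.
SOURCE_CSV = """id,name,country
1,  alice smith ,us
2,BOB JONES,US
"""

def extract(text):
    """Read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Clean up and format each row uniformly (trim, fix case, convert types)."""
    return [
        (int(r["id"]), r["name"].strip().title(), r["country"].strip().upper())
        for r in rows
    ]

def load(rows, conn):
    """Write the cleaned rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the target repository
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
# [(1, 'Alice Smith', 'US'), (2, 'Bob Jones', 'US')]
```

Keeping extract, transform and load as separate functions mirrors the way ETL tools separate sources, transformations and targets, so each piece can be changed independently.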

The data used in ETL processes can come from any source: a mainframe application, an ERP application, a CRM tool, a flat file, an Excel spreadsheet or even a message queue.

Pulling the Data

Extraction can be done via Java Database Connectivity (JDBC), Microsoft Corp.'s Open Database Connectivity (ODBC) technology or proprietary code, or by exporting the data to flat files.
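In Python, the DB-API plays a role similar to JDBC or ODBC: one cursor interface over many database drivers. The sketch below uses the standard `sqlite3` module as a stand-in source so it runs anywhere; against a real ODBC data source you would open the connection with an ODBC driver module instead. The `orders` table and its columns are hypothetical. The function then takes the flat-file route, writing the extracted rows to CSV.

```python
import csv
import io
import sqlite3

def extract_to_flat_file(conn, query, out, batch_size=500):
    """Run a query through a DB-API connection and write the result set as CSV."""
    cur = conn.cursor()
    cur.execute(query)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # column names from the driver
    count = 0
    while True:
        batch = cur.fetchmany(batch_size)  # stream rows instead of loading them all
        if not batch:
            break
        writer.writerows(batch)
        count += len(batch)
    return count

# Stand-in source system with a hypothetical orders table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(101, "ACME", 250.0), (102, "Globex", 99.5)],
)

buf = io.StringIO()  # in a real job this would be a file opened with newline=""
n = extract_to_flat_file(
    src, "SELECT order_id, customer, amount FROM orders ORDER BY order_id", buf
)
print(n)  # 2
print(buf.getvalue())
```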

After extraction, the data is transformed, or modified, depending on the specific business logic involved so that it can be sent to the target repository.

There are a variety of ways to perform the transformation, and the work involved varies. The data may require reformatting only, but most ETL operations also involve cleansing the data to remove duplicates and enforce consistency. Part of what the software does is examine individual data fields and apply rules to consistently convert the contents to the form required by the target repository or application, says Schiff.
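A minimal sketch of the cleansing step, assuming that two records are duplicates when their normalized email addresses match. The field names and normalization rules are hypothetical; ETL tools typically let such rules be configured rather than hard-coded.

```python
def normalize(record):
    """Apply field-level rules so equivalent values look identical."""
    return {
        "name": " ".join(record["name"].split()).title(),  # collapse spaces, fix case
        "email": record["email"].strip().lower(),
    }

def deduplicate(records, key="email"):
    """Normalize every record, then keep the first one seen for each key value."""
    seen = set()
    result = []
    for rec in map(normalize, records):
        if rec[key] not in seen:
            seen.add(rec[key])
            result.append(rec)
    return result

raw = [
    {"name": "jane  doe", "email": "Jane.Doe@Example.com "},
    {"name": "Jane Doe", "email": "jane.doe@example.com"},
    {"name": "JOHN ROE", "email": "john@example.com"},
]
clean = deduplicate(raw)
print(clean)
# [{'name': 'Jane Doe', 'email': 'jane.doe@example.com'}, {'name': 'John Roe', 'email': 'john@example.com'}]
```

Normalizing before comparing is the key design choice: the first two raw records differ character by character but are the same customer once the rules are applied.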

For example, the category "male" might be represented in three different systems as M, male and a 0/1 flag. The ETL software would recognize that these entries mean the same thing and convert them to the target format.
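This kind of conversion can be expressed as a lookup table from each source system's code to a single target code. The source system names are hypothetical, and the sketch assumes the third system stores 1 for male in its 0/1 flag; the real convention would have to be confirmed for each source.

```python
# Hypothetical per-source encodings of the same "gender" attribute.
GENDER_MAP = {
    "crm":        {"M": "MALE", "F": "FEMALE"},
    "accounting": {"male": "MALE", "female": "FEMALE"},
    "legacy":     {"1": "MALE", "0": "FEMALE"},  # assumption: 1 means male
}

def to_target_gender(source, value):
    """Convert a source-specific code to the target format, rejecting unknown codes."""
    try:
        return GENDER_MAP[source][str(value).strip()]
    except KeyError:
        raise ValueError(f"unmapped gender code {value!r} from {source!r}") from None

print([to_target_gender(s, v) for s, v in [("crm", "M"), ("accounting", "male"), ("legacy", 1)]])
# ['MALE', 'MALE', 'MALE']
```

Raising an error on unmapped codes, rather than passing them through unchanged, surfaces new source values early instead of letting them leak into the target repository.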

In addition, the ETL process could involve standardizing name and address fields, verifying telephone numbers or expanding records with additional fields containing demographic information or data from other systems.
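A sketch of these steps, under stated assumptions: phone numbers are checked against a simple North American 10-digit rule, and enrichment is a lookup in a hypothetical region table keyed by ZIP code. Production tools generally rely on reference data and address-verification services rather than rules this simple.

```python
import re

# Hypothetical reference data from another system, used to enrich records.
REGION_BY_ZIP = {"10001": "Northeast", "94105": "West"}

def standardize_phone(raw):
    """Return the phone as NNN-NNN-NNNN, or None if it is not a valid 10-digit number."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the North American country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

def cleanse_and_enrich(rec):
    """Standardize name and phone, flag invalid phones, and add a region field."""
    out = dict(rec)
    out["name"] = " ".join(rec["name"].split()).title()
    out["phone"] = standardize_phone(rec["phone"])
    out["phone_valid"] = out["phone"] is not None
    out["region"] = REGION_BY_ZIP.get(rec["zip"], "Unknown")
    return out

print(cleanse_and_enrich({"name": "mary  ANN lee", "phone": "+1 (212) 555-0147", "zip": "10001"}))
```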

For instance, suppose a customer runs Oracle Financials, PeopleSoft human resources software and SAP manufacturing applications, and needs to access the data in each of these systems to complete an order-to-cash process. This requires the company's ETL software to extract data from the originating systems, which in some instances isn't as easy as it sounds. For example, pulling data from the SAP manufacturing application would require generating proprietary SAP ABAP code to extract the shipping and open purchase-order information.

The transformation occurs when the data from each source is mapped, cleansed and reconciled so that it can all be tied together, with receivables tied to invoices and so on.
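Reconciliation can be sketched as a join on a shared key once each source has been mapped to common field names. This assumes that invoices (for example, from the financials system) and incoming payments both carry an invoice number; the record layouts are hypothetical, and real reconciliations must also handle partial matches, currencies and credit notes.

```python
from collections import defaultdict

invoices = [
    {"invoice_no": "INV-1", "customer": "ACME", "amount": 1000.0},
    {"invoice_no": "INV-2", "customer": "Globex", "amount": 500.0},
]
payments = [
    {"invoice_no": "INV-1", "paid": 600.0},
    {"invoice_no": "INV-1", "paid": 400.0},
    {"invoice_no": "INV-9", "paid": 50.0},  # no matching invoice
]

def reconcile(invoices, payments):
    """Tie payments to invoices; return open balances and unmatched payments."""
    paid = defaultdict(float)
    known = {inv["invoice_no"] for inv in invoices}
    unmatched = []
    for p in payments:
        if p["invoice_no"] in known:
            paid[p["invoice_no"]] += p["paid"]
        else:
            unmatched.append(p)  # set aside for manual review
    balances = {
        inv["invoice_no"]: inv["amount"] - paid[inv["invoice_no"]] for inv in invoices
    }
    return balances, unmatched

balances, unmatched = reconcile(invoices, payments)
print(balances)   # {'INV-1': 0.0, 'INV-2': 500.0}
print(unmatched)  # [{'invoice_no': 'INV-9', 'paid': 50.0}]
```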

After reconciliation, the data is transported and loaded into the data warehouse for analysis of metrics such as cycle times and total outstanding receivables.
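Once the data is loaded, the analysis itself is ordinary SQL. The sketch below uses an in-memory SQLite database as a stand-in warehouse, with a hypothetical `receivables` table and made-up sample rows (dates stored as ISO strings), and computes total outstanding receivables and the average invoice-to-payment cycle time in days.

```python
import sqlite3

wh = sqlite3.connect(":memory:")  # stand-in for the data warehouse
wh.execute("""
    CREATE TABLE receivables (
        invoice_no   TEXT PRIMARY KEY,
        amount       REAL,
        paid         REAL,
        invoiced_on  TEXT,   -- ISO date
        paid_on      TEXT    -- NULL while the invoice is still open
    )
""")
wh.executemany(
    "INSERT INTO receivables VALUES (?, ?, ?, ?, ?)",
    [
        ("INV-1", 1000.0, 1000.0, "2024-01-01", "2024-01-31"),
        ("INV-2", 500.0, 0.0, "2024-01-10", None),
        ("INV-3", 200.0, 200.0, "2024-02-01", "2024-02-11"),
    ],
)

# Total amount still owed across all invoices.
(outstanding,) = wh.execute("SELECT SUM(amount - paid) FROM receivables").fetchone()
# Average days from invoice to payment, over paid invoices only.
(avg_cycle_days,) = wh.execute(
    "SELECT AVG(julianday(paid_on) - julianday(invoiced_on)) "
    "FROM receivables WHERE paid_on IS NOT NULL"
).fetchone()
print(outstanding, avg_cycle_days)  # 500.0 20.0
```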

One Truth

Motorola, for example, used ETL software to collect information from 30 different procurement systems and send it to its global supply chain management data warehouse, where it could analyze what the company was spending in aggregate, says Phillips.
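Consolidating spend from many procurement systems largely comes down to mapping each source's fields onto one schema before aggregating. The two source layouts below are hypothetical stand-ins for such systems, and the sketch assumes all amounts are already in a single currency.

```python
from collections import defaultdict

# Hypothetical field mappings per source system: target field -> source field.
FIELD_MAPS = {
    "system_a": {"supplier": "vendor_name", "amount": "total"},
    "system_b": {"supplier": "SUPPLIER", "amount": "SPEND_USD"},
}

def to_common(source, row):
    """Map one source row onto (supplier, amount) in the common schema."""
    m = FIELD_MAPS[source]
    return row[m["supplier"]].strip().upper(), float(row[m["amount"]])

def aggregate_spend(batches):
    """Sum spend per supplier across all source systems."""
    totals = defaultdict(float)
    for source, rows in batches:
        for row in rows:
            supplier, amount = to_common(source, row)
            totals[supplier] += amount
    return dict(totals)

batches = [
    ("system_a", [{"vendor_name": "Acme ", "total": "120.50"}]),
    ("system_b", [{"SUPPLIER": "ACME", "SPEND_USD": 79.5},
                  {"SUPPLIER": "Initech", "SPEND_USD": 10}]),
]
print(aggregate_spend(batches))  # {'ACME': 200.0, 'INITECH': 10.0}
```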

In the past, companies doing data warehousing projects often used homegrown code to support ETL processes, says Schiff. However, even those with successful implementations found that the source data file formats and the validation rules that applied to the data evolved over time, so the ETL code had to be modified and maintained. Companies also ran into problems as they added systems and the volume of data in them grew; lack of scalability has been a serious issue with homegrown ETL software.

Providers of packaged ETL systems include Microsoft, which bundles Data Transformation Services (DTS) with its SQL Server database. Oracle has embedded some ETL capabilities in its database, and IBM offers a DB2 Information Integrator component for its warehouse offerings.

Data Warehouse Solutions

· e-Business Intelligence
· Re-Engineering of DW applications
· CRM Data Warehouse
· Data Warehousing Implementation
· Data delivery applications
· Data integration

Case Studies
· Business Management System
· Data Warehouse WAP System
· Cell-phone Users Survey
· BI Platform Development
· Data Warehouse Reporting Portal

Data Warehouse Technologies

OLAP tools: MicroStrategy, Brio, Business Objects, Cognos, Hyperion, Informix MetaCube

ETL tools: Informatica, DataStage, Data Junction, DataMirror

Databases: Oracle, MS SQL Server 2000, Sybase, DB2, Informix, Red Brick, Teradata

Data mining tools: SAS Enterprise Miner, Intelligent Miner

End-to-end tools: SAP Business Warehouse suite and Oracle Data Warehousing product suite
