My preferred data management platform rests upon the PostgreSQL database. The most recent releases have been truly impressive, especially when you consider that the Postgres database is free software.
Using Postgres for data management means integrating disparate data sources such as other databases, spreadsheets, flat files, etc into Postgres. There are two primary mechanisms for this: data loading and data virtualization.
I’ll cover Postgres data loading in this post, including options for extraction, transformation, and loading (ETL).
Import via Talend Open Studio
The tool I’ll use for this data import is the free Talend Open Studio. I’ll be pulling data from an Oracle Express instance with it’s Human Resources (HR) sample database.
First trick was to launch Talend without error due to the presence of a Java 7 install on my system. Once I figured out the cause of startup failure, I was able to create a launch.bat file to make explicit use of a Java 8 install.
start /B TOS_DI-win-x86_64.exe -VM "C:\Program Files\Java\jdk1.8.0_40\bin"
Talend OS starts up pretty empty, particularly the ‘Palette’ area on the right. Once selecting ‘Help | Install Additional Packages…’ and installing all required and optional packages, I was able to create a new Job by right-clicking the ‘Job Designs’ folder on left in my local project.
Once I had a populated palette, things got became quite intuitive for a simple data extraction from Oracle into Postgres. From the palette, I dragged a ‘tOracleInput’ component to the project area, then a ‘tPostgresqlOutput’ component over and connected them together as shown:
Setting the parameters for Oracle input component was straight forward:
As was setting the parameters of the Postgres output component:
However, by default Talend will use column names from Oracle exactly, including them as upper-case. Uppercase columns in Postgres will work if you quote them everywhere, but who wants to bother with that? So instead, I changed the output schema mapping as shown:
Running the job successfully resulted in a new ‘employee’ table in Postgres, with records seen here:
Overall, once up and running, Talend’s tools for data integration seem quite powerful. In future, I’ll be looking at other palette components available.
More in this series…
- SQL Server into PostgreSQL with SquirrelSQL
- MySQL into PostgreSQL with Python’s petl
- Excel into PostgreSQL with RapidMiner
- Data Virtualization with PostgreSQL