Oracle提供了专用的数据加载器,可以直接从HADOOP中加载数据到指定对象,该软件包下载地址:Oracle Loader for Hadoop Release 2.1.0
[root@ora11g ~]# unzip oraloader-2.1.0-1.x86_64.zip -d /usr/local/
[root@ora11g ~]# chown -R oracle:oinstall /usr/local/oraloader-2.1.0-1
[root@ora11g ~]# chmod -R 777 /usr/local/oraloader-2.1.0-1
[root@ora11g ~]$ cp /usr/local/oraloader-2.1.0-1/jlib/*.jar /usr/local/hadoop-0.20.2/lib/
SQL> conn scott/tiger
Connected.
SQL> create table t1(RN NUMBER,OBJECT_NAME VARCHAR2(20));
Table created.
[grid@localhost ~]$ more OLH/MyConf.xml
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
<!-- Input settings -->
<property>
<name>mapreduce.inputformat.class</name>
<value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>olh</value>
</property>
<!-- Output settings -->
<property>
<name>mapreduce.outputformat.class</name>
<value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>olh_out</value>
</property>
<!-- Table information -->
<property>
<name>oracle.hadoop.loader.loaderMapFile</name>
<value>file:///home/grid/OLH/olh_t1.xml</value>
</property>
<!-- Connection information -->
<property>
<name>oracle.hadoop.loader.connection.url</name>
<value>jdbc:oracle:thin:@//192.168.30.244:1521/jssdb</value>
</property>
<property>
<name>oracle.hadoop.loader.connection.user</name>
<value>scott</value>
<description>AvroSchemaAwareDBInputFormat wants this case sensitive!
(cause mapred.jdbc.username points here)</description>
</property>
<property>
<name>oracle.hadoop.loader.connection.password</name>
<value>tiger</value>
</property>
</configuration>
[grid@localhost ~]$ more OLH/olh_t1.xml
<?xml version="1.0" encoding="UTF-8" ?>
<LOADER_MAP>
<SCHEMA>SCOTT</SCHEMA>
<TABLE>T1</TABLE>
<COLUMN field="F0">RN</COLUMN>
<COLUMN field="F1">OBJECT_NAME</COLUMN>
</LOADER_MAP>
[grid@localhost ~]$ hadoop jar ${OLH_JAR} oracle.hadoop.loader.OraLoader -conf OLH/MyConf.xml
Oracle Loader for Hadoop Release 2.1.0 - Production
Copyright (c) 2011, 2013, Oracle and/or its affiliates. All rights reserved.
13/05/16 16:19:49 INFO loader.OraLoader: Oracle Loader for Hadoop Release 2.1.0 - Production
Copyright (c) 2011, 2013, Oracle and/or its affiliates. All rights reserved.
13/05/16 16:19:49 INFO loader.OraLoader: Built-Against: hadoop-1.1.1 hive-0.10.0 avro-1.6.3 jackson-1.8.8
13/05/16 16:19:51 INFO loader.OraLoader: oracle.hadoop.loader.loadByPartition is disabled because table: J1 is not partitioned
13/05/16 16:19:51 INFO loader.OraLoader: oracle.hadoop.loader.enableSorting disabled, no sorting key provided
13/05/16 16:19:51 INFO output.DBOutputFormat: Setting reduce tasks speculative execution to false for : oracle.hadoop.loader.lib.output.JDBCOutputFormat
13/05/16 16:19:51 WARN loader.OraLoader: Sampler error: the number of reduce tasks must be greater than one; the configured value is 1 . Job will continue without sampled information.
13/05/16 16:19:51 INFO loader.OraLoader: Sampling time=0D:0h:0m:0s:20ms (20 ms)
13/05/16 16:19:51 INFO loader.OraLoader: Submitting OraLoader job OraLoader
13/05/16 16:19:53 INFO input.FileInputFormat: Total input paths to process : 1
13/05/16 16:19:56 INFO loader.OraLoader: map 0% reduce 0%
13/05/16 16:20:31 INFO loader.OraLoader: map 100% reduce 0%
13/05/16 16:20:40 INFO loader.OraLoader: map 100% reduce 33%
13/05/16 16:20:46 INFO loader.OraLoader: map 100% reduce 100%
13/05/16 16:20:48 INFO loader.OraLoader: Job complete: OraLoader (null)
13/05/16 16:20:48 INFO loader.OraLoader: Counters: 17
FileSystemCounters
FILE_BYTES_READ=98
FILE_BYTES_WRITTEN=228
HDFS_BYTES_READ=16
HDFS_BYTES_WRITTEN=1850
Job Counters
Data-local map tasks=1
Launched map tasks=1
Launched reduce tasks=1
Map-Reduce Framework
Combine input records=0
Combine output records=0
Map input records=4
Map output bytes=84
Map output records=4
Reduce input groups=1
Reduce input records=4
Reduce output records=4
Reduce shuffle bytes=98
Spilled Records=8
$ hadoop jar \
${ORAHDFS_JAR} oracle.hadoop.loader.OraLoader \
-D mapreduce.inputformat.class=oracle.hadoop.loader.lib.input.DelimitedTextInputFormat \
-D mapred.input.dir=olh \
-D mapreduce.outputformat.class=oracle.hadoop.loader.lib.output.JDBCOutputFormat \
-D mapred.output.dir=olh_out \
-D oracle.hadoop.loader.loaderMapFile=file:///home/grid/OLH/loaderMap.xml \
-D oracle.hadoop.loader.connection.url="jdbc:oracle:thin:@//192.168.30.244:1521/jssdb" \
-D oracle.hadoop.loader.connection.user=SCOTT \
-D oracle.hadoop.loader.connection.password=tiger \
-publish