How to Read ORC File Contents using the Java ORC Tools Jar


Suppose we want to read the data of an ORC file to validate its contents.

The ORC Tools jar from the Apache ORC project lets us do this from the command line in a few steps.

1. Install Java

First things first. Let’s check if Java is installed on our machine.

java -version

If Java is not installed, we’ll get an error. On the Windows command prompt, for example, it looks like this:

'java' is not recognized as an internal or external command,
operable program or batch file.

In this case, we’ll need to download and install Java before continuing.
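
On Linux or macOS, the same check can be scripted. The following is a minimal sketch for a POSIX shell; the helper name have_cmd is made up for this illustration.

```shell
# Return 0 if the named command is on PATH, 1 otherwise.
# (have_cmd is a hypothetical helper name, not part of any tool.)
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
  # Java is available, so print its version as in the step above.
  java -version
else
  echo "java not found on PATH; install Java first" >&2
fi
```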

2. Download the JAR

Let’s go to this repository of orc-tools jar files: https://repo1.maven.org/maven2/org/apache/orc/orc-tools.

Select the latest version available, then download orc-tools-x.x.x-uber.jar. The uber jar bundles its dependencies, so it can be run directly with java -jar.

Alternatively, if we know the version number already (e.g. 1.7.0), we can get the file from the CLI using wget.

wget https://repo1.maven.org/maven2/org/apache/orc/orc-tools/1.7.0/orc-tools-1.7.0-uber.jar
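
If we switch versions often, the URL can be built from the version number. The sketch below assumes a POSIX shell and falls back to curl when wget is not installed; the function names orc_tools_url and download_orc_tools are hypothetical helpers, not part of orc-tools.

```shell
# Build the Maven Central URL for a given orc-tools version.
orc_tools_url() {
  version="$1"
  echo "https://repo1.maven.org/maven2/org/apache/orc/orc-tools/${version}/orc-tools-${version}-uber.jar"
}

# Download the uber jar into the current directory unless it is already there.
# Uses wget when available and falls back to curl otherwise.
download_orc_tools() {
  version="$1"
  jar="orc-tools-${version}-uber.jar"
  if [ -f "$jar" ]; then
    echo "$jar already present, skipping download"
    return 0
  fi
  if command -v wget >/dev/null 2>&1; then
    wget "$(orc_tools_url "$version")"
  else
    # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name
    curl -fLO "$(orc_tools_url "$version")"
  fi
}
```

Calling download_orc_tools 1.7.0 then fetches the same file as the wget command above.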

3. Use the JAR to view file contents

Suppose we’ve navigated to a directory containing both the jar and the ORC file.

  • JAR: orc-tools-1.7.0-uber.jar
  • ORC: file.orc

We can view the metadata of this file, such as its schema, row count, and stripe and column statistics.

java -jar orc-tools-1.7.0-uber.jar meta file.orc

We can also view the contents of this file. The rows are printed as JSON.

java -jar orc-tools-1.7.0-uber.jar data file.orc

Since the rows are written to standard output, we can also redirect them to another file.

java -jar orc-tools-1.7.0-uber.jar data file.orc > file.json
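
To validate several files at once, the redirect can be wrapped in a loop. This is a sketch under a few assumptions: a POSIX shell, ORC files ending in .orc in the current directory, and two hypothetical helpers, json_name_for and dump_all_orc, introduced only for this example.

```shell
# Map an ORC file name to the JSON file name used for its dump,
# e.g. file.orc becomes file.json.
json_name_for() {
  echo "${1%.orc}.json"
}

# Dump every .orc file in the current directory to a matching .json file.
# The first argument is the path to the orc-tools uber jar.
dump_all_orc() {
  jar="$1"
  for orc in *.orc; do
    # If nothing matches, the pattern stays literal; skip it.
    [ -e "$orc" ] || continue
    java -jar "$jar" data "$orc" > "$(json_name_for "$orc")"
  done
}
```

Running dump_all_orc orc-tools-1.7.0-uber.jar in the directory from this step would write file.json next to file.orc.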