Using jarjar to solve Hive and Pig ANTLR conflicts
Pig 0.9+ and Hive 0.7+ (and maybe older versions, too) both use ANTLR. Unfortunately, they depend on incompatible versions, which causes problems if you try to pull in both Pig and Hive via Ivy or Maven. Oozie has come up with a deployment workaround for this problem, but HCatalog is still on Pig 0.8 because of it.
Per the recommendation of a co-worker (thanks johng!), I tried jarjar to shade the Pig jar, which avoids conflicts over ANTLR and all the other dependencies bundled in the Pig "fat jar." Here are the steps that I used:
- Download jarjar 1.3 (note that the 1.4 release of jarjar seems to have a bad artifact).
- Generate a list of "rules" to rewrite the non-pig classes in the pig-withouthadoop jar:
    jar tf pig-0.10.0-cdh4.1.0-withouthadoop.jar | \
      grep "\.class" | grep -v org.apache.pig | \
      python -c "import sys;[sys.stdout.write('.'.join(arg.split('/')[:-1]) +'\n') for arg in sys.stdin]" | \
      sort | uniq | \
      awk '{ print "rule " $1 ".* org.apache.pig.jarjar.@0" }' > \
      pig-jarjar-automated.rules
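- The above command generates one rule per package that contains a class file. Each rule moves the classes in that package under the org.apache.pig.jarjar prefix: the * matches any class name within the package (but not in subpackages), and @0 stands for the full matched class name. The rules are stored in pig-jarjar-automated.rules. See below for the rules file that I generated.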
- Run jarjar with the rules listed to generate a new jar:
    java -jar jarjar-1.3.jar process \
      pig-jarjar-automated.rules \
      pig-0.10.0-cdh4.1.0-withouthadoop.jar \
      pig-0.10.0-cdh4.1.0-withouthadoop-jarjar.jar
- Check out jarjar's command line docs for more info, including the rules file format.
- Check the contents of your new jar.
    jar tf pig-0.10.0-cdh4.1.0-withouthadoop-jarjar.jar | \
      egrep ".class$" | grep -c -v "org/apache/pig"
- The above command should print 0, which shows that all classes have been rewritten under org/apache/pig.
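The rule generation and the final check can also be done without the grep/awk pipeline. The following is a minimal Python sketch, not part of the original workflow; the function names (class_entries, build_rules, count_unshaded) are made up for illustration. It differs from the shell commands in two small ways: it only treats paths that start with org/apache/pig/ as Pig classes, and it skips classes in the default package rather than emitting a rule for them.

```python
import sys
import zipfile

PIG_PREFIX = "org/apache/pig/"          # classes under this path are left alone
SHADE_PREFIX = "org.apache.pig.jarjar"  # same prefix the awk command uses


def class_entries(jar_path):
    """Return the paths of all .class entries in the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if name.endswith(".class")]


def build_rules(jar_path):
    """Build one jarjar rule per non-Pig package that contains a class file."""
    packages = set()
    for name in class_entries(jar_path):
        if name.startswith(PIG_PREFIX) or "/" not in name:
            continue  # skip Pig's own classes and the default package
        # "org/antlr/runtime/Lexer.class" -> "org.antlr.runtime"
        packages.add(name.rsplit("/", 1)[0].replace("/", "."))
    return ["rule %s.* %s.@0" % (pkg, SHADE_PREFIX) for pkg in sorted(packages)]


def count_unshaded(jar_path):
    """Count the classes that still live outside org/apache/pig."""
    return sum(1 for name in class_entries(jar_path)
               if not name.startswith(PIG_PREFIX))


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python <this script> rules <jar>   or   python <this script> check <jar>
    command, jar_path = sys.argv[1], sys.argv[2]
    if command == "rules":
        print("\n".join(build_rules(jar_path)))
    else:
        print(count_unshaded(jar_path))
```

Run with rules against the original jar and redirect the output to pig-jarjar-automated.rules; run with check against the jar that jarjar produced to get the count of classes left outside org/apache/pig.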
For pig-0.10.0-cdh4.1, the rules file looks like this:
    rule com.google.common.annotations.* org.apache.pig.jarjar.@0
    rule com.google.common.base.* org.apache.pig.jarjar.@0
    rule com.google.common.base.internal.* org.apache.pig.jarjar.@0
    rule com.google.common.cache.* org.apache.pig.jarjar.@0
    rule com.google.common.collect.* org.apache.pig.jarjar.@0
    rule com.google.common.eventbus.* org.apache.pig.jarjar.@0
    rule com.google.common.hash.* org.apache.pig.jarjar.@0
    rule com.google.common.io.* org.apache.pig.jarjar.@0
    rule com.google.common.math.* org.apache.pig.jarjar.@0
    rule com.google.common.net.* org.apache.pig.jarjar.@0
    rule com.google.common.primitives.* org.apache.pig.jarjar.@0
    rule com.google.common.util.concurrent.* org.apache.pig.jarjar.@0
    rule dk.brics.automaton.* org.apache.pig.jarjar.@0
    rule jline.* org.apache.pig.jarjar.@0
    rule org.antlr.runtime.* org.apache.pig.jarjar.@0
    rule org.antlr.runtime.debug.* org.apache.pig.jarjar.@0
    rule org.antlr.runtime.misc.* org.apache.pig.jarjar.@0
    rule org.antlr.runtime.tree.* org.apache.pig.jarjar.@0
    rule org.apache.tools.bzip2r.* org.apache.pig.jarjar.@0
    rule org.stringtemplate.v4.* org.apache.pig.jarjar.@0
    rule org.stringtemplate.v4.compiler.* org.apache.pig.jarjar.@0
    rule org.stringtemplate.v4.debug.* org.apache.pig.jarjar.@0
    rule org.stringtemplate.v4.gui.* org.apache.pig.jarjar.@0
    rule org.stringtemplate.v4.misc.* org.apache.pig.jarjar.@0
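To see why the file needs a separate rule for every package, it helps to model how a single rule rewrites a class name. The sketch below is an illustration in Python, not jarjar's actual implementation: parse_rule and rename are hypothetical helpers, only the single * wildcard and the @0 placeholder used in this file are handled, and the first matching rule is assumed to win.

```python
import re


def parse_rule(line):
    """Split 'rule <pattern> <result>' into (compiled regex, result)."""
    keyword, pattern, result = line.split()
    if keyword != "rule":
        raise ValueError("not a rule line: %r" % line)
    # Escape the pattern, then let '*' stand for exactly one name component,
    # i.e. any run of characters that does not contain a dot.
    regex = re.escape(pattern).replace(r"\*", "[^.]*")
    return re.compile(regex + r"\Z"), result


def rename(class_name, rule_lines):
    """Apply the first matching rule; return the name unchanged if none match."""
    for line in rule_lines:
        regex, result = parse_rule(line)
        if regex.match(class_name):
            # @0 is replaced by the full matched class name.
            return result.replace("@0", class_name)
    return class_name


rules = [
    "rule org.antlr.runtime.* org.apache.pig.jarjar.@0",
    "rule org.antlr.runtime.tree.* org.apache.pig.jarjar.@0",
]

# A class directly in org.antlr.runtime is moved under the shade prefix.
print(rename("org.antlr.runtime.Lexer", rules))
# With only the first rule, a class in the tree subpackage is left untouched,
# because '*' does not reach across the dot into a subpackage.
print(rename("org.antlr.runtime.tree.CommonTree", rules[:1]))
```

In this model the org.antlr.runtime.* rule alone leaves org.antlr.runtime.tree.CommonTree where it is, which is why the generated file lists org.antlr.runtime.tree, org.antlr.runtime.misc and org.antlr.runtime.debug as rules of their own.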