dealing with dependency chain issues in maven

Maven has made it easy to manage your dependencies. long gone are the days of /lib folders and chasing after various obscure *.jar files you never directly used. these days you just declare the set of libraries you know you need, and most of the time maven handles the rest through transitive dependencies and dependency version resolution.

most of the time.

sometimes though, things can get very ugly and very hard to diagnose.

there are a few common problems that can ruin your day:

  1. directly using a transitive dependency: if my code depends on library A, and library A in turn depends on library B, then B is on my compile and runtime classpaths. this means i can directly use classes from B in my code without ever declaring a direct dependency on it. this becomes a problem when the maintainers of A later decide to drop B, at which point your code stops compiling after you upgrade A.
  2. dependency bloat: a lot of times during development you'll bring in various libraries (because it's so easy) only to discard them later on. multiply this by the number of modules and developers working alongside you, and you end up with a lot of no-longer-used dependencies that everyone's afraid to remove because they're not sure who's using them.
  3. version uncertainty: suppose your code depends on libraries A and B, and both of these libraries depend on library C, but on different versions of it. it is not always clear which version of C will end up on your runtime classpath, and it might depend on things like the exact version of maven used to compile the project.
  4. duplicate class declarations on the classpath: it's possible that among the dozens of libraries your project may depend on, two will define the exact same class/file, and which copy gets used is in the hands of the classloader. in java EE environments all copies might get used. and if you're especially unlucky the copies won't even be identical code. don't believe me? have a look here.
  5. cross-build injection attacks: this is for the more paranoid among us, but i've included it here for completeness' sake. it basically comes down to how much you trust code brought in by maven from outside – it might be intercepted and/or replaced between builds. a solution to this problem is presented by gary rowe on his blog.

now that we've covered what could go wrong, let's see how we can defend against it.

first of all, i'd like to cover an invaluable tool for tracking down these issues – mvn dependency:tree. this maven invocation prints out your entire dependency tree, everything included, and is very useful for finding the source of any dependency issue you may have.
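running it with -Dverbose will also annotate conflicting versions. the output is shaped roughly like this (all coordinates below are made up for illustration):

mvn dependency:tree -Dverbose
[INFO] com.example:myapp:jar:1.0.0
[INFO] +- com.example:library-a:jar:2.1:compile
[INFO] |  \- com.example:library-c:jar:1.0:compile
[INFO] \- com.example:library-b:jar:3.0:compile
[INFO]    \- (com.example:library-c:jar:2.0:compile - omitted for conflict with 1.0)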

here’s how i ended up protecting my build:

<build>
   <plugins>
      <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-enforcer-plugin</artifactId>
         <version>1.3.1</version>
         <executions>
            <execution>
               <id>enforce dependency convergence</id>
               <goals>
                  <goal>enforce</goal>
               </goals>
               <configuration>
                  <rules>
                     <DependencyConvergence/>
                  </rules>
               </configuration>
            </execution>
         </executions>
      </plugin>

      <plugin>
         <groupId>com.ning.maven.plugins</groupId>
         <artifactId>maven-duplicate-finder-plugin</artifactId>
         <version>1.0.6</version>
         <executions>
            <execution>
               <phase>verify</phase>
               <goals>
                  <goal>check</goal>
               </goals>
               <configuration>
                  <failBuildInCaseOfConflict>true</failBuildInCaseOfConflict>
                  <ignoredResources>
                     <!-- you will probably need something here -->
                     <ignoredResource>changelog.txt</ignoredResource>
                  </ignoredResources>
               </configuration>
            </execution>
         </executions>
      </plugin>
   </plugins>
</build>

<profiles>
   <profile>
      <id>StrictDependencies</id>
      <build>
         <plugins>
            <plugin>
               <groupId>org.apache.maven.plugins</groupId>
               <artifactId>maven-dependency-plugin</artifactId>
               <version>2.8</version>
               <executions>
                  <execution>
                     <id>check for unused/undeclared dependencies</id>
                     <goals>
                        <goal>analyze-only</goal>
                     </goals>
                     <configuration>
                        <failOnWarning>true</failOnWarning>
                     </configuration>
                  </execution>
               </executions>
            </plugin>
         </plugins>
      </build>
   </profile>
</profiles>

issues #1 and #2 are addressed by the maven-dependency-plugin invocation. the invocation itself is tucked away under a profile because there may be scenarios where it'll give false positives – for example, if you're running a build without tests (-Dmaven.test.skip=true) it'll complain about a bunch of unused test libraries. so when you know you're doing a full build, you can invoke this check by adding -PStrictDependencies to your maven command line.
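when the analysis trips, the plugin prints warnings shaped roughly like this (artifact coordinates made up), and with failOnWarning set the build fails:

[WARNING] Used undeclared dependencies found:
[WARNING]    com.example:library-b:jar:1.0:compile
[WARNING] Unused declared dependencies found:
[WARNING]    com.example:old-lib:jar:0.9:compile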

issue #3 is addressed by the maven-enforcer-plugin, which has a dependency convergence rule.
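when the convergence rule fails the build, the usual fix is to decide on a version yourself. a minimal sketch, with made-up coordinates, pinning the conflicting library via dependencyManagement:

<dependencyManagement>
   <dependencies>
      <!-- hypothetical: force a single version of library C for the whole build -->
      <dependency>
         <groupId>com.example</groupId>
         <artifactId>library-c</artifactId>
         <version>2.0</version>
      </dependency>
   </dependencies>
</dependencyManagement>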

issue #4 is addressed by the maven-duplicate-finder-plugin. you will probably need to add a few ignored files to that invocation, as a lot of libraries pack stuff like changelogs.

creating a “central” maven repository on github for multiple projects

hosting maven repositories on github is a common need. the best solution I came across was posted by Michael Burton on StackOverflow, and it is an excellent solution for most cases. unfortunately for me, i was looking for something like a "central" maven repository for multiple (possibly unrelated) projects of mine, and not the repository-per-project that his solution creates.

so, based on his approach, I managed to arrive at a solution that allows me to have a single parent maven project that all my projects use, which makes "mvn clean deploy" just work.

the general outline of my approach is this:

  1. create a github project for the maven repository (mine is at https://github.com/radai-rosenblatt/maven-repository).
  2. create a github repository for the parent pom, which is its own project. (mine is at https://github.com/radai-rosenblatt/parent)
  3. the deployment solution is slightly more complex than michael's, since we don't want to overwrite the entire repository – we want to add our build's artifacts to those already in the repository. so this is how I did it:
    1. use the maven-scm-plugin to checkout the repository project to a location under /target
    2. use the maven-deploy-plugin to deploy our project into that location
    3. use github’s site maven plugin to merge the updated repository back to github

steps 3.2 and 3.3 above are taken from michael's original solution. the main difference is in step 3.1, where we check out the current state of the maven repository so that we add to it and merge it back, instead of simply deploying to a temp location and overwriting the repository with that location (which would leave only our newly-built project in the repository).

this is how it looks in the pom.xml (the complete pom is here):

<properties>
	<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	<scm.provider.jgit.version>1.9-SNAPSHOT</scm.provider.jgit.version>
	<local.maven.repository.path>${project.build.directory}/maven-repository</local.maven.repository.path>
	<remote.maven.repository.owner>radai-rosenblatt</remote.maven.repository.owner>
	<remote.maven.repository.project>maven-repository</remote.maven.repository.project>
</properties>
<build>
	<plugins>
		<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-scm-plugin</artifactId>
			<executions>
				<execution>
					<id>check out current state of repository</id>
					<phase>deploy</phase>
					<goals>
						<goal>checkout</goal>
					</goals>
					<configuration>
						<checkoutDirectory>${local.maven.repository.path}</checkoutDirectory>
						<connectionType>developerConnection</connectionType>
						<developerConnectionUrl>scm:git:https://github.com/${remote.maven.repository.owner}/${remote.maven.repository.project}.git</developerConnectionUrl>
						<scmVersion>master</scmVersion>
						<!-- we always want to work with the master branch of the maven-repository project -->
						<scmVersionType>branch</scmVersionType>
						<providerImplementations>
							<git>jgit</git>
						</providerImplementations>
					</configuration>
				</execution>
			</executions>
			<dependencies>
				<dependency>
					<groupId>org.apache.maven.scm</groupId>
					<artifactId>maven-scm-provider-jgit</artifactId>
					<version>${scm.provider.jgit.version}</version>
				</dependency>
			</dependencies>
		</plugin>
		<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-deploy-plugin</artifactId>
			<executions>
				<execution>
					<id>default-deploy</id>
					<configuration>
						<skip>true</skip>
						<!-- need to disable default-deploy otherwise it'll run before our git clone above -->
					</configuration>
				</execution>
				<execution>
					<id>deploy artifacts to checked out repository</id>
					<phase>deploy</phase>
					<goals>
						<goal>deploy</goal>
					</goals>
					<configuration>
						<altDeploymentRepository>build.repo::default::file://${local.maven.repository.path}</altDeploymentRepository>
					</configuration>
				</execution>
			</executions>
		</plugin>
		<plugin>
			<groupId>com.github.github</groupId>
			<artifactId>site-maven-plugin</artifactId>
			<executions>
				<execution>
					<id>merge updated repository back to github</id>
					<phase>deploy</phase>
					<goals>
						<goal>site</goal>
					</goals>
					<configuration>
						<repositoryOwner>${remote.maven.repository.owner}</repositoryOwner>
						<repositoryName>${remote.maven.repository.project}</repositoryName>
						<branch>refs/heads/master</branch>
						<outputDirectory>${local.maven.repository.path}</outputDirectory>
						<excludes>
							<exclude>.git/**/*</exclude>
							<!-- dont try pushing the .git directory -->
						</excludes>
						<message>deploy maven artifacts for ${project.groupId}:${project.artifactId}:${project.version}</message>
						<noJekyll>true</noJekyll>
						<merge>true</merge>
						<server>github</server>
						<!-- must match a <server> in settings.xml for credentials -->
					</configuration>
				</execution>
			</executions>
		</plugin>
	</plugins>
</build>

you will also need to specify your github credentials in your maven settings.xml file, in your user's home directory.
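a minimal sketch of the relevant settings.xml fragment – the server id must match the <server>github</server> value in the plugin configuration above, and the credentials are of course placeholders:

<settings>
   <servers>
      <server>
         <id>github</id>
         <username>your-github-username</username>
         <password>your-github-password-or-token</password>
      </server>
   </servers>
</settings>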
to actually use all this configuration in any of your other github projects, all you need are two things (see the sketch after this list):

  1. specify this parent as your project's parent
  2. specify your repository in your project's repositories section (so that maven can find the parent project).
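a sketch of the relevant fragments in a child project's pom – the parent coordinates here are made up, and the url assumes the repository is served via github's raw content scheme:

<parent>
   <!-- hypothetical coordinates for the parent project -->
   <groupId>net.radai</groupId>
   <artifactId>parent</artifactId>
   <version>1.0</version>
</parent>

<repositories>
   <repository>
      <id>github-maven-repository</id>
      <url>https://raw.github.com/radai-rosenblatt/maven-repository/master/</url>
   </repository>
</repositories>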

for example, i've converted my jgroups cluster lock demo project to use this mechanism, as you can see in the project's pom file.

the main downside to this approach is the lack of cleanup. the maven-deploy-plugin can deploy to a maven repository (or any directory laid out as one), but cannot clean up older copies of a project from the repository. the repository is a simple git project, however, so when it gets too large you can simply check it out, clean it up, and push it back.
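something along these lines should do it (the artifact path here is hypothetical):

git clone https://github.com/radai-rosenblatt/maven-repository.git
cd maven-repository
rm -r com/example/old-lib   # a stale artifact you no longer need
git add -A
git commit -m "remove stale artifacts"
git push origin master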

another drawback is that you clone the whole repository project every time. as long as you don't deploy giant binaries to it, or hundreds of projects, this probably isn't a big issue, but it might become one. maybe some day i'll learn a bit of git magic to only check out the "head" of the repository instead of cloning the whole thing complete with history…

Distributed Locking with JGroups

sometimes, when building a distributed/clustered application, you want a piece of code executed on only a single cluster node at a time. standard solutions to this (the synchronized keyword, java EE @Singleton, java concurrent locks …) don't cover clustered scenarios (any scenario where you want to ensure locking across multiple virtual machine instances), so something a little more complicated is needed. in my case the simplest solution i could find was jgroups.

jgroups is a very popular library for multicast communication and is the basis for the clustering capabilities of many other projects (jboss/infinispan, for example). as will be shown in a minute, it is very useful on its own as well.

ok then, on to code. the locking API we’re after is a very simple one:

public interface LockManager {
   void lock(String name);
   void release(String name);
}

this allows multiple named locks: locking and unlocking are done by lock name, and both method calls block.

jgroups is built around the concept of protocol stacks – you build a protocol stack starting with a transport layer at the bottom (udp in our case, tcp if you're forced off udp) and pile higher-level functions on top of it: peer discovery, packet fragmentation, retransmission, locking – which is what we're after – and so on.

here's a simple jgroups stack, based on the default udp stack, that supports locking:

<config xmlns="urn:org:jgroups"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.1.xsd">
    <UDP
       mcast_port="${jgroups.udp.mcast_port:45588}"
       receive_on_all_interfaces="true"
    />
    <PING/>
    <MERGE2/>
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT/>
    <BARRIER />
    <pbcast.NAKACK2/>
    <UNICAST2/>
    <pbcast.STABLE/>
    <pbcast.GMS/>
    <UFC/>
    <MFC/>
    <FRAG2/>
    <RSVP/>
    <pbcast.STATE_TRANSFER />
    <CENTRAL_LOCK/>
</config>

pretty much all configuration values are left at their defaults, for brevity's sake. this stack works as-is, but you'll probably want to tweak it for real-world use.

so, on top of this stack, we can implement our simple lock manager interface:

import org.jgroups.JChannel;
import org.jgroups.blocks.locking.LockService;

public class JgroupsLockManager implements LockManager {
   private JChannel channel;
   private LockService lockService;

   public JgroupsLockManager() {
      try {
         // build a channel from the stack definition above
         channel = new JChannel("udp-lock-stack.xml");
         // LockService talks to the CENTRAL_LOCK protocol at the top of the stack
         lockService = new LockService(channel);
         channel.connect("LockCluster");
      } catch (Exception e) {
         throw new IllegalStateException(e);
      }
   }

   @Override
   public void lock(String name) {
      // blocks until the cluster-wide lock is acquired
      lockService.getLock(name).lock();
   }

   @Override
   public void release(String name) {
      lockService.getLock(name).unlock();
   }
}

and that's basically it.
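a quick usage sketch – the lock name is arbitrary, "my-critical-section" is just an example:

LockManager locks = new JgroupsLockManager();
locks.lock("my-critical-section");
try {
   // this code runs on at most one cluster node at a time
} finally {
   locks.release("my-critical-section");
}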

if you want to play around with this yourself, i put the complete code for the demo application (with gui) on github. it's a maven project that produces an executable jar. when you run it, a window pops up with a big lock icon that lets you toggle a lock. you can run several instances on the same machine or on several machines and see for yourself how only one instance can hold the lock at any given time. another nice feature of jgroups is that if you kill any of the instances, the lock is released after a short while (2 seconds).

lock icons are from the oxygen icon pack

RAM drive for fun and profit

A lot of development machines these days have more memory than they need, but not nearly as many pack SSDs. And even those that do might be encumbered by all sorts of workplace-mandated annoyances like full drive encryption, hyperactive antivirus software, that utterly useless backup app that IT set to run every day at noon, those sorts of things.

Introducing the RAM drive – take a chunk of unused memory and turn it into a hard drive. It won't retain data after a reboot, but it's still quite useful (as I'll strive to demonstrate in a few posts).

The software I like best for this is ImDisk – it's lean, and it's open source to boot. Once it's installed, it's really simple to configure it to create and format a 2GB Z: drive on each boot, using the windows task scheduler and a script along the lines of:

imdisk -a -t vm -m Z: -s 2G -p "/v:RAMDrive /FS:NTFS /X /y"
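here -a attaches a new virtual disk, -t vm backs it with virtual memory, -m Z: picks the drive letter, -s 2G sets the size, and the quoted -p parameters are handed over to the format command (volume label RAMDrive, NTFS, and so on).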

you can read up on how to get this executed at boot here.

and here's the end result: the ram drive. we'll get to using this to speed up builds later.