Over the past few years I have become a fan of the final keyword for variables and members in Java. But many people reading my source code oppose this view: they find that using final for every non-reassignable variable clutters the code because most variables will not be reassigned anyway. When I recently talked to Uncle Bob and he voiced the same opinion, I decided to search the Web for reasons for and against the usage of final.

Most of the arguments stem from these sources:

Most of the time, I refer to final as a modifier to class members and local variables.

When final is inevitable or very recommended

  • Constants
    By convention constants should have static and final modifiers. 

    private static final int CONSTANT = 1;
  • Inner classes:
    Anonymous and local inner classes can only access local variables of the enclosing method when these are final (since Java 8, effectively final is sufficient).
  • Utility classes:
    Utility classes should be final and have a private constructor: 

    public final class CollectionUtils {
        private CollectionUtils() {
            throw new UnsupportedOperationException("private class");
        }
    }

 Pro final

  • Easier to understand during maintenance and debugging
    It is clear which variables remain the same within the current scope. final reduces complexity.
  • Avoids NullPointerExceptions
    A final variable cannot be set to null later on. In order to check whether it is null or not, you only have to check its initialization.
  • Color patterns
    After a while, the color patterns that result from the frequent finals may help to navigate through the code.
  • Compiler can optimize
    Some people argue that the compiler is able to optimize when final is used.
  • Immutability
    Using final is necessary (but not sufficient) to enforce immutability. 

    • Use Collections.unmodifiable… to make immutable collections
    • final fields need to be set in the constructor. Some frameworks (such as UIMA) expect variables to be initialized in an initialize method. In this case, we cannot use final.
  • Fosters thread-safety
    Synchronizing on final variables is safer because the lock object cannot be replaced by another one while threads hold or wait for it.
  • Extension points:
    Marking methods as final makes it easy to see which methods serve as extension points. Joshua Bloch’s Effective Java even suggests making as many methods (and classes) final as possible.
  • Discourages overbroad scoping
    Every local variable should serve one purpose. Using final, we can avoid reusing dummy variables as in the following example (taken from Stackoverflow): 

    String msg = null;
    for(int i = 0; i < 10; i++) {
        msg = "We are at position " + i;
        System.out.println(msg);
    }
    msg = null;
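
For comparison, here is a sketch of the same loop with the variable declared final inside the loop body, so that it can neither leak into the surrounding scope nor be reused for another purpose. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ScopedMessages {

    // Builds the messages from the loop above and returns them,
    // so that the result can be inspected as well as printed.
    static List<String> positions(final int count) {
        final List<String> messages = new ArrayList<String>();
        for (int i = 0; i < count; i++) {
            // Declared inside the loop: one single-purpose variable per iteration
            final String msg = "We are at position " + i;
            messages.add(msg);
        }
        // msg is not visible here, so there is nothing to reset to null
        return messages;
    }

    public static void main(final String[] args) {
        for (final String msg : positions(10)) {
            System.out.println(msg);
        }
    }
}
```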

Contra final

  • Clutters the code
    Normally, most (local) variables are assigned only once and are thus eligible for bearing final in front of them.
  • Hard-to-read method signatures
    When you make a method and all of its parameters final, the method signature is unlikely to fit on one line even with two parameters. This makes the method signature hard to read.
  • Final can be replaced with static code checkers…
    … at least in some places: IDEs and code checkers (PMD, Checkstyle,…) check that method parameters are not re-assigned. However, some of them will suggest making method parameters final :-)
  • final is not const
    Objects that are labeled with final can still be modified if they have non-immutable members. If you are accustomed to the const keyword in C/C++, this behavior is misleading.
  • May slow down development
    While coding, you may change your mind on whether a certain variable is final or not. Every time you change your mind, you have to add or remove the final. However, many IDEs support you in both directions: Eclipse can add final where possible on saving and, vice versa, when you try to re-assign a final variable, it will offer to make it non-final.
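
The points "final is not const" and "necessary but not sufficient for immutability" can be illustrated with a small sketch. The class FinalIsNotConst and its methods are hypothetical; only Collections.unmodifiableList comes from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FinalIsNotConst {

    // final only fixes the reference; the list behind it stays mutable
    private final List<String> names = new ArrayList<String>();

    void add(final String name) {
        this.names.add(name); // compiles and works despite final
    }

    // Read-only view: callers cannot modify the list through it
    List<String> view() {
        return Collections.unmodifiableList(this.names);
    }

    public static void main(final String[] args) {
        final FinalIsNotConst holder = new FinalIsNotConst();
        holder.add("house");
        System.out.println(holder.view());
        try {
            holder.view().add("dog");
        }
        catch (final UnsupportedOperationException e) {
            System.out.println("The view is read-only");
        }
    }
}
```

Note that the view is not a copy: changes made through add() remain visible through a view handed out earlier.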

DKPro Core contains a component that wraps the popular TreeTagger.

Unfortunately, only the core component de.tudarmstadt.ukp.dkpro.core.treetagger-asl is directly available as a Maven artifact, because license restrictions prohibit redistributing the binaries (de.tudarmstadt.ukp.dkpro.core.treetagger-bin) and the models (de.tudarmstadt.ukp.dkpro.core.treetagger-model-{de,en,fr,…}). The DKPro Core developer team provides instructions on how to create the latter artifacts, using an ant build.xml script.

The Maven dependencies of the TreeTagger component look as follows. It is important to use dependency management in order to coordinate the versions of the three artifacts.

<dependencies>
  <dependency>
    <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
    <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-asl</artifactId>
  </dependency>
  <dependency>
    <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
    <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-bin</artifactId>
  </dependency>
  <dependency>
    <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
    <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-model-de</artifactId>
  </dependency>
</dependencies>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-asl</artifactId>
      <version>1.5.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

 References

  • [1] TreeTagger project site
  • [2] Instructions on packaging the binary and model artifacts

WordNet is an invaluable resource for NLP research. John Didion has developed a Java library for accessing WordNet data in a programmatic way. To access WN from Java, the following steps are necessary:

  1. Download WordNet
  2. Add a dependency on JWNL to your project or download the library.
  3. Configure properties.xml so that JWNL knows where to find WordNet and which version is used.
  4. Create a Dictionary instance for querying WordNet.

Configuration

The configuration is stored in an XML file that sets the path where WordNet can be found. If you use a standard WN distribution, then the path should end in dict as the following minimalistic properties.xml illustrates:

<?xml version="1.0" encoding="UTF-8"?>
<jwnl_properties language="en">
  <version publisher="Princeton" number="3.0" language="en"/>
  <dictionary>
    <param name="dictionary_element_factory" 
      value="net.didion.jwnl.princeton.data.PrincetonWN17FileDictionaryElementFactory"/>
    <param name="file_manager" value="net.didion.jwnl.dictionary.file_manager.FileManagerImpl">
      <param name="file_type" value="net.didion.jwnl.princeton.file.PrincetonRandomAccessDictionaryFile"/>
      <param name="dictionary_path" value="path-to-dict"/>
    </param>
  </dictionary>
  <resource/>
</jwnl_properties>

On GitHub, you find two prepared properties files:

  • properties_min.xml uses only a minimum of the possible settings
  • properties.xml includes a rule-based morphological stemmer that allows you to query for inflected forms, e.g., houses, runs, dogs

Boilerplate code

A singleton instance of Dictionary is used to query WordNet with JWNL. In fact, setting up the dictionary is very easy:

JWNL.initialize(new FileInputStream("src/main/resources/properties.xml"));
final Dictionary dictionary = Dictionary.getInstance();

Afterwards, you can easily query the dictionary for a lemma of your choice (try house, houses, dog). For each lemma, you also specify one of the 4 possible part-of-speech classes that you are looking for, that is one of POS.ADJECTIVE, POS.ADVERB, POS.NOUN, POS.VERB. For house you would choose POS.NOUN or POS.VERB. The whole process looks rather clumsy, so I listed the steps below:

  1. Lookup: Is the lemma in the dictionary?
    final IndexWord indexWord = dictionary.lookupIndexWord(pos, lemma);

    • If the lookup fails, indexWord is null.
  2. What different senses may the lemma have?
    final Synset[] senses = indexWord.getSenses();
  3. For each sense, we may get a short description of the sense, called the gloss.
    final String gloss = synset.getGloss();
  4. What other lemmas are in a synset?
    final Word[] words = synset.getWords();

    • For each word, we may get its lemma and its POS: word.getLemma(); and word.getPOS().getKey();

Where to get it

The code for this tutorial is available on GitHub. You need to copy the template properties file(s) in src/main/resources before you can run the code. Given a lemma and part-of-speech, the program returns the list of synsets that contain the lemma. For house/v the output looks like so:

Aug 23, 2013 9:13:40 AM net.didion.jwnl.dictionary.Dictionary doLog
INFO: Installing dictionary net.didion.jwnl.dictionary.FileBackedDictionary@6791d8c1
 1 Lemmas: [house/v] (Gloss: contain or cover; “This box houses the gears”)
 2 Lemmas: [house/v, put_up/v, domiciliate/v] (Gloss: provide housing for; “The immigrants were housed in a new development outside the town”)

Maven dependency for JWNL reader and the necessary logging:

<dependency>                        
  <groupId>net.didion.jwnl</groupId>
  <artifactId>jwnl</artifactId>     
  <version>1.4.0.rc2</version>      
</dependency>
<dependency>
  <groupId>commons-logging</groupId>
  <artifactId>commons-logging</artifactId>
  <version>1.1.3</version>
</dependency>

Links

  • [1] JWNL Sourceforge site
  • [2] JWNL Sourceforge Wiki with much more information
  • [3] WordNet 3.0 download

Naively, I supposed that every varargs parameter in Java could be treated just like a normal array. This is not the case for primitive arrays as I had to learn!

Using the standard library method Arrays.asList(T…), which converts arrays/varargs of objects to a java.util.List, an idiomatic code snippet may look like this:

final int[] ints = { 3, 2, 8, 1, 1, 5 };
final List<Integer> list = Arrays.asList(ints);
Collections.sort(list);

However, the Java compiler complains about the second line:

Type mismatch: cannot convert from List<int[]> to List<Integer>

Obviously, the type parameter T is resolved to int[]. Apache Commons-Lang offers a way out of this problem: Its ArrayUtils.toObject() method takes a primitive array and converts it to the corresponding object array. The following, modified listing demonstrates this:

final int[] ints = { 3, 2, 8, 1, 1, 5 };
// final List<Integer> list = Arrays.asList(ints);
final Integer[] intObjects = ArrayUtils.toObject(ints);
final List<Integer> list = Arrays.asList(intObjects);
Collections.sort(list);
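
If Java 8 or later is available, the boxing can also be done with the JDK alone. This is an alternative to the Commons-Lang approach above, not part of it; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class BoxedInts {

    // Converts a primitive int array to a sorted List<Integer>
    static List<Integer> toSortedList(final int[] ints) {
        // boxed() turns every int of the IntStream into an Integer
        final List<Integer> list = Arrays.stream(ints)
                .boxed()
                .collect(Collectors.toCollection(ArrayList::new));
        Collections.sort(list);
        return list;
    }

    public static void main(final String[] args) {
        final int[] ints = { 3, 2, 8, 1, 1, 5 };
        System.out.println(toSortedList(ints));
    }
}
```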

Where to get it

Maven Dependency (Download):

<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.6</version>
</dependency>

Links

  • [1] Related stackoverflow question
  • [2] Arrays.asList(T…) docu (Java 6)
  • [3] ArrayUtils.toObject() docu (Commons-Lang 2.6)

If your software produces costly objects, object serialization may be an option to spare you some bootstrapping time, e.g., when you repeatedly restart your application during development. Apache Commons-Lang offers an implementation of serialization that is an epitome of ease of use: SerializationUtils. The core methods are serialize and deserialize.

Given an object, the actual process of serialization is a one-line statement (split up here):

final File targetFile = new File("./target/serializedObject.ser");
final BufferedOutputStream outStream = new BufferedOutputStream(new FileOutputStream(targetFile));
SerializationUtils.serialize(object, outStream);

The same holds for the deserialization process. In the following lines, we assume that the object to be deserialized is an instance of java.lang.String:

final BufferedInputStream inStream = new BufferedInputStream(new FileInputStream(targetFile));
final String string = (String) SerializationUtils.deserialize(inStream);
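
For readers who want to try the round trip without the Commons-Lang dependency, the following sketch does the same with the JDK's standard object streams. It writes to a byte array instead of a file to stay self-contained; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class PlainSerialization {

    // Writes the object to an in-memory buffer and returns the raw bytes
    static byte[] serialize(final Serializable object) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    // Reads one object back; the caller casts it to the expected type
    static Object deserialize(final byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(final String[] args) throws Exception {
        final byte[] data = serialize("costly object");
        final String copy = (String) deserialize(data);
        System.out.println(copy);
    }
}
```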

A complete executable example can be found on GitHub.

Maven Dependency (a Jar file for download can be found here):

<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.2</version>
</dependency>

Links

  • [1] SerializationUtils Javadoc
  • [2] Executable example code (Maven project)

Even though it is not good style, some libraries print information to standard output or standard error. If you have no influence on that, this can be annoying and may conceal important information. A stackoverflow question deals with this problem and I find the following solution the most elegant one:

final PrintStream originalOut = System.out;
System.setOut(new PrintStream(new OutputStream()
{ 
    @Override 
    public void write(final int b) throws IOException {/*do nothing*/} 
})); 

try{ <library call> } 
finally
{
    System.setOut(originalOut);
}

Instead of ignoring the output you may want to print it to a log file for further inspection in case something goes wrong. In this case, do not define your own stream but use an instance of FileOutputStream:

File file  = new File(filename);
PrintStream printStream = new PrintStream(new FileOutputStream(file));
System.setOut(printStream);

Analogously, System.setErr(PrintStream) may be used for setting the new standard error stream.
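
The suppression snippet can be wrapped into a small helper. This is a minimal sketch; QuietCall and runSilently are made-up names, and the Runnable stands in for the noisy library call.

```java
import java.io.OutputStream;
import java.io.PrintStream;

final class QuietCall {

    // Runs the task while System.out is discarded and restores it afterwards
    static void runSilently(final Runnable task) {
        final PrintStream originalOut = System.out;
        System.setOut(new PrintStream(new OutputStream() {
            @Override
            public void write(final int b) {
                // do nothing: every byte is dropped
            }
        }));
        try {
            task.run();
        }
        finally {
            // Restore the original stream even if the task throws
            System.setOut(originalOut);
        }
    }

    public static void main(final String[] args) {
        runSilently(() -> System.out.println("noisy library output"));
        System.out.println("Standard output works again");
    }
}
```

Keep in mind that System.setOut affects the whole JVM, so output of other threads is swallowed as well while the task runs.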

Links

  • [1] stackoverflow question which pointed me at this

By default, Maven does not compile any debug information into an artifact. This is (sometimes) reasonable when you publish libraries which you do not want to reveal too much information about. For personal use, however, I prefer to be able to have a look at the full stack trace. Another benefit of adding debugging information is that you can read the original parameter names of methods.

As debug information is added at compile time, the Maven Compiler plugin is responsible for managing this kind of information:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.0.2</version>
      <configuration>
        <!-- Necessary in order for the debug levels to be considered-->
        <debug>true</debug> 
        <debugLevel>lines,vars,source</debugLevel>
      </configuration>
    </plugin>
  </plugins>
</build>

Links

  • [1] Reference of Maven Compiler plugin

Depending on your version of Maven, the default Java version assumed by the compiler plugin may be quite old, even as old as not to recognize annotations. When working with annotations you will get compiler errors such as the following:

The method getProgress() of type TextFileReader must override a superclass method
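
A typical trigger is an @Override annotation on a method that implements an interface method: under Java 5 rules, @Override is only allowed on methods that override a superclass method, whereas Java 6 and later also accept it on interface implementations. The sketch below reuses the names from the error message; the ProgressReporter interface is hypothetical.

```java
// Hypothetical interface, made up for illustration
interface ProgressReporter {
    int getProgress();
}

final class TextFileReader implements ProgressReporter {

    // Fine from Java 6 on; rejected at Java 5 compliance level because
    // getProgress() implements an interface method instead of
    // overriding a superclass method
    @Override
    public int getProgress() {
        return 0;
    }

    public static void main(final String[] args) {
        System.out.println(new TextFileReader().getProgress());
    }
}
```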

Solution 1

You can configure Maven to accept (-source argument for the Java compiler) and produce byte code for (-target compiler argument) a certain Java version as follows:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.0.2</version>
      <configuration>
        <source>1.6</source>
        <target>1.6</target>
      </configuration>
    </plugin>
  </plugins>
</build>

Solution 2

It is even simpler to use properties:

<properties>
  <maven.compiler.source>1.6</maven.compiler.source>
  <maven.compiler.target>1.6</maven.compiler.target>
</properties>

Links

  • [1] Reference of Maven Compiler plugin

Java Standard Logging

This article describes a template for using and configuring logging in a Java application. Even without any configuration we obtain a logger by calling the static method Logger.getLogger(String name). By convention, I name the logger after the class it belongs to:

this.log = Logger.getLogger(this.getClass().getName());

The default logging behavior is to log all events to standard error. However, we can easily redirect/mirror our logging to a log file. The necessary configuration is kept in a separate properties file (often called logging.properties, Download):

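# Determine which of the handlers configured below shall be used
handlers= java.util.logging.ConsoleHandler, java.util.logging.FileHandler

# Level of the root logger. It defaults to INFO, in which case events
# below INFO never reach any handler, whatever the handler levels are
.level=ALL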

# One log entry looks as follows:
#
# May 8, 2013 11:18:47 AM de.svenlogan.wordpress.logging.Application run
# INFO: This message goes to standard error and to the logfile
#
# (no quotes around the value: they would become part of the output)
java.util.logging.SimpleFormatter.format=%1$tb %1$td, %1$tY %1$tl:%1$tM:%1$tS %1$Tp %2$s%n%4$s: %5$s%n

# Only log events of at least level INFO (incl. WARNING, SEVERE,...) 
java.util.logging.ConsoleHandler.level=INFO
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter

# Log anything
java.util.logging.FileHandler.level=ALL
java.util.logging.FileHandler.pattern=logfile.txt
java.util.logging.FileHandler.append=true
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter

In order to make this configuration file known to our application, we need to request the LogManager to read the configuration file:

try
{
    LogManager.getLogManager().readConfiguration(
            this.getClass().getClassLoader().getResourceAsStream("logging.properties"));
}
catch (final Exception e)
{
    e.printStackTrace();
}

You can test your logger as follows:

this.log.log(Level.INFO, "This message goes to standard error and to the logfile");
this.log.log(Level.FINE, "This message only goes to the logfile");
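
The pieces can be tried out in one self-contained class. The sketch below is an assumption-laden variant of the setup above: it feeds an inline configuration string to the LogManager instead of a logging.properties file and uses only the console handler, so that no file is written. The root logger level is set to ALL so that FINE events pass the logger at all; the console handler then filters them at INFO. The class name is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

final class LoggingSketch {

    // Inline stand-in for logging.properties (console only)
    private static final String CONFIG =
            "handlers=java.util.logging.ConsoleHandler\n"
            + ".level=ALL\n"
            + "java.util.logging.ConsoleHandler.level=INFO\n";

    // Applies the configuration and returns a logger named after this class
    static Logger configuredLogger() throws IOException {
        LogManager.getLogManager().readConfiguration(
                new ByteArrayInputStream(CONFIG.getBytes(StandardCharsets.UTF_8)));
        return Logger.getLogger(LoggingSketch.class.getName());
    }

    public static void main(final String[] args) throws IOException {
        final Logger log = configuredLogger();
        log.log(Level.INFO, "This message goes to standard error");
        log.log(Level.FINE, "This message passes the logger but is dropped by the console handler");
    }
}
```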

For other logging facilities such as Log4j, there exist mechanisms which allow for auto-discovering the logging properties.

I have uploaded an example application as Maven project on GitHub.

Log4j

Log4j will use a configuration properties file if it is called log4j.properties and can be found on the classpath. What follows is a simple template configuration that makes Log4j log to console:

log4j.rootLogger=INFO, stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.Target=System.err

# %m%n - print only the log message and a newline
# %d{ISO8601} %F:%L %n%6p %m%n - print date, file and line number in the first 
# and priority plus message in a second line
log4j.appender.stderr.layout.ConversionPattern=%m%n

# Makes log4j print self-configuration messages
# log4j.debug=true

The patterns used by Log4j have a different notation than the format property of java.util.logging. The output format of each line is determined by the PatternLayout.

Links

  • [1] Sample Project hosted on GitHub

Args4j is an easy-to-use and, for most cases, sufficiently powerful library for processing program options. An important feature is that the mapping from program options to Java fields is managed via @Option and @Argument annotations. Moreover, support for “trailing” program options (normally used for input files) and multi-valued options is perfectly simple to set up.

The following small program may serve as a template for using Args4j (Download):

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.kohsuke.args4j.Argument;
import org.kohsuke.args4j.CmdLineException;
import org.kohsuke.args4j.CmdLineParser;
import org.kohsuke.args4j.Option;

public class Application
{

    @Option(name = "-o", aliases = "--output", required = true, usage = "Target/output file")
    private File outputFile;

    @Option(name = "-c", aliases = "--config", required = false, usage = "Configuration parameter (may be used multiple times)")
    private List<String> configurationParameters;

    @Option(name = "-h", aliases = "--help", required = false, usage = "Print help text")
    private boolean printHelp = false;

    // All non-option arguments may be treated with @Argument
    @Argument
    private List<File> inputFiles = new ArrayList<File>();

    /**
     * Configure application and return an appropriate exit code in order to signal if an error
     * occurred or the application shall be terminated as only the help message has been printed.
     * 
     * @param args
     *            the program options
     * @return the "exit code" of the parsing process
     */
    public int configure(final String[] args)
    {
        final CmdLineParser parser = new CmdLineParser(this);
        parser.setUsageWidth(80);

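        try {
            parser.parseArgument(args);
        }
        catch (final CmdLineException e) {
            // A missing required option also ends up here, even if -h was given
            if (this.printHelp) {
                System.err.println("Usage:");
                parser.printUsage(System.err);
                System.err.println();
                return 1;
            }
            else {
                System.err.println(e.getMessage());
                parser.printUsage(System.err);
                System.err.println();
                return 2;
            }
        }

        // -h was given together with otherwise valid options
        if (this.printHelp) {
            System.err.println("Usage:");
            parser.printUsage(System.err);
            System.err.println();
            return 1;
        }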

        /*
         * Output variables for debugging
         */
        System.err.println("Output file: " + this.outputFile);
        System.err.println("Input files: " + this.inputFiles);
        System.err.println("Configuration: " + this.configurationParameters);

        return 0;
    }

    public static void main(final String[] args)
    {
        final Application application = new Application();
        // Pass the result of the parsing process on as the exit code
        System.exit(application.configure(args));
    }

}

Maven Dependency

<dependency>
  <groupId>args4j</groupId>
  <artifactId>args4j</artifactId>
  <version>2.0.25</version>
</dependency>

Links

  • [1] Args4j project site
  • [2] Example code on GitHub