Working with the JWeb1TSearcher, I received tons of warning messages that some specific token could not be found:

com.googlecode.jweb1t.JWeb1TSearcher  - Could not find nGram-File for the symbol 'o

As I did not care about this warning, I wanted to turn it off. Looking at the dependencies of the enclosing artifact com.googlecode.jweb1t, I learned that the JWeb1T library uses the Apache Commons Logging library. Therefore, to disable the logging programmatically for the JWeb1TSearcher class, we can use the following code snippet (Logger and Level are the Log4j 1.x classes from org.apache.log4j here, so this assumes that Commons Logging delegates to Log4j):

Logger.getLogger("com.googlecode.jweb1t.JWeb1TSearcher").setLevel(Level.ERROR);

Only log messages of level ERROR or above will be logged now.
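
For comparison, here is a minimal sketch of the same idea with java.util.logging instead of Log4j: raise the level of one named logger so that its warnings are dropped. The logger name com.example.noisy.Searcher and the class QuietLoggerDemo are made up for this illustration, and java.util.logging uses SEVERE where Log4j uses ERROR.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietLoggerDemo {

    // Hypothetical logger name, used only for this illustration.
    static final String NOISY = "com.example.noisy.Searcher";

    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger can get lost.
    static final Logger NOISY_LOGGER = Logger.getLogger(NOISY);

    /** Only SEVERE messages of the noisy logger are reported afterwards. */
    static void silenceWarnings() {
        NOISY_LOGGER.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        silenceWarnings();
        NOISY_LOGGER.warning("suppressed");
        NOISY_LOGGER.severe("still reported");
        System.out.println(NOISY_LOGGER.isLoggable(Level.WARNING)); // false
        System.out.println(NOISY_LOGGER.isLoggable(Level.SEVERE));  // true
    }
}
```

Other loggers keep their configured level, so only the one noisy class is affected.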

Links

  • [1] Dependencies of com.googlecode.jweb1t:com.googlecode.jweb1t
  • [2] Apache Commons Logging guide

Even though it is not good style, some libraries print information to standard output or standard error. If you have no influence on that, this can be annoying and may conceal important information. A Stack Overflow question deals with this problem, and I find the following solution the most elegant one:

final PrintStream originalOut = System.out;
System.setOut(new PrintStream(new OutputStream()
{ 
    @Override 
    public void write(final int b) throws IOException {/*do nothing*/} 
})); 

try{ <library call> } 
finally
{
    System.setOut(originalOut);
}

Instead of ignoring the output you may want to print it to a log file for further inspection in case something goes wrong. In this case, do not define your own stream but use an instance of FileOutputStream:

File file  = new File(filename);
PrintStream printStream = new PrintStream(new FileOutputStream(file));
System.setOut(printStream);

Analogously, System.setErr(PrintStream) may be used to replace the standard error stream.
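
A third variant is to capture the output in memory, so that you can inspect it or pass it on to your logger. The following is only a sketch under my own assumptions: the class name StdoutCapture and the helper capture are invented for this example, and the Runnable stands in for the library call.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class StdoutCapture {

    /** Runs the task while System.out is redirected and returns what it printed. */
    static String capture(Runnable task) {
        final PrintStream originalOut = System.out;
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        final PrintStream replacement = new PrintStream(buffer, true);
        System.setOut(replacement);
        try {
            task.run();
        } finally {
            // Always restore the original stream, even if the task throws.
            replacement.flush();
            System.setOut(originalOut);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String noise = capture(() -> System.out.println("chatty library output"));
        System.out.println("captured " + noise.length() + " characters");
    }
}
```

Note that System.setOut changes a global setting, so output printed by other threads during the call ends up in the buffer as well.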

Links

  • [1] stackoverflow question which pointed me at this

By default, Maven does not compile any debug information into an artifact. This is (sometimes) reasonable when you publish libraries which you do not want to reveal too much information about. For personal use, however, I prefer to be able to have a look at the full stack trace. Another benefit of adding debugging information is that you can read the original parameter names of methods.

As debug information is added at compile time, the Maven Compiler plugin is responsible for managing this kind of information:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.0.2</version>
      <configuration>
        <!-- Necessary in order for the debug levels to be considered-->
        <debug>true</debug> 
        <debuglevel>lines,vars,source</debuglevel>
      </configuration>
    </plugin>
  </plugins>
</build>
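
If you want to check whether line-number information actually made it into a class, a small self-test like the following may help. It is a sketch of my own (the class DebugInfoCheck is not part of Maven or any library) and relies on the fact that StackTraceElement.getLineNumber() returns a negative value when no line number is available.

```java
public class DebugInfoCheck {

    /** Line number recorded for this method's own frame; negative if none was compiled in. */
    static int currentLine() {
        // Index 0 is the frame in which the Throwable was created, i.e. this method.
        return new Throwable().getStackTrace()[0].getLineNumber();
    }

    /** Turns the raw line number into a readable verdict. */
    static String describe(int line) {
        return line > 0
                ? "line numbers present (line " + line + ")"
                : "no line-number debug information";
    }

    public static void main(String[] args) {
        System.out.println(describe(currentLine()));
    }
}
```

This only covers the lines part of the debug level; local variable names (vars) do not show up in stack traces.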

Links

  • [1] Reference of Maven Compiler plugin

Depending on your version of Maven, the default Java version assumed by the compiler plugin may be quite old, even as old as not to recognize annotations. When working with annotations you will get compiler errors such as the following:

The method getProgress() of type TextFileReader must override a superclass method

Solution 1

You can configure Maven to accept (-source argument for the Java compiler) and produce byte code for (-target compiler argument) a certain Java version as follows:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.0.2</version>
      <configuration>
        <source>1.6</source>
        <target>1.6</target>
      </configuration>
    </plugin>
  </plugins>
</build>

Solution 2

It is even simpler to use properties:

<properties>
  <maven.compiler.source>1.6</maven.compiler.source>
  <maven.compiler.target>1.6</maven.compiler.target>
</properties>
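
The error message above typically shows up when @Override is placed on a method that implements an interface method: this is accepted from source level 1.6 on, but rejected with source level 1.5. The sketch below reproduces that situation. TextFileReader and getProgress() are taken from the error message, while the interface ProgressReporter and the method body are my own assumptions, since the original supertype is not shown here.

```java
public class OverrideDemo {

    /** Hypothetical interface; the real supertype of TextFileReader is not known. */
    interface ProgressReporter {
        float getProgress();
    }

    /** Named after the class in the error message; the body is invented. */
    static class TextFileReader implements ProgressReporter {
        // Compiles with -source 1.6 or later. With -source 1.5 the compiler
        // only allows @Override on methods that override a superclass method.
        @Override
        public float getProgress() {
            return 0.5f;
        }
    }

    public static void main(String[] args) {
        ProgressReporter reader = new TextFileReader();
        System.out.println(reader.getProgress());
    }
}
```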

Links

  • [1] Reference of Maven Compiler plugin

Java Standard Logging

This article describes a template for using and configuring logging in a Java application. Even without any configuration, we obtain a logger by calling the static method Logger.getLogger(String name). By convention, I name the logger after the class that owns it:

this.log = Logger.getLogger(this.getClass().getName());

By default, events of level INFO and above are logged to standard error; however, we can easily redirect or mirror our logging to a log file. The necessary configuration is kept in a properties file (often called logging.properties):

# Determine which of the handlers configured below shall be used
handlers= java.util.logging.ConsoleHandler, java.util.logging.FileHandler

# One log entry looks as follows:
#
# May 8, 2013 11:18:47 AM de.svenlogan.wordpress.logging.Application run
# INFO: This message goes to standard error and to the logfile
#
java.util.logging.SimpleFormatter.format=%1$tb %1$td, %1$tY %1$tl:%1$tM:%1$tS %1$Tp %2$s%n%4$s: %5$s%n

# Only log events of at least level INFO (incl. WARNING, SEVERE,...) 
java.util.logging.ConsoleHandler.level=INFO
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter

# Log anything
java.util.logging.FileHandler.level=ALL
java.util.logging.FileHandler.pattern=logfile.txt
java.util.logging.FileHandler.append=true
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter

In order to make this configuration file known to our application, we need to ask the LogManager to read it:

try
{
    LogManager.getLogManager().readConfiguration(
    this.getClass().getClassLoader().getResourceAsStream("logging.properties"));
}
catch (final Exception e)
{
    e.printStackTrace();
}

You can test your logger as follows:

this.log.log(Level.INFO, "This message goes to standard error and to the logfile");
this.log.log(Level.FINE, "This message only goes to the logfile");
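
To try out the interplay of logger level and handler level without any files, the configuration can also be fed to the LogManager from memory. This is a minimal sketch under my own assumptions: the class InMemoryLoggingConfig and the logger name demo.app are made up, and the properties are a reduced console-only variant of the file above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class InMemoryLoggingConfig {

    // Hypothetical logger name, used only for this illustration.
    static final Logger LOG = Logger.getLogger("demo.app");

    /** Replaces the current logging configuration with the properties below. */
    static void configure() {
        final String properties =
                "handlers=java.util.logging.ConsoleHandler\n"
              + "java.util.logging.ConsoleHandler.level=INFO\n"
              + "demo.app.level=FINE\n";
        // Properties streams are read as ISO-8859-1.
        try (InputStream in = new ByteArrayInputStream(
                properties.getBytes(StandardCharsets.ISO_8859_1))) {
            LogManager.getLogManager().readConfiguration(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        configure();
        LOG.log(Level.INFO, "Passes the logger and the console handler");
        LOG.log(Level.FINE, "Passes the logger, but the console handler drops it");
    }
}
```

A message has to pass both thresholds: the logger accepts FINE here, but the console handler still filters everything below INFO.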

For other logging facilities such as Log4j, there exist mechanisms which allow for auto-discovering the logging properties.

I have uploaded an example application as a Maven project on GitHub.

Log4j

Log4j will use a configuration properties file if it is called log4j.properties and can be found on the classpath. What follows is a simple template configuration that makes Log4j log to console:

log4j.rootLogger=INFO, stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.Target=System.err

# %m%n - print only the log message and a newline
# %d{ISO8601} %F:%L %n%6p %m%n - print date, file and line number in the first 
# and priority plus message in a second line
log4j.appender.stderr.layout.ConversionPattern=%m%n

# Makes log4j print self-configuration messages
# log4j.debug=true

The patterns used by Log4j use a different notation than the format property of the SimpleFormatter shown above. The output format of each line is determined by the PatternLayout.

Links

  • [1] Sample Project hosted on GitHub

Args4j is an easy-to-use and – for most cases – sufficiently powerful library for processing program options. An important feature is that the mapping from program options to Java fields is managed via @Option and @Argument annotations. Moreover, support for “trailing” program options (normally used for input files) and multi-valued options is perfectly simple to set up.

The following small program may serve as a template for using Args4j:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.kohsuke.args4j.Argument;
import org.kohsuke.args4j.CmdLineException;
import org.kohsuke.args4j.CmdLineParser;
import org.kohsuke.args4j.Option;

public class Application
{

    @Option(name = "-o", aliases = "--output", required = true, usage = "Target/output file")
    private File outputFile;

    @Option(name = "-c", aliases = "--config", required = false, usage = "Configuration parameter (may be used multiple times)")
    private List<String> configurationParameters;

    @Option(name = "-h", aliases = "--help", required = false, usage = "Print help text")
    private boolean printHelp = false;

    // All non-option arguments may be treated with @Argument
    @Argument
    private List<File> inputFiles = new ArrayList<File>();

    /**
     * Configure application and return an appropriate exit code in order to signal if an error
     * occurred or the application shall be terminated as only the help message has been printed.
     * 
     * @param args
     *            the program options
     * @return the "exit code" of the parsing process
     */
    public int configure(final String[] args)
    {
        final CmdLineParser parser = new CmdLineParser(this);
        parser.setUsageWidth(80);

        try {
            parser.parseArgument(args);

        }
        catch (final CmdLineException e) {

            if (this.printHelp) {
                System.err.println("Usage:");
                parser.printUsage(System.err);
                System.err.println();
                return 1;
            }
            else {

                System.err.println(e.getMessage());
                parser.printUsage(System.err);
                System.err.println();
                return 2;
            }
        }

        /*
         * Output variables for debugging
         */
        System.err.println("Output file: " + this.outputFile);
        System.err.println("Input files: " + this.inputFiles);
        System.err.println("Configuration: " + this.configurationParameters);

        return 0;
    }

    public static void main(final String[] args)
    {
        final Application application = new Application();
        // Pass the result of the parsing step on as the process exit code
        System.exit(application.configure(args));
    }

}

Maven Dependency

<dependency>
  <groupId>args4j</groupId>
  <artifactId>args4j</artifactId>
  <version>2.0.25</version>
</dependency>

Links

  • [1] Args4j project site
  • [2] Example code on GitHub

The toolkit uimaFIT allows you to design annotation types in the so-called Component Descriptor Editor, save the descriptor as XML, and generate the corresponding Java classes. When you run an analysis engine that makes use of these annotations, uimaFIT checks whether it finds the type system descriptor XML file on the classpath. In certain circumstances it fails to find this file and aborts with the following error message:

Caused by: org.apache.uima.cas.CASRuntimeException: 
JCas type "de.tudarmstadt.ukp.teaching.nlp4web.ml.type.NamedEntity" used in Java code, but was not declared in the XML type descriptor.
    at org.apache.uima.jcas.impl.JCasImpl.getType(JCasImpl.java:412)
    at org.apache.uima.jcas.cas.TOP.<init>(TOP.java:92)
    at org.apache.uima.jcas.cas.AnnotationBase.<init>(AnnotationBase.java:53)
    at org.apache.uima.jcas.tcas.Annotation.<init>(Annotation.java:54)

In this case, you need to explicitly point uimaFIT to the location of your type descriptor file. There are two ways to do this.

Solution via VM argument

As a quick and dirty solution, you can add the following VM argument, which in Eclipse is configured in the Launch Configuration dialog:

-Dorg.uimafit.type.import_pattern=classpath*:desc/types/**/*.xml

In my case, I stored the XML file in src/main/resources/desc/types/TypeSystem.xml where src/main/resources is set to be a source folder in Eclipse (Maven project layout).

Solution via types.txt (suggested)

The issue with the solution above is that your code will break if other people try to execute it without knowing about the VM parameter. There is a more robust way to solve this issue. uimaFIT looks into one of the following files

  • <classpath>/META-INF/org.uimafit/types.txt (for uimaFIT up to 1.4.x, still supported by the uimafit-legacy-support package)
  • <classpath>/META-INF/org.apache.uima.fit/types.txt (for uimaFIT 2.0.0 and above)

in order to find out where to search for the XML type descriptor files (in a Maven context, META-INF should be located in src/main/resources). Each line in this types.txt file describes one search pattern. In our example, types.txt should contain one line:

classpath*:desc/types/**/*.xml
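
If you are not sure whether types.txt really ends up on the classpath, a quick sanity check in plain Java can help. The sketch below is my own helper and not part of uimaFIT: the class TypesTxtCheck and the method readPatterns are invented, and skipping blank lines is a convenience of this check, not a statement about how uimaFIT parses the file. The resource path is the one for uimaFIT 2.0.0 and above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TypesTxtCheck {

    static final String TYPES_TXT = "META-INF/org.apache.uima.fit/types.txt";

    /** Returns the non-blank, trimmed lines of the given source. */
    static List<String> readPatterns(Reader source) {
        final List<String> patterns = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    patterns.add(line.trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return patterns;
    }

    public static void main(String[] args) {
        InputStream in = TypesTxtCheck.class.getClassLoader().getResourceAsStream(TYPES_TXT);
        if (in == null) {
            System.out.println(TYPES_TXT + " not found on the classpath");
        } else {
            System.out.println(readPatterns(new InputStreamReader(in, StandardCharsets.UTF_8)));
        }
    }
}
```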

Links

  • [1] uimaFit Guide and Reference – 8.1. Making types auto-detectable
  • [2] DKPro tutorial (UIMA part): Type System Auto-Discovery (Slide 37)
  • [3] TypeDescriptorDetection

When you use AbstractTransactionalJUnit4SpringContextTests for database testing, you can easily access the database using a SimpleJdbcTemplate which is autowired with a DataSource bean. If, however, Spring finds more than one such data source on the classpath, it will fail to run the test. In the following error message, there are two candidates for autowiring.

No unique bean of type [javax.sql.DataSource] is defined: 
expected single matching bean but found 2: [csxDataSource, citeGraphDataSource]

The solution is to override the setDataSource method and to explicitly provide the DataSource instance to use (by bean name):

@Override
@Resource(name = "csxDataSource")
public void setDataSource(final DataSource dataSource) {
    this.simpleJdbcTemplate = new SimpleJdbcTemplate(dataSource);
}

The Apache Commons IO library offers a utility method which allows you to read a file directly to a string in just one line of code. As a test case, I use the following UTF-8 encoded file:

Hello World!
Some german umlauts: ä,ö,ü,ß
Final line

The Java code below demonstrates the use of the two methods readFileToString and readLines, both with and without the explicit encoding being given. I deliberately use the wrong encoding for reading the file in order to show the effect.

try {
   final File inputFile = new File("/tmp/test.txt");
   final String encoding = "ISO-8859-15";

   final String contentWithDefaultCharset = FileUtils.readFileToString(inputFile);
   final String contentExplicitEncoding = FileUtils.readFileToString(inputFile, encoding); 

   final List<String> contentSplitInLines = FileUtils.readLines(inputFile);
   final List<String> contentSplitInLinesExplicitEncoding = FileUtils.readLines(inputFile, encoding);

   System.out.println("Content (default encoding):");
   System.out.println(contentWithDefaultCharset);
   System.out.println("Content (" + encoding + "):");
   System.out.println(contentExplicitEncoding);
   System.out.println("Content as lines (default encoding):");
   System.out.println(contentSplitInLines);
   System.out.println("Content as lines (" + encoding + "):");
   System.out.println(contentSplitInLinesExplicitEncoding);
} catch (final IOException ex) {
    ex.printStackTrace();
}

Output:

Content (default encoding):
Hello World!
Some german umlauts: ä,ö,ü,ß
Final line

Content (ISO-8859-15):
Hello World!
Some german umlauts: À,ö,ÃŒ,ß
Final line

Content as lines (default encoding):
[Hello World!, Some german umlauts: ä,ö,ü,ß, Final line]
Content as lines (ISO-8859-15):
[Hello World!, Some german umlauts: À,ö,ÃŒ,ß, Final line]
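
The garbled umlauts can be reproduced without Commons IO, because the effect comes from decoding UTF-8 bytes with the wrong charset. The following sketch uses only the JDK; the class WrongCharsetDemo and the helper decode are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {

    /** Interprets the given bytes using the given charset. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Three umlauts separated by commas, written as Unicode escapes
        // so that the encoding of this source file does not matter.
        final String original = "\u00e4,\u00f6,\u00fc";
        final byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        final String right = decode(utf8, StandardCharsets.UTF_8);
        final String wrong = decode(utf8, Charset.forName("ISO-8859-15"));

        // Each umlaut takes two bytes in UTF-8, and a single-byte charset
        // turns every byte into one character: 5 characters become 8.
        System.out.println(right.equals(original)); // true
        System.out.println(wrong.length());         // 8
    }
}
```

This is also why I would always pass the encoding explicitly: the variants without it fall back to the platform default, which only happened to be UTF-8 on my machine.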

Maven Dependency

(Check current version here)

<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-io</artifactId>
  <version>1.3.2</version>
</dependency>

The following code showcases how we can instantiate collections inside application context files in Spring. Suppose we want to use a class DummyContainer as a bean and we instantiate the following collection-typed properties of this class (assume the appropriate setters are given). The class MyBean is defined elsewhere and its actual properties do not matter here. We use two instances of the class in order to demonstrate reference values and in-place bean definitions:

  • java.util.List<Object> aList;
    • Content:  [“Hello World”, new Double(1.3), new MyBean(), new MyBean()]
    • XML element: <list/>
  • java.util.Map<String, Object> aMap;
    • Content: [[“Item 1”, “Hello World”],[“Item 2”, new Double(1.3)],[“Item 3”, new MyBean()],[“Item 4”, new MyBean()]]
    • XML element: <map/> and <entry/>
  • java.util.Properties properties;
    • Content: [[“Prop 1”, “Value 1”]]
    • XML element: <props/> and <prop/>
  • java.util.Set<Object> aSet;
    • Content: [“Hello World”, new Float(1.3), new MyBean(), new MyBean()]
    • XML element: <set/>

This is what the context file looks like:

<beans>
  <bean id="myBean" class="MyBean"/>
  <bean id="dummyContainer" class="DummyContainer">
    <property name="aList">
      <list>
        <value>Hello World</value>
        <value>1.3</value>
        <ref bean="myBean"/>
        <bean class="MyBean"/>
      </list>
    </property>
    <property name="aMap">
      <map>
        <entry key="Item 1" value="Hello World"/>
        <entry key="Item 2" value="1.3"/>
        <entry key="Item 3" value-ref="myBean"/>
        <entry key="Item 4"><bean class="MyBean"/></entry>
      </map>
    </property>
    <property name="properties">
      <props>
        <prop key="Prop 1">Value 1</prop>
      </props>
    </property>
    <property name="aSet">
      <set>
        <value>Hello World</value>
        <value>1.3</value>
        <ref bean="myBean"/>
        <bean class="MyBean"/>
      </set>
    </property>
  </bean>
</beans>
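
For orientation, this is roughly what the intended contents look like when built by hand in plain Java. It is only a sketch: MyBean is an empty stand-in, the class DummyContainerContents and its build methods are invented, and whether Spring really turns the literal 1.3 into a Double depends on the declared element type, so treat the numeric type as an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class DummyContainerContents {

    /** Stand-in for the MyBean class from the text; its real properties are unknown. */
    static class MyBean { }

    static List<Object> buildList(MyBean shared) {
        final List<Object> list = new ArrayList<>();
        list.add("Hello World");
        list.add(Double.valueOf(1.3));
        list.add(shared);        // reference to the bean named "myBean"
        list.add(new MyBean());  // in-place bean definition
        return list;
    }

    static Map<String, Object> buildMap(MyBean shared) {
        final Map<String, Object> map = new LinkedHashMap<>();
        map.put("Item 1", "Hello World");
        map.put("Item 2", Double.valueOf(1.3));
        map.put("Item 3", shared);
        map.put("Item 4", new MyBean());
        return map;
    }

    static Properties buildProperties() {
        final Properties properties = new Properties();
        properties.setProperty("Prop 1", "Value 1");
        return properties;
    }

    static Set<Object> buildSet(MyBean shared) {
        final Set<Object> set = new LinkedHashSet<>();
        set.add("Hello World");
        set.add(Double.valueOf(1.3));
        set.add(shared);
        set.add(new MyBean());
        return set;
    }

    public static void main(String[] args) {
        final MyBean myBean = new MyBean();
        System.out.println(buildList(myBean).size() + " list items, "
                + buildMap(myBean).size() + " map entries");
    }
}
```

The shared instance shows the difference between the two styles: a reference reuses the one bean defined at the top, while an in-place definition creates a new object each time.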

Links

  • Spring 3.1.x Reference – Section 4.4.2.4 Collections
  • mkyong's summary of the collection elements (actually nicer than my entry here :-))