(:damion @world-state)

freelance software architect. artist. radio ham.


Stanford's CoreNLP and Clojure: Sentiment Analysis


categories:   clojure sentimentanalysis nlp

Stanford CoreNLP Java Library

This is Stanford CoreNLP in its (their?) own words:

Stanford CoreNLP provides a set of natural language analysis tools which can take raw text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, and mark up the structure of sentences in terms of phrases and word dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, etc. Stanford CoreNLP is an integrated framework. Its goal is to make it very easy to apply a bunch of linguistic analysis tools to a piece of text. Starting from plain text, you can run all the tools on it with just two lines of code. It is designed to be highly flexible and extensible. With a single option you can change which tools should be enabled and which should be disabled. Its analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.

OK, sounds great, and since it’s a Java library, it’s also a Clojure one! If you’re reading this, you probably already know about the incredible interop between Clojure and Java, so I’ll move along to some code examples of this interop!

CoreNLP Fundamentals

CoreNLP works by using Annotators to build Annotations over a stream of text using a CoreNLP pipeline. There are a lot of different annotators available, and for the purposes of this brief article, we’re using the sentiment annotator. In the next few days I’ll add support for the ner annotator as well, since named entity recognition is a pretty cool and useful feature.
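To make the pipeline idea concrete, here is a rough sketch of driving the sentiment annotator directly through Java interop. It is not the code inside my library. The namespace and function names (example.corenlp-sentiment, make-pipeline, sentence-sentiments) are made up for illustration, and the class names follow recent CoreNLP releases. Older releases such as 3.3.x used different names for some of these classes, so adjust the imports to your version. It assumes the CoreNLP jar and its models jar are on the classpath.

```clojure
(ns example.corenlp-sentiment
  (:import (java.util Properties)
           (edu.stanford.nlp.pipeline StanfordCoreNLP)
           (edu.stanford.nlp.ling CoreAnnotations$SentencesAnnotation
                                  CoreAnnotations$TextAnnotation)
           (edu.stanford.nlp.neural.rnn RNNCoreAnnotations)
           (edu.stanford.nlp.sentiment
             SentimentCoreAnnotations$SentimentAnnotatedTree)))

(defn make-pipeline
  "Builds a pipeline with the annotators the sentiment annotator depends on.
  Loading the models is slow, so build this once and reuse it."
  []
  (StanfordCoreNLP.
    (doto (Properties.)
      (.setProperty "annotators" "tokenize, ssplit, parse, sentiment"))))

(defn sentence-sentiments
  "Annotates text and returns one map per sentence, holding the predicted
  sentiment class (0 to 4) and the sentence text."
  [^StanfordCoreNLP pipeline ^String text]
  (let [annotation (.process pipeline text)]
    (for [sentence (.get annotation CoreAnnotations$SentencesAnnotation)]
      (let [tree (.get sentence SentimentCoreAnnotations$SentimentAnnotatedTree)]
        {:sentiment (RNNCoreAnnotations/getPredictedClass tree)
         :text      (.get sentence CoreAnnotations$TextAnnotation)}))))

;; Usage:
;; (sentence-sentiments (make-pipeline) "Hi there. I loved it!")
```

The annotators run in the order listed: the text is tokenized, split into sentences, and parsed, and the sentiment annotator then scores each sentence's parse tree.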

Sentiment Analysis

Sentiment analysis was added to CoreNLP in version 3.3.0. You can read all about it at the Stanford site. The paper is available for download as well. To get going and use the CoreNLP sentiment annotator in your own projects, you can use my library, or yank out the useful bits of code for your own use.

I created damionjunk.nlp as a simple Clojure library to wrap the various NLP annotations provided by CoreNLP and return the annotations as sequences of maps. Something like:

({:sentiment 0 :text "very angry text!"} {:sentiment 4 :text "very happy text!"} ... {})

Quick and Easy Sentiment Analysis in Clojure

Dependency

Add my library to your Leiningen dependencies. I publish to Clojars, so it’s easy: [damionjunk/nlp "0.1.0"]

The latest version is listed on the library's Clojars project page.

An example project.clj:

(defproject damionjunk/some-new-project "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
           :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [damionjunk/nlp "0.1.0"]])

You’ll need to run lein deps, of course, to pull the CoreNLP code and the provided language models.

Measuring Sentiment of Text

To use, just feed text to the function!

(require '[damionjunk.nlp.stanford :as nlp])

(nlp/sentiment-maps "Hi there. I really hated that movie. Just kidding, I loved it!")

;; => ({:sentiment 2, :text "Hi there."}
;;     {:sentiment 1, :text "I really hated that movie."}
;;     {:sentiment 3, :text "Just kidding, I loved it!"})

The sentiment annotator measures sentiment on a 0 to 4 scale: 0 is very negative, 1 is negative, 2 is neutral, 3 is positive, and 4 is very positive.
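If you'd rather work with names than numbers, a small lookup does the trick. This is a sketch of a hypothetical helper (sentiment-labels and label-sentiment are not part of damionjunk.nlp), reading the low end of the scale as negative and the high end as positive:

```clojure
;; Hypothetical helper, not part of damionjunk.nlp.
;; The vector index is the sentiment class, 0 through 4.
(def sentiment-labels
  [:very-negative :negative :neutral :positive :very-positive])

(defn label-sentiment
  "Adds a :label key to a sentiment map such as {:sentiment 1 :text \"...\"}.
  Scores outside 0-4 (or a missing score) are labeled :unknown."
  [{:keys [sentiment] :as m}]
  (assoc m :label (get sentiment-labels sentiment :unknown)))

(map (comp :label label-sentiment)
     [{:sentiment 2 :text "Hi there."}
      {:sentiment 1 :text "I really hated that movie."}
      {:sentiment 3 :text "Just kidding, I loved it!"}])
;; => (:neutral :negative :positive)
```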

You can see that with the provided text we had fairly accurate results:

Hi there.

This measured neutral.

I really hated that movie.

This measured negative.

Just kidding, I loved it!

This measured positive.

Conclusion

Your results may vary. There is plenty of discussion to be found on the validity of sentiment analysis, and the various techniques that measure it. I believe sentiment analysis can be useful when trying to get a macro-level feel for the sentiment of a topic or set of topics.
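As a rough idea of what a macro-level view could look like, here is a sketch that rolls a sequence of sentiment maps up into a count, a mean, and a per-class distribution. The function name sentiment-summary is hypothetical and not part of my library. Keep in mind that the scores are ordinal classes, so the mean is only a coarse indicator:

```clojure
;; Hypothetical helper, not part of damionjunk.nlp.
(defn sentiment-summary
  "Summarizes a sequence of {:sentiment n :text s} maps.
  :mean is nil when the sequence is empty."
  [maps]
  (let [scores (map :sentiment maps)
        n      (count scores)]
    {:count        n
     :mean         (when (pos? n) (double (/ (reduce + scores) n)))
     :distribution (into (sorted-map) (frequencies scores))}))

(sentiment-summary
  [{:sentiment 2 :text "Hi there."}
   {:sentiment 1 :text "I really hated that movie."}
   {:sentiment 3 :text "Just kidding, I loved it!"}])
;; => {:count 3, :mean 2.0, :distribution {1 1, 2 1, 3 1}}
```

Run over a few thousand sentences on one topic, the distribution is usually more telling than the mean, since a neutral-looking average can hide a split between strongly negative and strongly positive text.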

As I mentioned, I’ll be adding the other annotators to the library shortly, and plan to provide code for a simple Twitter-to-Stanford sentiment data collector in Clojure.

Stay tuned!