Irulan: High Coverage Testing of Haskell Programs

About

Irulan is a black box testing tool for Haskell. It uses the GHC API to automatically load Haskell modules and then automatically constructs and runs expressions to test the functions found therein. It can be used for systematic crash testing, property testing or test-suite generation and comparison for regression testing.

Irulan was developed by me, Tristan Allwood, with mentoring by Lecturer Cristian Cadar and my PhD supervisor Prof. Susan Eisenbach.

Resources

Source Releases:

The sources for Irulan can be downloaded as a source tarball, or the development version is available from Irulan's public monotone repository (description, source browser):

mtn clone irulan.mtn-host.prjek.net uk.co.zonetora.toral.irulan.6.12.3 irulan

Note: the development version may have new features that are not yet documented, and may behave differently from what is described in the technical reports below. It is also not guaranteed to build.

Irulan v1.0.0: [tgz]
Irulan v0.9.1: [tgz]
Irulan v0.9: [tgz]

Papers

PhD Thesis describing Irulan: [pdf]
"High Coverage Testing of Haskell Programs": [pdf]. At ISSTA 2011.
"Irulan: Systematic Testing of Haskell Programs": [online]. Unpublished technical report, April 2010. (Irulan has been substantially improved since this version).

Usage

Building:

Irulan should be built using the standard Haskell Cabal infrastructure. In an unpacked source tarball, run:

cabal configure && cabal build && cabal install

Note: Irulan has only been tested on x86 and x86_64 Linux systems and is currently only supported with ghc 6.12.3.

Running:

Run irulan --help to see all the options.

Simplest use: $ irulan source Foo.

If you want code coverage information, use:

$ irulan --with-stats-output=foo.stats source Foo

or

$ irulan source Foo --ghc-options='-fforce-recomp -fhpc'

At the end you'll get an irulan.tix that the hpc tool can use to report or markup your sources. The -fforce-recomp isn't always necessary, but stops ghc getting confused by stale mix/tix files.

Note: the example command lines and output in the papers have been prettified for presentation.

Use examples:

Crash testing:

>cat Crash.hs 
module Crash where

import Data.Either

cantCrashThis :: Maybe Bool -> [Int]
cantCrashThis Nothing     = [123, error "crash this!"]
cantCrashThis (Just True) = []


# Assuming expressions use show to visualise results:
>irulan -d --depth=10 source Crash
Crash:
 Results:
  cantCrashThis (Nothing) ==> ! crash this!
  cantCrashThis (Just False) ==> ! 
    Crash.hs:(6,0)-(7,29): Non-exhaustive patterns in function cantCrashThis

# Don't use show, actually pick values apart:
>irulan -d --depth=10 --disable-show-results --enable-case-statements source Crash
Crash:
 Results:
  case case cantCrashThis (Nothing) of _ : x -> x of x : _ -> x ==> ! crash this!
  cantCrashThis (Just False) ==> ! 
    Crash.hs:(6,0)-(7,29): Non-exhaustive patterns in function cantCrashThis
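
The second run reaches the error by picking the result list apart instead of calling show on it. The following is a minimal sketch of that idea in plain Haskell, not Irulan's own implementation: the helper name outcome is hypothetical, and it uses evaluate and try from Control.Exception to force a value to weak head normal form and record any exception.

```haskell
module Main where

import Control.Exception (SomeException, evaluate, try)

-- Copied from Crash.hs above.
cantCrashThis :: Maybe Bool -> [Int]
cantCrashThis Nothing     = [123, error "crash this!"]
cantCrashThis (Just True) = []

-- Hypothetical helper: force a value to weak head normal form and
-- return the exception text if forcing it throws, Nothing otherwise.
outcome :: a -> IO (Maybe String)
outcome x = do
  r <- try (evaluate x)
  return $ case r of
    Left e  -> Just (show (e :: SomeException))
    Right _ -> Nothing

main :: IO ()
main = do
  let xs = cantCrashThis Nothing
  -- Forcing only the outermost (:) constructor does not reach the error.
  outcome xs >>= print
  -- Take the tail, then the head of the tail: this forces the second element.
  outcome (case (case xs of _ : t -> t) of y : _ -> y) >>= print
  -- No equation matches Just False, so forcing the call itself fails.
  outcome (cantCrashThis (Just False)) >>= print
```

The first probe reports no exception because laziness leaves the list elements unevaluated; the nested case expression mirrors the one Irulan prints above.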

Lightweight property testing:

>cat Prop.hs 
module Prop where

prop_ListLength :: [a] -> [b] -> Bool
prop_ListLength xs ys = length xs == length zs
  where
      zs = zip xs ys

>irulan -p --enable-quiet -d --depth=10 source Prop
Prop:
 Results:
  prop_ListLength (: ? ([])) ([]) ==> False
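
The counterexample can be replayed by hand. This sketch assumes that the ? in Irulan's output stands for an argument the property never inspects, and uses undefined in its place; that reading is an assumption based on the output above rather than something stated here.

```haskell
module Main where

-- Copied from Prop.hs above.
prop_ListLength :: [a] -> [b] -> Bool
prop_ListLength xs ys = length xs == length zs
  where
      zs = zip xs ys

main :: IO ()
main = do
  -- One-element list against the empty list: zip truncates to the shorter
  -- argument, so the lengths differ. The element itself is never forced.
  print (prop_ListLength [undefined :: Int] ([] :: [Int]))
  -- Lists of equal length satisfy the property.
  print (prop_ListLength "ab" [True, False])
```

The first call prints False and the second prints True, so the property only holds when ys is at least as long as xs.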

Test Suite Generation, Comparison and Analysis:

>cat V1.hs
module V1(convert) where

convert :: [Bool] -> [Bool] -> [Int]
convert xs ys = [ if x then 2 else 3 + if y then 0 else 1 
                | x <- xs, y <- ys
                ]

>cat V2.hs
module V2(convert) where

convert :: [Bool] -> [Bool] -> [Int]
convert xs ys = [ helper x y | x <- xs , y <- ys ]

helper :: Bool -> Bool -> Int
helper y x = (if x then 2 else 3) + (if y then 0 else 1)


>irulan -d --depth=20 --full-testsuite=v1.tst source V1
V1:
 Results:

>irulan -d --depth=20 --full-testsuite=v2.tst source V2
V2:
 Results:

>irulan tsa v1.tst v2.tst 
A: v1.tst
B: v2.tst

A:  convert (: False (: False [])) (: False []) ==> [4,4]
B:  convert (: False (: False [])) (: False []) ==> [4,4]

A:  convert (: False (: False [])) (: True []) ==> [3,3]
B:  convert (: False (: False [])) (: True []) ==> [3,3]

A~  convert (: False (: True [])) (: False []) ==> [4,2]
B~  convert (: False (: True [])) (: False []) ==> [4,3]

A:  convert (: False (: True [])) (: True []) ==> [3,2]
B:  convert (: False (: True [])) (: True []) ==> [3,2]

A:  convert (: False []) (: False (: False [])) ==> [4,4]
B:  convert (: False []) (: False (: False [])) ==> [4,4]

A:  convert (: False []) (: False (: True [])) ==> [4,3]
B:  convert (: False []) (: False (: True [])) ==> [4,3]

A:  convert (: False []) (: False []) ==> [4]
B:  convert (: False []) (: False []) ==> [4]

A:  convert (: False []) (: True (: False [])) ==> [3,4]
B:  convert (: False []) (: True (: False [])) ==> [3,4]

A:  convert (: False []) (: True (: True [])) ==> [3,3]
B:  convert (: False []) (: True (: True [])) ==> [3,3]

A:  convert (: False []) (: True []) ==> [3]
B:  convert (: False []) (: True []) ==> [3]

A~  convert (: True (: False [])) (: False []) ==> [2,4]
B~  convert (: True (: False [])) (: False []) ==> [3,4]

A:  convert (: True (: False [])) (: True []) ==> [2,3]
B:  convert (: True (: False [])) (: True []) ==> [2,3]

A#  convert (: True (: True [])) (: ? (: ?1 [])) ==> [2,2,2,2]

A~  convert (: True (: True [])) (: ? []) ==> [2,2]
B~  convert (: True (: True [])) (: False []) ==> [3,3]
B:  convert (: True (: True [])) (: True []) ==> [2,2]

A#  convert (: True []) (: ? (: ?1 (: ?2 (: ?3 [])))) ==> [2,2,2,2]

A#  convert (: True []) (: ? (: ?1 (: ?2 []))) ==> [2,2,2]

A~  convert (: True []) (: ? (: ?1 [])) ==> [2,2]
B~  convert (: True []) (: False (: False [])) ==> [3,3]
B~  convert (: True []) (: False (: True [])) ==> [3,2]
B~  convert (: True []) (: True (: False [])) ==> [2,3]
B:  convert (: True []) (: True (: True [])) ==> [2,2]

A~  convert (: True []) (: ? []) ==> [2]
B~  convert (: True []) (: False []) ==> [3]
B:  convert (: True []) (: True []) ==> [2]

A:  convert (: ? (: ?1 (: ?2 (: ?3 (: ?4 (: ?5 [])))))) [] ==> []
B:  convert (: ? (: ?1 (: ?2 (: ?3 (: ?4 (: ?5 [])))))) [] ==> []

A:  convert (: ? (: ?1 (: ?2 (: ?3 (: ?4 []))))) [] ==> []
B:  convert (: ? (: ?1 (: ?2 (: ?3 (: ?4 []))))) [] ==> []

A:  convert (: ? (: ?1 (: ?2 (: ?3 [])))) [] ==> []
B:  convert (: ? (: ?1 (: ?2 (: ?3 [])))) [] ==> []

A:  convert (: ? (: ?1 (: ?2 []))) [] ==> []
B:  convert (: ? (: ?1 (: ?2 []))) [] ==> []

A:  convert (: ? (: ?1 [])) [] ==> []
B:  convert (: ? (: ?1 [])) [] ==> []

A:  convert (: ? []) [] ==> []
B:  convert (: ? []) [] ==> []

A:  convert [] ? ==> []
B:  convert [] ? ==> []
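
A few of the differing lines above can be reproduced directly. The sketch below copies both definitions into one module under the hypothetical names convertV1 and convertV2, and again uses undefined where the output shows ?, which is an assumption about what ? means.

```haskell
module Main where

-- V1's convert, copied from V1.hs above.
convertV1 :: [Bool] -> [Bool] -> [Int]
convertV1 xs ys = [ if x then 2 else 3 + if y then 0 else 1
                  | x <- xs, y <- ys
                  ]

-- V2's convert and helper, copied from V2.hs above.
convertV2 :: [Bool] -> [Bool] -> [Int]
convertV2 xs ys = [ helper x y | x <- xs , y <- ys ]

helper :: Bool -> Bool -> Int
helper y x = (if x then 2 else 3) + (if y then 0 else 1)

main :: IO ()
main = do
  -- Same input, different results: compare the A~ and B~ lines above.
  print (convertV1 [False, True] [False])
  print (convertV2 [False, True] [False])
  -- In V1 the else branch extends to the end of the expression, so when
  -- x is True the value of y is never inspected.
  print (convertV1 [True] [undefined])
```

This prints [4,2], [4,3] and [2]. In V1 the expression parses as if x then 2 else (3 + ...), which is why the A lines show ? for the second list whenever x is True, while V2 always inspects both arguments.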

Experimental Results

Nofib

As described in the most recent paper, we ran Irulan on a filtered version of the real and spectral suites from nofib. The experimental results from all runs (1, 10, 60 and 300 seconds per module, with (C) and without (N) case statement generation) are below.

Real suite all experiments: graphs hpc reports run outputs
Spectral suite all experiments: graphs hpc reports run outputs

Property Testing

We have compared Irulan to the other property testing libraries (QuickCheck, SmallCheck and Lazy SmallCheck) on the benchmark available from here. Full details will hopefully be published in the next few months in my thesis.

Raw data on the tests: online
Graph of code coverage: [png]

Regression Testing

In the most recent draft paper we discuss several case studies based on using Irulan for regression or behaviour change testing.

Haskell Undergraduate Exam: Please contact us if you would like to see the traces and data relating to this. We cannot make them freely available online, as the exam will likely be re-used as a practice exam by future students.
Hackage - TreeStructures: [Hackage] [irulan commands|txt] [test suite|html]
Hackage - Presburger: [Hackage] [irulan commands|txt] [test suite|html]