SINFA --- Sequential INFormation Analysis

Sequential Information Analysis (SINFA) is a technique developed in perceptual linguistics to study how much information individual phonetic features carry in speech perception.

For instance, suppose a perceptual experiment yields a Consonant-Vowel-Consonant (CVC) confusion matrix. How can we determine how much information is transmitted by the feature plosive? The trick is to look at how often plosive consonants are confused with non-plosives. You can do this for any phonetic feature and see which feature carries the most information. Suppose that feature is nasality; you can then ask: given that nasality is already transmitted in this experiment, which feature is the next most informative? And so on. This is basically what SINFA does.
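The greedy procedure just described can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the algorithm in sinfa.R: the helper names (mutual_info, collapse, cond_info, sinfa_sketch) and the four-phoneme toy matrix are hypothetical. Transmitted information is taken as the mutual information, in bits, of the confusion matrix after collapsing phonemes onto feature values, and features already selected are "held constant" by splitting the stimuli into blocks that share their values. That conditioning step is an assumption and may differ in detail from Wang and Bilger (1973) or from sinfa.R.

```r
# Transmitted information (mutual information, in bits) of a count table.
mutual_info <- function(tab) {
  p <- tab / sum(tab)
  expected <- outer(rowSums(p), colSums(p))  # joint probabilities under independence
  nz <- p > 0                                # empty cells contribute nothing
  sum(p[nz] * log2(p[nz] / expected[nz]))
}

# Collapse a count matrix onto feature values: rows by the stimulus feature
# values, columns by the response feature values.
collapse <- function(x, row_feat, col_feat) {
  t(rowsum(t(rowsum(x, row_feat)), col_feat))
}

# Information transmitted by feature f while the features in `given` are held
# constant. Assumption: split the stimuli (rows) into blocks sharing the values
# of `given`, then average over blocks weighted by each block's share of counts.
cond_info <- function(x, feats, f, given = character(0)) {
  key <- rep("all", nrow(x))
  if (length(given) > 0) key <- do.call(paste, c(as.list(feats[given]), sep = "|"))
  info <- 0
  for (k in unique(key)) {
    rows <- key == k
    sub <- x[rows, , drop = FALSE]
    w <- sum(sub) / sum(x)
    info <- info + w * mutual_info(collapse(sub, feats[[f]][rows], feats[[f]]))
  }
  info
}

# Greedy selection: at each step pick the feature that adds the most
# information given the features already chosen (ties go to the earlier column).
sinfa_sketch <- function(x, feats, depth = ncol(feats)) {
  chosen <- character(0)
  bits <- numeric(0)
  for (i in seq_len(depth)) {
    left <- setdiff(names(feats), chosen)
    if (length(left) == 0) break
    vals <- sapply(left, function(f) cond_info(x, feats, f, chosen))
    chosen <- c(chosen, left[which.max(vals)])
    bits <- c(bits, max(vals))
  }
  data.frame(feature = chosen, bits = bits)
}

# Hypothetical toy data: rows are stimuli, columns are responses.
ph <- c("p", "t", "m", "n")
x <- matrix(c(15,  5,  0,  0,
               5, 15,  0,  0,
               0,  0, 15,  5,
               0,  0,  5, 15),
            nrow = 4, byrow = TRUE, dimnames = list(ph, ph))
feats <- data.frame(nasal  = c(0, 0, 1, 1),
                    place  = c("lab", "alv", "lab", "alv"),
                    voiced = c(0, 0, 1, 1),   # deliberately identical to nasal
                    row.names = ph)

sinfa_sketch(x, feats)
```

In this toy matrix nasality is never confused, so it is selected first with 1 bit. Voicing is a copy of nasality here, so once nasality is held constant it adds nothing; that kind of redundancy is what the sequential step is meant to expose.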

SINFA was developed in the 1970s, building on information theory from the 1940s and 1950s. It is one of those techniques that are "well known" but have since been somewhat forgotten. Computing infrastructure has also changed since then, and we may want to run the analysis in a modern environment.

This page contains R code that lets you run your own SINFA analysis on a confusion matrix. You supply the matrix and the feature structure as .csv files, read the data into R, and run the analysis from there.
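The exact file layout that read.wang.bilger() expects is defined in the downloaded files and is not reproduced here. Purely as an assumption for illustration, the sketch below uses a hypothetical layout: a confusion-matrix file with stimuli as rows and responses as columns, labelled in the first column and in the header, and a feature file with one row per phoneme and one column per feature. The helper read_sinfa_input is hypothetical; it uses only base R's read.csv and checks that both tables describe the same phonemes.

```r
# Hypothetical helper (not part of sinfa.R): read a confusion matrix and a
# feature table from .csv files and align them by phoneme label.
read_sinfa_input <- function(conf_file, feat_file) {
  # The first column holds the row labels; check.names = FALSE keeps header
  # labels such as "@" exactly as written instead of rewriting them.
  x <- as.matrix(read.csv(conf_file, row.names = 1, check.names = FALSE))
  f <- read.csv(feat_file, row.names = 1, check.names = FALSE)
  stopifnot(identical(rownames(x), colnames(x)),  # same phonemes as rows and columns
            setequal(rownames(f), rownames(x)))    # every phoneme has a feature row
  list(x = x, f = f[rownames(x), , drop = FALSE])  # reorder features to match x
}

# Write two small example files in the assumed layout and read them back.
conf_file <- tempfile(fileext = ".csv")
feat_file <- tempfile(fileext = ".csv")
writeLines(c(",p,t,m",
             "p,8,2,0",
             "t,3,7,0",
             "m,0,1,9"), conf_file)
writeLines(c(",nasal,place",
             "m,1,lab",
             "p,0,lab",
             "t,0,alv"), feat_file)

d <- read_sinfa_input(conf_file, feat_file)
d$x  # confusion counts, stimuli in rows
d$f  # feature table, rows now in the same order as d$x
```

Aligning the feature rows to the matrix order up front avoids a silent mismatch if the two files list the phonemes in different orders.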

Our SINFA code comes with example data taken from an article by Wang and Bilger (1973), so you can inspect the input format, run the analysis, and check that you get the same results as the paper.

Code

The R code with example tables can be found here.  

Example

  1. Download the code and unpack it with your favorite archive tool.
  2. If you don't have R yet, download and install it; this should take less than a minute. On Debian-derived Linux distributions, type "apt-get install r-base".
  3. Start R from the directory (folder) where you unpacked the code and data. If you don't understand what this means, just start R.
  4. At the R prompt, type the following (everything after ## is a comment and does not need to be typed):
    • setwd("path/to/code-and-data") ## optional: only needed if you could not start R in that directory
    • source("sinfa.R") ## this loads the code
    • read.wang.bilger() ## this loads the .csv tables into "x" and "f"
    • x ## have a look at what's in "x": the confusion matrix from table 6
    • f ## have a look at what's in "f": the table of features per phoneme
    • s <- sinfa(x, f, 7) ## run SINFA analysis, 7 levels deep
    • s ## print the full sinfa analysis
    • summary(s) ## give a summary of the order of most important features. 
  5. Check that the numbers are more or less the same as in the Wang and Bilger paper.

Terminology

We use a slightly different terminology in the output tables:

  • ent (entropy): Feature information, measured in bits
  • mut (mutual information): Transmitted information, measured in bits
  • rel (relative information): fraction of the feature information that is transmitted: mut/ent
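For a single feature, the three quantities can be computed from a confusion matrix roughly as sketched below. This assumes that ent is the entropy of the feature over the stimuli and mut the mutual information between stimulus and response feature values, so that rel, as a fraction, lies between 0 and 1. The function names and the toy matrix are hypothetical and not taken from sinfa.R.

```r
# Entropy in bits of a probability vector or matrix (zero cells are skipped).
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# ent, mut and rel for one feature of a stimulus-by-response count matrix x.
# feat gives the feature value of each phoneme, in the row/column order of x.
feature_info <- function(x, feat) {
  tab <- t(rowsum(t(rowsum(x, feat)), feat))      # collapse rows, then columns
  p <- tab / sum(tab)
  ent <- entropy(rowSums(p))                      # H(stimulus feature)
  mut <- ent + entropy(colSums(p)) - entropy(p)   # H(S) + H(R) - H(S, R)
  c(ent = ent, mut = mut, rel = mut / ent)        # rel is not meaningful if ent is 0
}

# Hypothetical toy data: rows are stimuli, columns are responses.
ph <- c("p", "t", "m", "n")
x <- matrix(c(15,  5,  0,  0,
               5, 15,  0,  0,
               0,  0, 15,  5,
               0,  0,  5, 15),
            nrow = 4, byrow = TRUE, dimnames = list(ph, ph))

feature_info(x, c(0, 0, 1, 1))                  # nasality
feature_info(x, c("lab", "alv", "lab", "alv"))  # place
```

In this toy matrix nasality is transmitted perfectly (ent = mut = 1 bit, rel = 1), while place has 1 bit of feature information of which only about 0.19 bits is transmitted.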