Introduction


This tutorial presents tools for the quantification and evaluation of loops contained in handwritten characters, following the work of @marquis2005 and @bozza2008.

The Fourier quantification method and the likelihood-ratio statistical evaluation are implemented in the ExtractFourier and TwoLevelLR methods, respectively.

suppressPackageStartupMessages(library(ForensicDocument))

Loop quantification


The loop quantification method for handwritten characters described in @marquis2005 is implemented in the ExtractFourier method. In short, binary character images are skeletonized and quantified with a function \(R(\theta)\), where \(R(\theta)\) is the distance from the character skeleton to its barycenter at the angle \(\theta\).

We give a more detailed description of this method:

  1. The algorithm used for the skeletonisation process is based on the one proposed by @stentiford1983. It has been modified for the particular case of closed loops by removing the end-point condition; this tweak avoids having to prune the resulting skeleton (see @stentiford1983 for further details).
  2. Skeletons (or loops) are parametrised by a discrete function \(R(\theta)\), representing the length of the line joining a point of the contour to the barycenter, \(\theta\) being the angle made by this line with the horizontal axis, with \(0 \leq \theta < 2\pi\).
  3. The function \(R(\theta)\) is resampled at n.samp values of \(\theta\), that is, at \(\theta=\frac{2\pi n}{n.samp+1}, \, n=0,\dots,n.samp-1\).
  4. Selected Fourier parameters are extracted from the signal \(R[\theta]\) (using the fft method from stats).
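
Steps 2 to 4 above can be illustrated with base R only. The following is a minimal sketch, not the code used by ExtractFourier: the loop is a synthetic ellipse standing in for a skeletonised character, the periodic linear interpolation is one possible resampling choice, and reading an and bn as the real and imaginary parts of the fft output is an assumption (the package may scale or sign the coefficients differently). All variable names are hypothetical.

```r
# Synthetic closed loop (assumption): an ellipse stands in for a skeletonised character
n.points = 400
t = 2 * pi * (0:(n.points - 1)) / n.points
x = 50 + 30 * cos(t)
y = 40 + 20 * sin(t)

# Step 2: polar description of the loop around its barycenter
cx = mean(x)
cy = mean(y)
radius = sqrt((x - cx)^2 + (y - cy)^2)
angle = atan2(y - cy, x - cx) %% (2 * pi)

# Step 3: resample R(theta) on the grid of n.samp angles given in the text
n.samp = 128
grid = 2 * pi * (0:(n.samp - 1)) / (n.samp + 1)
ord = order(angle)
# Pad both ends so that the linear interpolation wraps around the loop
angle_pad = c(angle[ord][n.points] - 2 * pi, angle[ord], angle[ord][1] + 2 * pi)
radius_pad = c(radius[ord][n.points], radius[ord], radius[ord][1])
R = approx(angle_pad, radius_pad, xout = grid)$y

# Step 4: keep the first n.fourier harmonics of the discrete Fourier transform
n.fourier = 7
coefs = fft(R)[1:n.fourier]
# Assumption: an and bn are taken as the real and imaginary parts
sketch = data.frame(an = Re(coefs), bn = Im(coefs),
                    row.names = 0:(n.fourier - 1))
sketch
```

Harmonic 0 is the sum of the sampled radii, so its imaginary part is zero, which matches the pattern of the first row in the package output shown in the example section.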

Parameters

The parameters for this method are:

Example

In this example, we use the binarised handwritten character o (the fig-O.png file supplied in the package). The character image is not skeletonized, so skeletonization is required (skeletonize=TRUE).

We use a sampling size of 128 (n.samp=128) with 7 Fourier harmonics (n.fourier=6+1) as in @marquis2005. We want the output as a data.frame, therefore output=NULL. The verbose option is set to FALSE. In this case, the method ExtractFourier is used as follows:

files = system.file("extdata", "fig-O.png", package = "ForensicDocument")
result = ExtractFourier(files = files, 
                        n.fourier = 7, 
                        n.samp = 128, 
                        verbose = FALSE, 
                        output = NULL,
                        character_pixel = 0)
result
## $`/tmp/RtmpOjIp7w/Rinst12c86afe86ba/ForensicDocument/extdata/fig-O.png`
##             an          bn
## 0 1677.0131403    0.000000
## 1    3.3772831  -22.616375
## 2   80.1557968   86.759999
## 3   -6.6960430 -100.939109
## 4   -1.6572270   -3.109371
## 5    3.2688995  -26.634278
## 6   -0.3168313   -5.260817

As stated above, the result object is a list of length length(files). In this particular case, there is only one input file, thus result is a list of length 1.

## typeof(result) : list
## length(result) == length(files) : TRUE
## names(result) == files : TRUE
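
As a small illustration of working with one element of this list, the sketch below rebuilds the printed table by hand (values copied from the output above) and re-expresses each harmonic as an amplitude and a phase. With the package loaded, one would presumably start from result[[1]] instead; the amplitude and phase columns are a common polar re-expression of Fourier coefficients and are added here as an assumption, not as part of the ExtractFourier output.

```r
# Values copied from the printed output above (harmonics 0 to 6)
coefs = data.frame(
  an = c(1677.0131403, 3.3772831, 80.1557968, -6.6960430,
         -1.6572270, 3.2688995, -0.3168313),
  bn = c(0.000000, -22.616375, 86.759999, -100.939109,
         -3.109371, -26.634278, -5.260817),
  row.names = 0:6
)

# Polar re-expression of each harmonic (assumption: not returned by the package)
coefs$amplitude = sqrt(coefs$an^2 + coefs$bn^2)
coefs$phase = atan2(coefs$bn, coefs$an)
coefs
```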

Statistical Evaluation


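In forensic science, the evidence \(y\) is usually interpreted through the computation of a likelihood ratio: \[LR = \frac{f(y|H_p)}{f(y|H_d)}, \] where \(f(y|H_p)\) and \(f(y|H_d)\) are the probability densities of the evidence under two competing propositions, \(H_p\) and \(H_d\).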

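In the context of handwriting examination, suppose that: (i) an anonymous letter (i.e. the questioned document) is available for comparative analysis, and (ii) written material from a suspect is selected for comparative purposes (i.e. the reference document). For the computation of the likelihood ratio, we consider the following propositions of interest: \(H_p\), the author of the reference document is the author of the questioned document, and \(H_d\), the author of the reference document is not the author of the questioned document.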

In this tutorial and package, the proposed two-level likelihood ratio evaluation is based on the one developed by @bozza2008. It takes into account both the within- and between-writer variability. The evidence \(y\), namely the Fourier parameters, is assumed to follow a multivariate normal distribution with unknown mean vector and covariance matrix: \[ y \sim \mathcal{M}(\mu,W), \] \[ W^{-1} \sim \mathcal{IW}(U,n_w), \] \[ \mu \sim \mathcal{M}(\theta,B), \] where \(B\) and \(U\) are the between- and within-writer covariance matrices, respectively.
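
To make the role of the unknown mean vector and covariance matrix concrete, the likelihood ratio can be written out as a ratio of marginal likelihoods. This is a sketch of the usual formulation for such hierarchical models, not a formula taken from the package: \(y_r\) and \(y_q\) are hypothetical symbols for the reference and questioned measurements, and \(\pi(\mu, W)\) stands for the prior on \((\mu, W)\) implied by the model above.

```latex
\[
LR =
\frac{\int f(y_r \mid \mu, W)\, f(y_q \mid \mu, W)\, \pi(\mu, W)\, d\mu\, dW}
     {\int f(y_r \mid \mu, W)\, \pi(\mu, W)\, d\mu\, dW
      \;\times\;
      \int f(y_q \mid \mu, W)\, \pi(\mu, W)\, d\mu\, dW}
\]
```

Under \(H_p\) both sets of measurements share the same writer-level parameters, whereas under \(H_d\) each set has its own. These integrals generally have no closed form, which is consistent with the MCMC settings (n.iter, n.burnin) used in the examples.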

The likelihood ratio is computed using the TwoLevelLR method, and the background parameters (\(\theta\), \(B\) and \(U\)) can be computed using the TwoLevelLR_Background method.
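
To give an idea of what background parameters look like, the sketch below computes an overall mean, a pooled within-group covariance and a raw between-group covariance from simulated grouped data in base R. These are textbook moment estimators chosen for illustration. Whether TwoLevelLR_Background uses the same estimators is not stated here, and all variable names and the simulated data are hypothetical.

```r
set.seed(1)

# Simulated measurements (assumption): 5 writers, 20 characters each, 3 variables
fac = factor(rep(1:5, each = 20))
x = matrix(rnorm(100 * 3), ncol = 3) + as.integer(fac)

# Overall mean vector
overall_mean = colMeans(x)

# Writer-level means: one row per writer, one column per variable
group_means = apply(x, 2, function(col) tapply(col, fac, mean))

# Pooled within-writer covariance from the residuals around writer means
resid = x - group_means[as.integer(fac), ]
within_cov = crossprod(resid) / (nrow(x) - nlevels(fac))

# Raw between-writer covariance: covariance of the writer means
between_cov = cov(group_means)

list(mean = overall_mean, within = within_cov, between = between_cov)
```

The covariance of the writer means still contains a share of within-writer noise, so a careful estimator would correct for it; the raw version is kept here for brevity.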

Parameters

The parameters for TwoLevelLR method are:

Similarly, parameters for TwoLevelLR_Background are:

Examples

In the following examples, we will use the characterO dataset. It contains the Fourier parameters (n.fourier = 4) extracted from 554 handwritten character loops, written by 11 writers. It is a subset of the data collected by @marquis2006. For more information on this dataset, see ?characterO. For other applications of this methodology, see @marquis2011a and @taroni2012.

data(characterO)

In both examples, the number of iterations and the number of burn-in iterations for the MCMC chain are set to 110 and 10, respectively. The number of degrees of freedom of the inverse Wishart distribution is nw=50, as in @bozza2008.

n.iter = 110
n.burnin = 10
nw = 50

Example 1: \(H_p\) true

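We present the case where the questioned and reference documents are written by the same author: writer 1 (writer 1 has a total of 46 characters).

In this example we use the first 23 characters of writer 1 as the reference document, the remaining 23 characters of writer 1 as the questioned document, and the characters of all other writers as background data: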

# reference & questioned
data_reference = subset(characterO$measurements[,-1], 
                        subset = (characterO$info$writer == 1))[1:23,]
data_questioned = subset(characterO$measurements[,-1], 
                         subset = (characterO$info$writer == 1))[-(1:23),]
# background
subset = characterO$info$writer != 1
data_back = subset(characterO$measurements[,-1], 
                   subset = subset)
background = TwoLevelLR_Background(data_back, 
                                   fac = as.factor(characterO$info$writer[subset]))

The method TwoLevelLR is used as follows:

LLR = TwoLevelLR(data1 = data_reference,  
                data2 = data_questioned,
                background = background, 
                n.iter = n.iter, n.burnin = n.burnin,
                nw = nw)
LLR
## [1] 14.20812

The result object LLR is the numeric value of the log-likelihood ratio: \(LLR = \log f(y|H_p) - \log f(y|H_d)\). Here the \(LLR\) is positive (14.2081206), suggesting that \(H_p\) is true (i.e. the author of the reference document is the author of the questioned document).

Example 2: \(H_d\) true

Here, we present the case where the questioned and reference documents are written by different authors: writer 1 and writer 2:

# reference & questioned
data_reference = subset(characterO$measurements[,-1], subset = characterO$info$writer == 1)[1:20,]
data_questioned = subset(characterO$measurements[,-1], subset = characterO$info$writer == 2)[1:20,]
# background
subset = characterO$info$writer > 2
data_back = subset(characterO$measurements[,-1], subset = subset)
background = TwoLevelLR_Background(data_back, fac = as.factor(characterO$info$writer[subset]))

The method TwoLevelLR is used as follows:

LLR = TwoLevelLR(data1 = data_reference,  
                data2 = data_questioned,
                background = background, 
                n.iter = n.iter, n.burnin = n.burnin,
                nw = nw)
LLR
## [1] -51.80008

Here the \(LLR\) is negative (-51.8000845), suggesting that \(H_d\) is true (i.e. the author of the reference document is not the author of the questioned document).
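
The two results can be put side by side on a base-10 scale, which is often easier to read. The snippet below is a small sketch using the values printed above; it assumes that the log-likelihood ratio returned by TwoLevelLR is a natural logarithm, which is not stated explicitly here, and the variable names are hypothetical.

```r
# Log-likelihood ratios copied from the two examples above
llr = c(example1 = 14.20812, example2 = -51.80008)

# Assumption: natural logarithm, so dividing by log(10) gives log10(LR)
log10_lr = llr / log(10)

# Proposition supported by the sign of the log-likelihood ratio
supported = ifelse(llr > 0, "H_p", "H_d")

data.frame(llr, log10_lr, supported)
```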


References

References cited in this tutorial