cprob {PST} | R Documentation |
Compute the empirical conditional probability distributions of order L from a set of sequences
## S4 method for signature 'stslist'
cprob(object, L, cdata=NULL, context, stationary=TRUE, nmin=1, prob=TRUE,
  weighted=TRUE, with.missing=FALSE, to.list=FALSE)
object | a sequence object, that is an object of class stslist as created by TraMineR |
L | integer. Context length. |
cdata | under development |
context | character. An optional subsequence (a character string where symbols are separated by '-') for which the conditional probability distribution is to be computed. |
stationary | logical. If |
nmin | integer. Minimal frequency of a context. See details. |
prob | logical. If |
weighted | logical. If |
with.missing | logical. If |
to.list | logical. If |
The empirical conditional probability $\hat{P}(\sigma \mid c)$ of observing a symbol $\sigma \in A$ after the subsequence $c = c_{1}, \ldots, c_{k}$ of length $k = L$ is computed as
$$\hat{P}(\sigma \mid c) = \frac{N(c\sigma)}{\sum_{\alpha \in A} N(c\alpha)}$$
where
$$N(c) = \sum_{i=1}^{\ell} 1\left[x_{i}, \ldots, x_{i+|c|-1} = c\right], \; x = x_{1}, \ldots, x_{\ell}, \; c = c_{1}, \ldots, c_{k}$$
is the number of occurrences of the subsequence $c$ in the sequence $x$, and $c\sigma$ is the concatenation of the subsequence $c$ and the symbol $\sigma$.
Considering a (possibly weighted) sample of $m$ sequences having weights $w^{j}, \; j = 1, \ldots, m$, the function $N(c)$ is replaced by
$$N(c) = \sum_{j=1}^{m} w^{j} \sum_{i=1}^{\ell} 1\left[x_{i}^{j}, \ldots, x_{i+|c|-1}^{j} = c\right], \; c = c_{1}, \ldots, c_{k}$$
where $x^{j} = x_{1}^{j}, \ldots, x_{\ell}^{j}$ is the $j$th sequence in the sample. For more details, see Gabadinho & Ritschard (2016).
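The counting rule above can be reproduced by hand on a toy sample. The following is a minimal base R sketch, not the implementation used by cprob: the helper names count_subseq and cond_prob, the two toy sequences, and their weights are all hypothetical and chosen only for illustration.

```r
## Toy data (hypothetical): two sequences over the alphabet {a, b}
seqs     <- list(c("a", "b", "a", "b", "b"), c("b", "a", "a", "b"))
weights  <- c(1, 2)            # w^j, one weight per sequence
alphabet <- c("a", "b")

## Weighted count N(c): number of positions where `pattern` occurs in
## each sequence, multiplied by that sequence's weight and summed.
## Windows that would run past the end of a sequence cannot match,
## so the inner loop stops at length(x) - k + 1.
count_subseq <- function(pattern, seqs, weights) {
  k <- length(pattern)
  total <- 0
  for (j in seq_along(seqs)) {
    x <- seqs[[j]]
    n_start <- length(x) - k + 1
    if (n_start < 1) next
    hits <- 0
    for (i in seq_len(n_start)) {
      if (all(x[i:(i + k - 1)] == pattern)) hits <- hits + 1
    }
    total <- total + weights[j] * hits
  }
  total
}

## P(sigma | c) = N(c sigma) / sum over alpha of N(c alpha).
## Returns NaN entries if the context is never followed by any symbol.
cond_prob <- function(context, seqs, weights, alphabet) {
  counts <- sapply(alphabet, function(s) {
    count_subseq(c(context, s), seqs, weights)
  })
  counts / sum(counts)
}

## Distribution of the next symbol after the context "a" (L = 1)
p_a <- cond_prob("a", seqs, weights, alphabet)
print(p_a)
```

With these toy values, N(aa) = 2 and N(ab) = 4, so the estimates are 1/3 for a and 2/3 for b. Passing character(0) as the context gives the order-0 case, where the counts reduce to weighted symbol frequencies.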
If stationary=TRUE, a matrix with one row for each subsequence of length L and minimal frequency nmin appearing in object. If stationary=FALSE, a list where each element corresponds to one subsequence and contains a matrix with the probability distribution at each position p where a state is preceded by the subsequence.
Alexis Gabadinho
Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software, 72(3), pp. 1-39.
## Packages used below: seqdef() comes from TraMineR, brewer.pal() from RColorBrewer
library(PST)
library(TraMineR)
library(RColorBrewer)

## Example with the single sequence s1
data(s1)
s1 <- seqdef(s1)
cprob(s1, L=0, prob=FALSE)
cprob(s1, L=1, prob=TRUE)

## Preparing a sequence object with the SRH data set
data(SRH)
state.list <- levels(SRH$p99c01)
## sequential color palette
mycol5 <- rev(brewer.pal(5, "RdYlGn"))
SRH.seq <- seqdef(SRH, 5:15, alphabet=state.list,
  states=c("G1", "G2", "M", "B2", "B1"), labels=state.list,
  weights=SRH$wp09lp1s, right=NA, cpal=mycol5)
names(SRH.seq) <- 1999:2009

## Example 1: 0th order: weighted and unweighted counts
cprob(SRH.seq, L=0, prob=FALSE, weighted=FALSE)
cprob(SRH.seq, L=0, prob=FALSE, weighted=TRUE)

## Example 2: 2nd order: weighted and unweighted probability distributions
cprob(SRH.seq, L=2, prob=TRUE, weighted=FALSE)
cprob(SRH.seq, L=2, prob=TRUE, weighted=TRUE)