Creating and filling the state space

Marcus Wurzer

2018-03-09

Introduction

Doing a population projection using the MSDem-package involves four consecutive steps:

  1. Defining the state space
  2. Filling the empty state space files
  3. Reading the state space files
  4. Setting the simulation parameters and running the projection

This vignette concentrates on the first two of these steps; the other two are explained in the “Running a population projection” vignette. We show how a state space is created using the state.space() function, which generates empty .csv files in a standardized form. Users then have to fill these files, which serve as the input for running the population projection with msproj().

Besides explaining the arguments of the state.space()-function, two state spaces are created to exemplify the output that is created. In addition, details about filling empty state spaces are given. We assume that readers of this vignette are familiar with some of the R data types, namely vectors, matrices, data frames, and lists.

Creating the state space - a simple example

While developing the MSDem package, we quickly realized that following the ‘easy-to-use’ paradigm means sacrificing some flexibility. In particular, this concerns the structure of the data sets that are passed to msproj(), the function that runs the population projection. To bring the data into a standardized form, state.space() has to be used. That function has a number of arguments that can be used to change the default settings. Depending on the choices made by the user, running state.space() generates two or three .csv files that are saved in a specific input data folder (see below):

  1. The state space file itself
  2. A helper file containing the variable definitions
  3. A file containing the migration information. This file is only created if migration is possible, which means that some spatial information (region and/or residence) must be available.

Let’s have a look at the list of function arguments first:

args(state.space)
## function (period = c(2010, 2100), by = 5, region = NULL, residence = c("rural", 
##     "urban"), sex = c("male", "female"), age = c(0, 100), edu = NULL, 
##     migration = "biregional", mig.var = "mrate", country = "Country", 
##     scen = "SSP2", data.dir = "input_data/") 
## NULL

In the following, each of these arguments is described briefly:

Let’s first create the files for the most basic model that can be run, only taking mandatory variables age and sex into account. Hence, we get a state space without any educational and geographical information (i.e., neither region nor residence is considered). The only argument we have to change is residence. We save the function output in an object called st.sp and then have a look at its structure:

st.sp <- state.space(residence = NULL)
str(st.sp)
## List of 3
##  $ state.space         :'data.frame':    1806 obs. of  5 variables:
##   ..$ period: num [1:1806] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##   ..$ sex   : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ age   : num [1:1806] 0 5 10 15 20 25 30 35 40 45 ...
##   ..$ var   : Factor w/ 10 levels "pop","le0","mx",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ value : logi [1:1806] NA NA NA NA NA NA ...
##  $ variable.definitions: chr [1:44, 1:2] "country" "period" "period" "period" ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:2] "variables" "values"
##  $ edu.trans           : NULL

We see that st.sp is a list with three elements. The first two correspond to the two .csv files that are later saved into the data folder (we will inspect these below); the third one provides information about the possible education transitions and is not saved into the data folder. We will now have a look at each of the three elements in turn.

The state space component

Printing the first few lines of the state space component gives us the following:

head(st.sp$state.space)
##   period  sex age var value
## 1   2010 male   0 pop    NA
## 2   2010 male   5 pop    NA
## 3   2010 male  10 pop    NA
## 4   2010 male  15 pop    NA
## 5   2010 male  20 pop    NA
## 6   2010 male  25 pop    NA

We observe a data frame with five columns: period, sex, age, var, and value. Like sex and age, period is mandatory for every model. Every line of the state space represents a certain group/subpopulation, defined by a combination of these three obligatory variables. Column var specifies the pieces of information that have to be provided by the users. Everything to the right of this column (here: only column value) has to be filled in by them. Looking at the first six entries of var, we see that they all contain the same entry, pop. pop is short for “total population” and means that the user has to enter the total population numbers for each of the subpopulations into column value. At the moment, nothing is filled in; the NAs stand for missing values (see section “Filling the state space files” below).
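To make the row structure more tangible, the following base R sketch rebuilds a block that looks like the pop rows printed above. It does not use MSDem and is not how state.space() works internally; the object name pop_block is hypothetical, and the five-year age groups from 0 to 100 are taken from the printed output.

```r
# Hypothetical reconstruction of the "pop" block using base R only.
# expand.grid() varies its first argument fastest, so age changes
# within sex, as in the printed state space.
pop_block <- expand.grid(
  age = seq(0, 100, by = 5),      # 21 five-year age groups
  sex = c("male", "female"),
  period = 2010,                  # base year only
  stringsAsFactors = FALSE
)
pop_block$var <- "pop"
pop_block$value <- NA             # to be filled in by the user

# Reorder the columns to match the layout shown above.
pop_block <- pop_block[, c("period", "sex", "age", "var", "value")]

head(pop_block)
nrow(pop_block)
```

The number of rows equals the number of age groups times the number of sexes, which is the count that table() reports for pop.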

Let’s check how many different var entries exist for the given state space:

table(st.sp$state.space$var)
## 
##       pop       le0        mx        ax      asfr      sexr reclasstr       gap 
##        42        36       792       792       126        18         0         0 
##   perural      eapr 
##         0         0

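We got ten categories (pop, le0, mx, ax, asfr, sexr, reclasstr, gap, perural, and eapr), four of which (reclasstr, gap, perural, and eapr) do not occur in this state space.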

Note that for each of those variables, values only have to be provided where they are meaningful for a certain subpopulation. For example, the total population numbers have to be filled in for every possible age-sex combination, but only for the base year. There are 21 age groups and 2 sexes, thus 42 combinations. In contrast, the ASFR has to be provided for every period of the simulation horizon, but only for the meaningful combinations of age and sex (i.e., women aged 15-49); other subpopulations are therefore not included in the state space in combination with the ASFR. This results in 126 possible combinations (7 age groups of women times 18 periods). Another example can be seen if we have a look at the last few lines of the data frame:

tail(st.sp$state.space)
##      period  sex age  var value
## 1801   2070 <NA>  NA sexr    NA
## 1802   2075 <NA>  NA sexr    NA
## 1803   2080 <NA>  NA sexr    NA
## 1804   2085 <NA>  NA sexr    NA
## 1805   2090 <NA>  NA sexr    NA
## 1806   2095 <NA>  NA sexr    NA

In line no. 1806, the sex ratio of the newborns for period 2095 has to be provided. sexr does not depend on age or sex (hence the NAs in these columns), and thus, only one value per period is needed.

Since no reclassification is happening and no education is considered in the current model, no values have to be filled in for reclasstr, gap, perural, and eapr.
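The counts reported by table() can be checked with a few lines of base R. The decompositions for pop and asfr are stated in the text above; those for le0, mx, ax and sexr are our reading of the output (one row per sex and period for le0, one extra age group starting at age 1 for mx and ax, one row per period for sexr) and should be treated as assumptions.

```r
# Building blocks of the simple state space (default settings).
n_periods   <- length(seq(2010, 2095, by = 5))  # 18 periods
n_sexes     <- 2
n_ages_pop  <- length(seq(0, 100, by = 5))      # 21 age groups
n_ages_life <- n_ages_pop + 1                   # assumption: extra age group "1"
n_ages_fert <- length(seq(15, 45, by = 5))      # 7 age groups, women aged 15-49

counts <- c(
  pop  = n_ages_pop * n_sexes,                  # base year only
  le0  = n_sexes * n_periods,                   # assumption
  mx   = n_ages_life * n_sexes * n_periods,     # assumption
  ax   = n_ages_life * n_sexes * n_periods,     # assumption
  asfr = n_ages_fert * n_periods,
  sexr = n_periods                              # assumption
)

counts
sum(counts)  # 1806, the number of rows reported by str(st.sp)
```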

The variable definition component

List element two looks like the following:

st.sp$variable.definitions
##       variables values        
##  [1,] "country" "Country"     
##  [2,] "period"  "2010"        
##  [3,] "period"  "2015"        
##  [4,] "period"  "2020"        
##  [5,] "period"  "2025"        
##  [6,] "period"  "2030"        
##  [7,] "period"  "2035"        
##  [8,] "period"  "2040"        
##  [9,] "period"  "2045"        
## [10,] "period"  "2050"        
## [11,] "period"  "2055"        
## [12,] "period"  "2060"        
## [13,] "period"  "2065"        
## [14,] "period"  "2070"        
## [15,] "period"  "2075"        
## [16,] "period"  "2080"        
## [17,] "period"  "2085"        
## [18,] "period"  "2090"        
## [19,] "period"  "2095"        
## [20,] "sex"     "male"        
## [21,] "sex"     "female"      
## [22,] "age"     "0"           
## [23,] "age"     "1"           
## [24,] "age"     "5"           
## [25,] "age"     "10"          
## [26,] "age"     "15"          
## [27,] "age"     "20"          
## [28,] "age"     "25"          
## [29,] "age"     "30"          
## [30,] "age"     "35"          
## [31,] "age"     "40"          
## [32,] "age"     "45"          
## [33,] "age"     "50"          
## [34,] "age"     "55"          
## [35,] "age"     "60"          
## [36,] "age"     "65"          
## [37,] "age"     "70"          
## [38,] "age"     "75"          
## [39,] "age"     "80"          
## [40,] "age"     "85"          
## [41,] "age"     "90"          
## [42,] "age"     "95"          
## [43,] "age"     "100"         
## [44,] "mig"     "no migration"

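As mentioned above, this component informs users about the settings of the current model. We see that the country has the default name Country, that the projection covers the periods from 2010 to 2095 in five-year steps, that both sexes and the age groups 0, 1, 5, 10, ..., 100 are included, and that there is no migration.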

The educational transition component

The third list element, called edu.trans, can be used to get information about the education transitions that are possible in the model:

st.sp$edu.trans
## NULL

By default, education is not considered in the multistate model, and so the transition matrix does not exist here. Note: At the moment, the user can only specify the number of education levels, but not the possible transitions themselves (i.e., from which level to which level one can go). This will be changed in a later version of the package.

A more complex state space

We now assume that we want to run a simulation for India that includes all the possible dimensions: Age, Sex, Education, Region, and Residence. Since there are no default regions, we first have to create a vector of the Indian regions:

reg <- c("IN.AN", "IN.AP", "IN.AR", "IN.AS", "IN.BR", "IN.CH", "IN.CT", "IN.DD", "IN.DL", 
         "IN.DN", "IN.GA", "IN.GJ", "IN.HP", "IN.HR", "IN.JH", "IN.JK", "IN.KA", "IN.KL", 
         "IN.LD", "IN.MH", "IN.ML", "IN.MN", "IN.MP", "IN.MZ", "IN.NL", "IN.OR", "IN.PB", 
         "IN.PY", "IN.RJ", "IN.SK", "IN.TN", "IN.TR", "IN.UP", "IN.UT", "IN.WB")

We used the ISO codes since they provide a standardized form that is not prone to misspellings or different spellings (e.g., like “Dehli”, “dehli”, “NCT of Dehli”, “nct of Dehli” etc.) which may cause problems in the simulation. We create the state space with the following line of code:

st.sp2 <- state.space(region = reg, edu = 6, country = "India", scen = "AGESR_Const")

Apart from the regions, we also specified that we have six educational levels, that our country is India, and that we want to call our scenario AGESR_Const, which stands for an Age/Gender/Education/State/Residence scenario under constant assumptions (e.g., no changes in fertility patterns, migration rates etc. for the whole simulation horizon). st.sp2 has the following structure:

str(st.sp2)
## List of 4
##  $ state.space         :'data.frame':    11269 obs. of  75 variables:
##   ..$ period     : num [1:11269] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##   ..$ sex        : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ age        : num [1:11269] 0 0 0 0 0 0 5 5 5 5 ...
##   ..$ edu        : Factor w/ 14 levels "e1","e2","e3",..: 1 2 3 4 5 6 1 2 3 4 ...
##   ..$ var        : Factor w/ 10 levels "pop","le0","mx",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ IN.AN_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.AP_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.AR_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.AS_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.BR_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.CH_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.CT_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.DD_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.DL_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.DN_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.GA_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.GJ_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.HP_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.HR_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.JH_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.JK_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.KA_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.KL_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.LD_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.MH_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.ML_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.MN_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.MP_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.MZ_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.NL_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.OR_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.PB_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.PY_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.RJ_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.SK_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.TN_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.TR_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.UP_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.UT_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.WB_rural: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.AN_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.AP_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.AR_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.AS_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.BR_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.CH_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.CT_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.DD_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.DL_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.DN_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.GA_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.GJ_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.HP_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.HR_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.JH_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.JK_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.KA_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.KL_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.LD_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.MH_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.ML_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.MN_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.MP_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.MZ_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.NL_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.OR_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.PB_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.PY_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.RJ_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.SK_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.TN_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.TR_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.UP_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.UT_urban: logi [1:11269] NA NA NA NA NA NA ...
##   ..$ IN.WB_urban: logi [1:11269] NA NA NA NA NA NA ...
##  $ variable.definitions: chr [1:96, 1:2] "country" "period" "period" "period" ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:2] "variables" "values"
##  $ migration           :'data.frame':    70560 obs. of  7 variables:
##   ..$ period     : num [1:70560] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##   ..$ sex        : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ age        : num [1:70560] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ edu        : Factor w/ 6 levels "e1","e2","e3",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ origin     : Factor w/ 72 levels "IN.AN_rural",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ destination: Factor w/ 72 levels "IN.AN_rural",..: 71 71 71 71 71 71 71 71 71 71 ...
##   ..$ mrate      : logi [1:70560] NA NA NA NA NA NA ...
##  $ edu.trans           : num [1:6, 1:6] 0 0 0 0 0 0 1 0 0 0 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:6] "e1" "e2" "e3" "e4" ...
##   .. ..$ : chr [1:6] "e1" "e2" "e3" "e4" ...

Besides the already known state.space, variable.definitions and edu.trans components that differ from the first state space created above, st.sp2 also contains a migration component. Again, we want to have a closer look at all of the list components in turn.

The state space component

As can be seen above, the state space now has a different structure: Variables period, sex, age and edu are still there, but instead of one value column, there are now 70 columns with region/residence combinations (35 regions times 2 types of residence). The reason for choosing this wide format is Microsoft Excel’s limit of approximately 1 million rows. We expect that most MSDem users will use Excel to fill in the values, and thus thought it would be a good choice to have a higher number of columns, but save many rows in return.
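The naming pattern of the value columns can be reproduced in base R. The sketch below is only an illustration with three of the region codes; reg_subset and value_cols are hypothetical names, and we do not claim that state.space() builds the names this way.

```r
# Illustration only: region/residence column names for a small subset.
reg_subset <- c("IN.AN", "IN.AP", "IN.AR")
res        <- c("rural", "urban")

# outer() pastes every region with every residence type; as.vector()
# flattens column by column, so all rural columns come first.
value_cols <- as.vector(outer(reg_subset, res, paste, sep = "_"))

value_cols
length(value_cols)  # 3 regions * 2 residence types = 6
```

With the full vector of 35 regions, the same call yields the 70 value columns seen in the str() output.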

For the India example, our state space has 11269 rows and 75 columns. Had we chosen the long format, the number of rows would have been 70 times higher (one row per region/residence combination), namely 788830, which is relatively close to the maximum that Excel can handle. With some additional regions (or educational levels, age groups, etc.), we would have exceeded this limit.
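The trade-off between the two layouts can be sketched in base R. The first part redoes the arithmetic, using the worksheet limit of 1,048,576 rows of current Excel versions. The second part converts a small made-up wide table into long format; the column names A_rural, A_urban, B_rural and B_urban are placeholders, not MSDem output.

```r
# Row count of a hypothetical long format for the India example.
n_rows_wide    <- 11269
n_value_cols   <- 35 * 2                        # region/residence combinations
n_rows_long    <- n_rows_wide * n_value_cols    # 788830
excel_max_rows <- 2^20                          # 1048576 rows per worksheet
n_rows_long / excel_max_rows                    # about 0.75 of the limit

# Toy wide table: two rows, four value columns.
wide <- data.frame(period = c(2010, 2015), var = "sexr",
                   A_rural = NA, A_urban = NA, B_rural = NA, B_urban = NA)
value_cols <- setdiff(names(wide), c("period", "var"))

# Long format: one row per original row and value column.
long <- data.frame(
  period = rep(wide$period, times = length(value_cols)),
  var    = rep(wide$var, times = length(value_cols)),
  area   = rep(value_cols, each = nrow(wide)),
  value  = unlist(wide[value_cols], use.names = FALSE)
)

nrow(wide)  # 2
nrow(long)  # 8
```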

Using this structure, we can regard each of the region/residence combinations as a separate value column: Instead of filling in just one value for each period/sex/age/edu pattern, we have to fill in 70. As an example, we consider the sex ratio of the newborns again. For the first state space, there was one line per period, each to be filled with one value. Now, there is still one line per period

st.sp2$state.space[st.sp2$state.space$var == "sexr", 1:10]
##       period  sex age  edu  var IN.AN_rural IN.AP_rural IN.AR_rural IN.AS_rural
## 10729   2010 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10730   2015 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10731   2020 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10732   2025 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10733   2030 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10734   2035 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10735   2040 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10736   2045 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10737   2050 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10738   2055 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10739   2060 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10740   2065 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10741   2070 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10742   2075 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10743   2080 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10744   2085 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10745   2090 <NA>  NA <NA> sexr          NA          NA          NA          NA
## 10746   2095 <NA>  NA <NA> sexr          NA          NA          NA          NA
##       IN.BR_rural
## 10729          NA
## 10730          NA
## 10731          NA
## 10732          NA
## 10733          NA
## 10734          NA
## 10735          NA
## 10736          NA
## 10737          NA
## 10738          NA
## 10739          NA
## 10740          NA
## 10741          NA
## 10742          NA
## 10743          NA
## 10744          NA
## 10745          NA
## 10746          NA

but we have to provide 70 different sex ratio values in each line (note that we just printed the first 10 columns here).

The var entries of the state space haven’t changed, but since region, residence and education are now considered, all categories are present in the state space, i.e., users also have to provide values for reclasstr, gap, perural and eapr:

table(st.sp2$state.space$var)
## 
##       pop       le0        mx        ax      asfr      sexr reclasstr       gap 
##       252       216      4752      4752       756        18         1         1 
##   perural      eapr 
##         1       520

What is more, the number of rows that have to be filled has changed, too: For example, there are 216 lines for variable le0 now, the 36 of the first state space (see above) multiplied by the 6 educational levels. From this it follows that the number of le0 values that have to be provided is 420 (= 70 * 6) times higher than in the “simple” state space created above.
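The new counts can be related to those of the simple state space with a short base R check. The text confirms the factor of 6 for le0; that pop, mx, ax and asfr scale in the same way is our reading of the two table() outputs. The counts for sexr, reclasstr, gap, perural and eapr are simply copied from the output above.

```r
# Row counts of the simple state space that scale with education.
simple <- c(pop = 42, le0 = 36, mx = 792, ax = 792, asfr = 126)
n_edu  <- 6

# Counts for the India state space: scaled rows plus copied counts.
india <- c(simple * n_edu,
           sexr = 18, reclasstr = 1, gap = 1, perural = 1, eapr = 520)

india
sum(india)  # 11269, the number of rows reported by str(st.sp2)

# Number of le0 cells to fill: rows times value columns.
le0_cells_simple <- 36 * 1
le0_cells_india  <- 216 * 70
le0_cells_india / le0_cells_simple  # 420
```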

The migration component

Like the state.space component, the migration list element also contains period, sex, age, and edu columns to define the population patterns, but there are three additional columns related to migration:

head(st.sp2$migration)
##   period  sex age edu      origin destination mrate
## 1   2010 male   0  e1 IN.AN_rural       India    NA
## 2   2010 male   0  e1 IN.AP_rural       India    NA
## 3   2010 male   0  e1 IN.AR_rural       India    NA
## 4   2010 male   0  e1 IN.AS_rural       India    NA
## 5   2010 male   0  e1 IN.BR_rural       India    NA
## 6   2010 male   0  e1 IN.CH_rural       India    NA

Since we didn’t change the default value of the migration argument when creating the state space, migration is assumed to be biregional. Thus, people can only emigrate to the rest of the world or immigrate from the rest of the world; the exact origins of immigrants and destinations of emigrants are not specified.

The variable definition component

The variable definition component now also contains information about region, residence, the consideration of reclassification (here: TRUE), and possible educational transitions. Furthermore, the type of migration has changed from no migration to biregional:

st.sp2$variable.definitions
##       variables   values      
##  [1,] "country"   "India"     
##  [2,] "period"    "2010"      
##  [3,] "period"    "2015"      
##  [4,] "period"    "2020"      
##  [5,] "period"    "2025"      
##  [6,] "period"    "2030"      
##  [7,] "period"    "2035"      
##  [8,] "period"    "2040"      
##  [9,] "period"    "2045"      
## [10,] "period"    "2050"      
## [11,] "period"    "2055"      
## [12,] "period"    "2060"      
## [13,] "period"    "2065"      
## [14,] "period"    "2070"      
## [15,] "period"    "2075"      
## [16,] "period"    "2080"      
## [17,] "period"    "2085"      
## [18,] "period"    "2090"      
## [19,] "period"    "2095"      
## [20,] "region"    "IN.AN"     
## [21,] "region"    "IN.AP"     
## [22,] "region"    "IN.AR"     
## [23,] "region"    "IN.AS"     
## [24,] "region"    "IN.BR"     
## [25,] "region"    "IN.CH"     
## [26,] "region"    "IN.CT"     
## [27,] "region"    "IN.DD"     
## [28,] "region"    "IN.DL"     
## [29,] "region"    "IN.DN"     
## [30,] "region"    "IN.GA"     
## [31,] "region"    "IN.GJ"     
## [32,] "region"    "IN.HP"     
## [33,] "region"    "IN.HR"     
## [34,] "region"    "IN.JH"     
## [35,] "region"    "IN.JK"     
## [36,] "region"    "IN.KA"     
## [37,] "region"    "IN.KL"     
## [38,] "region"    "IN.LD"     
## [39,] "region"    "IN.MH"     
## [40,] "region"    "IN.ML"     
## [41,] "region"    "IN.MN"     
## [42,] "region"    "IN.MP"     
## [43,] "region"    "IN.MZ"     
## [44,] "region"    "IN.NL"     
## [45,] "region"    "IN.OR"     
## [46,] "region"    "IN.PB"     
## [47,] "region"    "IN.PY"     
## [48,] "region"    "IN.RJ"     
## [49,] "region"    "IN.SK"     
## [50,] "region"    "IN.TN"     
## [51,] "region"    "IN.TR"     
## [52,] "region"    "IN.UP"     
## [53,] "region"    "IN.UT"     
## [54,] "region"    "IN.WB"     
## [55,] "residence" "rural"     
## [56,] "residence" "urban"     
## [57,] "sex"       "male"      
## [58,] "sex"       "female"    
## [59,] "age"       "0"         
## [60,] "age"       "1"         
## [61,] "age"       "5"         
## [62,] "age"       "10"        
## [63,] "age"       "15"        
## [64,] "age"       "20"        
## [65,] "age"       "25"        
## [66,] "age"       "30"        
## [67,] "age"       "35"        
## [68,] "age"       "40"        
## [69,] "age"       "45"        
## [70,] "age"       "50"        
## [71,] "age"       "55"        
## [72,] "age"       "60"        
## [73,] "age"       "65"        
## [74,] "age"       "70"        
## [75,] "age"       "75"        
## [76,] "age"       "80"        
## [77,] "age"       "85"        
## [78,] "age"       "90"        
## [79,] "age"       "95"        
## [80,] "age"       "100"       
## [81,] "edu"       "e1"        
## [82,] "edu"       "e2"        
## [83,] "edu"       "e3"        
## [84,] "edu"       "e4"        
## [85,] "edu"       "e5"        
## [86,] "edu"       "e6"        
## [87,] "reclass"   "TRUE"      
## [88,] "edu"       "e12"       
## [89,] "edu"       "e23"       
## [90,] "edu"       "e24"       
## [91,] "edu"       "e34"       
## [92,] "edu"       "e35"       
## [93,] "edu"       "e45"       
## [94,] "edu"       "e46"       
## [95,] "edu"       "e56"       
## [96,] "mig"       "biregional"

The educational transition component

Since education wasn’t considered in our first state space above, we merely got NULL when calling the corresponding list element. Now, with the consideration of 6 educational levels, we can have a look at the possible transitions between them:

st.sp2$edu.trans
##    e1 e2 e3 e4 e5 e6
## e1  0  1  1  0  0  0
## e2  0  0  1  1  0  0
## e3  0  0  0  1  1  0
## e4  0  0  0  0  1  1
## e5  0  0  0  0  0  1
## e6  0  0  0  0  0  0

The rows indicate the “from” (= current), the columns the “to” (= attainable) educational levels. A 0 entry means that a transition is not possible (e.g., directly going from e1 to e6), 1 stands for possible transitions within a five-year period of the simulation. We see, e.g., that someone at level e1 can move to e2 or e3, that e6 can only be reached from e4 or e5, and that no transitions to a lower level are possible.

The main diagonal (e1 to e1, e2 to e2, etc.) is not of interest here, since remaining at the same level is not considered a “transition”.
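To list the allowed transitions explicitly, the matrix can be rebuilt and queried in base R. The sketch below recreates the printed matrix by hand, using the pattern “at most two levels up”, which is our reading of the output and not a documented rule of the package; edu_trans and transitions are local, hypothetical objects.

```r
# Rebuild the printed transition matrix (rows = from, columns = to).
lev <- paste0("e", 1:6)
edu_trans <- matrix(0, nrow = 6, ncol = 6, dimnames = list(lev, lev))
for (i in 1:5) {
  # Assumption read off the output: one or two levels up are allowed.
  edu_trans[i, (i + 1):min(i + 2, 6)] <- 1
}
edu_trans

# Extract all allowed from/to pairs.
idx <- which(edu_trans == 1, arr.ind = TRUE)
transitions <- data.frame(from = lev[idx[, "row"]],
                          to   = lev[idx[, "col"]])
transitions <- transitions[order(transitions$from, transitions$to), ]
transitions

# No downward transitions and nothing on the main diagonal.
all(edu_trans[lower.tri(edu_trans, diag = TRUE)] == 0)
```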

Filling the state space files

The “empty” files (meaning that they only contain NA’s) that have been generated are written into a subfolder of the current working directory using R’s write.table() function. The files with endings state_space.csv and mig.csv have to be filled by the user, but not the file having the ending var_def.csv since it is only there to provide some useful information about the current state space.

For our second state space, the files in the “input_data” subfolder of our working directory are:

Files created by the state space function

We see the naming convention of the files: Country followed by scenario, followed by the information contained in the file, with the parts separated by underscores. This convention is utilized later when we run the simulation. Opening one of the files using a spreadsheet program like Microsoft Excel, we see that the structure of the corresponding list element (see above) was maintained:

Structure of the empty migration file

It doesn’t matter whether the files are filled with the required information using R, a spreadsheet program, or an editor. It is important, however, that the simulation relies on the files exactly as they were created by state.space(): users should change neither the names of the files nor the structure within them. Thus, the empty files that were created should be overwritten once the data have been entered.
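For users who prefer to fill the files in R, the workflow could look like the following sketch. It works on a toy file in a temporary folder and does not touch any real MSDem input. The file name, the population numbers, and the use of read.csv()/write.csv() are assumptions for illustration; the exact separator and quoting used by state.space() should be checked against the generated files before overwriting them.

```r
# Toy stand-in for an empty state space file (not created by MSDem).
tmp_dir <- file.path(tempdir(), "input_data")
dir.create(tmp_dir, showWarnings = FALSE)
f <- file.path(tmp_dir, "Country_SSP2_state_space.csv")  # hypothetical name

empty <- data.frame(period = 2010, sex = c("male", "female"),
                    age = 0, var = "pop", value = NA)
write.csv(empty, f, row.names = FALSE)

# Read the empty file, fill in the values, keep everything else as is.
filled <- read.csv(f, stringsAsFactors = FALSE)
filled$value <- c(1050, 1000)  # made-up population numbers

# Guard against accidental structural changes before overwriting.
stopifnot(identical(names(filled), names(empty)),
          nrow(filled) == nrow(empty))

# Overwrite the empty file under the same name.
write.csv(filled, f, row.names = FALSE)
```

Checking names and dimensions before writing is a cheap way to make sure that only the value columns have changed.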