Doing a population projection using the MSDem package involves four consecutive steps:
This vignette concentrates on the first two of these steps; the other two are explained in the “Running a population projection” vignette. We show how a state space is created using the state.space() function, which generates empty .csv files in a standardized form. These files then have to be filled in by the users and are the input needed to subsequently run the population projection (using the function msproj()).
Besides explaining the arguments of the state.space() function, we create two state spaces to exemplify the output. In addition, details about filling in empty state spaces are given. We assume that readers of this vignette are familiar with some of the R data types, namely vectors, matrices, data frames, and lists.
While developing the MSDem package, we quickly realized that following the ‘easy-to-use’ paradigm means sacrificing some flexibility. In particular, this concerns the structure of the data sets that are passed to msproj(), the function that is used to run the population projection. To bring the data into a standardized form, state.space() has to be used. That function has a number of arguments that can be used to change the default settings. Depending on the choices made by the user, running state.space() results in the generation of two or three .csv files that are saved in a specific input data folder (see below):
Let’s have a look at the list of function arguments first:
args(state.space)
## function (period = c(2010, 2100), by = 5, region = NULL, residence = c("rural",
## "urban"), sex = c("male", "female"), age = c(0, 100), edu = NULL,
## migration = "biregional", mig.var = "mrate", country = "Country",
## scen = "SSP2", data.dir = "input_data/")
## NULL
In the following, each of these arguments is described briefly:
period
a vector with two values used to define the time horizon of the simulation, i.e., start year and end year. By default, simulations start in 2010 and end in 2100.
by
a number specifying the time increment for the simulation, and consequently, the definition of the age groups. Defaults to 5, i.e., period lengths of five years and five-year age groups. May also be set to 1, but please note that msproj() doesn’t provide the functionality to deal with period lengths of one year yet.
region
an optional vector of region names that has to be entered by the users if they want to do a sub-national projection (region here stands for any administrative level below the national one; this could be regions, states, districts, municipalities etc.). Should be NULL (if no regions are used) or a character vector.
residence
an optional vector with the two values urban and rural used to specify the distinction between urban and rural areas in the model. Should be NULL or the default character vector.
sex
a vector containing the two values male and female. The usage of sex is mandatory in all models, and thus, it is part of every state space. It is only possible to change the labels (e.g., to m and f).
age
a vector containing two values: minimum age and maximum age of the population. Defaults to 0 years and 100 years, respectively. The width of each age category is taken from the by argument and thus defaults to 5. In keeping with the structure of standard life tables, two age groups (0 and 1-4) are created internally instead of using one 0-4 age group if five-year age groups are chosen. Just like sex, it is mandatory to use this variable.
edu
an optional vector specifying the number of educational levels to be used in the simulation. Should be NULL (the default) if education is not used in the simulation, or a single number.
migration
the type of internal migration used in the model. Should be either biregional (the default) or bilateral. Biregional means that for a given geographical unit, it is not of interest where the migrants come from or where they go to, i.e., there is only one outgoing stream (to the rest of the country) and one incoming stream (from the rest of the country). When using bilateral migration, the exact origins and destinations of the flows are specified. Migration only makes sense if either region or residence or both are used in the model. If this is not the case, no file containing the migration information will be produced (see above). Note: International migration is currently not included in the model.
mig.var
the type of migration data used in the state space. Should be either mrate if migration rates are to be used or mabs in the case of absolute migration numbers. Note: At the moment, the model doesn’t provide the functionality to work with absolute numbers.
country
the name of the country the simulation is run for. This information is used in the output generation of the model (e.g., to annotate plots and tables) and to create file names. If no value is provided by the user, the default (Country) is used.
scen
the name of the scenario that is run. Just like country, this information is used to create meaningful file names.
data.dir
the name of the directory the files are written to. By default, a subfolder called “input_data” is created automatically within the current working directory, and the files are then saved into this folder. If users want to use another directory, they have to specify the whole path instead.
Let’s first create the files for the most basic model that can be run, only taking the mandatory variables age and sex into account. Hence, we get a state space without any educational and geographical information (i.e., neither region nor residence is considered). The only argument we have to change is residence. We save the function output in an object called st.sp and then have a look at its structure:
st.sp <- state.space(residence = NULL)
str(st.sp)
## List of 3
## $ state.space :'data.frame': 1806 obs. of 5 variables:
## ..$ period: num [1:1806] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## ..$ sex : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
## ..$ age : num [1:1806] 0 5 10 15 20 25 30 35 40 45 ...
## ..$ var : Factor w/ 10 levels "pop","le0","mx",..: 1 1 1 1 1 1 1 1 1 1 ...
## ..$ value : logi [1:1806] NA NA NA NA NA NA ...
## $ variable.definitions: chr [1:44, 1:2] "country" "period" "period" "period" ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "variables" "values"
## $ edu.trans : NULL
We see that st.sp is a list with three elements: The first two correspond to the two .csv files that are later saved into the data folder (we will inspect these below); the third one provides information about the possible education transitions (not saved into the data folder). We will now have a look at each of the three elements in turn.
Printing the first few lines of the state space component gives us the following:
head(st.sp$state.space)
## period sex age var value
## 1 2010 male 0 pop NA
## 2 2010 male 5 pop NA
## 3 2010 male 10 pop NA
## 4 2010 male 15 pop NA
## 5 2010 male 20 pop NA
## 6 2010 male 25 pop NA
We observe a data frame with five columns: period, sex, age, var, and value. As is the case for sex and age, period is also mandatory for every model. Every line of the state space represents a certain group/subpopulation, defined by a combination of these three obligatory variables. Column var specifies the pieces of information that have to be provided by the users. Everything on the right-hand side of this column (here: only column value) has to be filled in by them. Looking at the first six entries of var, we see that all have the same entry called pop. pop is short for “total population” and means that the user has to enter the total population numbers for each of the subpopulations into column value. At the moment, nothing is filled in; the NA’s stand for missing values (see section “Filling the state space files” below).
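As a minimal sketch of what filling in the value column can look like in R, the following snippet builds a small stand-in for the first rows of the state space and fills the pop rows. It does not call state.space(); the object toy and the population counts are made up purely for illustration.

```r
# Toy stand-in for a few rows of st.sp$state.space (hypothetical, not package output)
toy <- data.frame(
  period = 2010,
  sex    = factor("male", levels = c("male", "female")),
  age    = c(0, 5, 10),
  var    = factor("pop", levels = c("pop", "le0")),
  value  = NA
)

# Made-up population counts, one per row where var == "pop"
is_pop <- toy$var == "pop"
toy$value[is_pop] <- c(1200, 1150, 1100)

# Check that no missing values remain in the rows that were filled
anyNA(toy$value[is_pop])
```

Indexing by var rather than by row number keeps the assignment tied to the variable being filled, which matters once the data frame contains several kinds of var entries.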
Let’s check how many different var entries exist for the given state space:
table(st.sp$state.space$var)
##
## pop le0 mx ax asfr sexr reclasstr gap
## 42 36 792 792 126 18 0 0
## perural eapr
## 0 0
We got the following variables:
pop
total population
le0
life expectancy at birth
mx
death rate in the cohort between ages x and x + n
ax
average number of person-years lived in the interval by those dying in the interval
asfr
age-specific fertility rate (ASFR)
sexr
sex ratio at birth (female by male)
reclasstr
proportion of population reclassified from rural to urban
gap
difference between a region’s reclassification rate and the average rate (i.e., the country rate)
perural
percentage of the population living in rural areas
eapr
educational attainment progression ratio
Note that for each of those variables, values only have to be provided where they are meaningful for a certain subpopulation. For example, the total population numbers have to be filled in for every possible age-sex combination, but only for the base year. There are 21 age groups and 2 sexes, thus 42 combinations. In contrast, the ASFR has to be provided for every period of the simulation horizon, but only for the meaningful combinations of age and sex (i.e., women aged 15-49), and thus, other subpopulations are not included in the state space in combination with the ASFR. This results in 126 possible combinations (7 age groups of women times 18 periods). Another example can be seen if we have a look at the last few lines of the data frame:
tail(st.sp$state.space)
## period sex age var value
## 1801 2070 <NA> NA sexr NA
## 1802 2075 <NA> NA sexr NA
## 1803 2080 <NA> NA sexr NA
## 1804 2085 <NA> NA sexr NA
## 1805 2090 <NA> NA sexr NA
## 1806 2095 <NA> NA sexr NA
In line no. 1806, the sex ratio of the newborns has to be provided for the period starting in 2095. sexr is a variable independent of age and sex (hence the NA’s for these two variables), and thus, only one value per period is needed.
Since no reclassification is happening and no education is considered in the current model, no values have to be filled in for reclasstr, gap, perural, and eapr.
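The row counts in the table above can be checked with a few lines of arithmetic. The sketch below is only our reading of the numbers, not code taken from the package; in particular, the assumption that mx and ax use the 22 life-table age groups (0, 1, 5, ..., 100) is inferred from the count of 792.

```r
n_period <- length(seq(2010, 2095, by = 5))        # 18 projection periods
n_sex    <- 2                                      # male, female
n_age5   <- length(seq(0, 100, by = 5))            # 21 five-year age groups (pop)
n_age_lt <- length(c(0, 1, seq(5, 100, by = 5)))   # 22 life-table age groups (assumed for mx, ax)
n_fert   <- length(seq(15, 45, by = 5))            # 7 age groups of women aged 15-49

rows <- c(
  pop  = n_age5 * n_sex,                # base year only
  le0  = n_period * n_sex,
  mx   = n_age_lt * n_sex * n_period,
  ax   = n_age_lt * n_sex * n_period,
  asfr = n_fert * n_period,             # women only
  sexr = n_period                       # one value per period
)
rows
sum(rows)   # total number of rows in the state space
```

The sum agrees with the 1806 observations reported by str(st.sp) above.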
List element two looks like the following:
st.sp$variable.definitions
## variables values
## [1,] "country" "Country"
## [2,] "period" "2010"
## [3,] "period" "2015"
## [4,] "period" "2020"
## [5,] "period" "2025"
## [6,] "period" "2030"
## [7,] "period" "2035"
## [8,] "period" "2040"
## [9,] "period" "2045"
## [10,] "period" "2050"
## [11,] "period" "2055"
## [12,] "period" "2060"
## [13,] "period" "2065"
## [14,] "period" "2070"
## [15,] "period" "2075"
## [16,] "period" "2080"
## [17,] "period" "2085"
## [18,] "period" "2090"
## [19,] "period" "2095"
## [20,] "sex" "male"
## [21,] "sex" "female"
## [22,] "age" "0"
## [23,] "age" "1"
## [24,] "age" "5"
## [25,] "age" "10"
## [26,] "age" "15"
## [27,] "age" "20"
## [28,] "age" "25"
## [29,] "age" "30"
## [30,] "age" "35"
## [31,] "age" "40"
## [32,] "age" "45"
## [33,] "age" "50"
## [34,] "age" "55"
## [35,] "age" "60"
## [36,] "age" "65"
## [37,] "age" "70"
## [38,] "age" "75"
## [39,] "age" "80"
## [40,] "age" "85"
## [41,] "age" "90"
## [42,] "age" "95"
## [43,] "age" "100"
## [44,] "mig" "no migration"
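This element is there to inform users about certain settings of the current model. We see that the country name is the default Country, that the periods run from 2010 to 2095 in five-year steps, that the sexes are labelled male and female, that the age groups run from 0 to 100 (with separate groups starting at ages 0 and 1), and that no migration is considered.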
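The period and age values in this listing can be reproduced with base R from the default arguments period, by and age. This is only a sketch of how the values relate to the arguments; how state.space() derives them internally is not shown in this vignette, so the code is an assumption.

```r
period <- c(2010, 2100)   # default time horizon
by     <- 5               # default increment
age    <- c(0, 100)       # default minimum and maximum age

# Start years of the projection periods: 2010, 2015, ..., 2095
periods <- seq(period[1], period[2] - by, by = by)

# Age groups: 0, 5, ..., 100, plus the separate group starting at age 1
ages <- seq(age[1], age[2], by = by)
if (by == 5) ages <- sort(c(ages, 1))

length(periods)
length(ages)

# Rows in the listing: country + periods + two sexes + ages + migration type
1 + length(periods) + 2 + length(ages) + 1
```

The last expression matches the 44 rows of the variable definitions printed above.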
The third list element, called edu.trans, can be used to get information about the education transitions that are possible in the model:
st.sp$edu.trans
## NULL
By default, education is not considered in the multistate model, so no transition matrix exists here. Note: At the moment, users can only specify the number of education levels, but not the possible transitions themselves (i.e., from which level to which level one can move). This will be changed in a later version of the package.
We now assume that we want to run a simulation for India that includes all the possible dimensions: Age, Sex, Education, Region, and Residence. Since there are no default regions, we first have to create a vector of the Indian regions:
reg <- c("IN.AN", "IN.AP", "IN.AR", "IN.AS", "IN.BR", "IN.CH", "IN.CT", "IN.DD", "IN.DL",
"IN.DN", "IN.GA", "IN.GJ", "IN.HP", "IN.HR", "IN.JH", "IN.JK", "IN.KA", "IN.KL",
"IN.LD", "IN.MH", "IN.ML", "IN.MN", "IN.MP", "IN.MZ", "IN.NL", "IN.OR", "IN.PB",
"IN.PY", "IN.RJ", "IN.SK", "IN.TN", "IN.TR", "IN.UP", "IN.UT", "IN.WB")
We used the ISO codes since they provide a standardized form that is not prone to misspellings or different spellings (e.g., “Dehli”, “dehli”, “NCT of Dehli”, “nct of Dehli” etc.), which may cause problems in the simulation. We create the state space with the following line of code:
st.sp2 <- state.space(region = reg, edu = 6, country = "India", scen = "AGESR_Const")
Apart from the regions, we also specified that we have six educational levels, that our country is India, and that we want to call our scenario AGESR_Const, which stands for an Age/Gender/Education/State/Residence scenario under constant assumptions (e.g., no changes in fertility patterns, migration rates etc. for the whole simulation horizon). st.sp2 has the following structure:
str(st.sp2)
## List of 4
## $ state.space :'data.frame': 11269 obs. of 75 variables:
## ..$ period : num [1:11269] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## ..$ sex : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
## ..$ age : num [1:11269] 0 0 0 0 0 0 5 5 5 5 ...
## ..$ edu : Factor w/ 14 levels "e1","e2","e3",..: 1 2 3 4 5 6 1 2 3 4 ...
## ..$ var : Factor w/ 10 levels "pop","le0","mx",..: 1 1 1 1 1 1 1 1 1 1 ...
## ..$ IN.AN_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.AP_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.AR_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.AS_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.BR_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.CH_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.CT_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.DD_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.DL_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.DN_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.GA_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.GJ_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.HP_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.HR_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.JH_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.JK_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.KA_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.KL_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.LD_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.MH_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.ML_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.MN_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.MP_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.MZ_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.NL_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.OR_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.PB_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.PY_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.RJ_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.SK_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.TN_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.TR_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.UP_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.UT_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.WB_rural: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.AN_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.AP_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.AR_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.AS_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.BR_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.CH_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.CT_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.DD_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.DL_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.DN_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.GA_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.GJ_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.HP_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.HR_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.JH_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.JK_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.KA_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.KL_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.LD_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.MH_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.ML_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.MN_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.MP_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.MZ_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.NL_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.OR_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.PB_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.PY_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.RJ_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.SK_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.TN_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.TR_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.UP_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.UT_urban: logi [1:11269] NA NA NA NA NA NA ...
## ..$ IN.WB_urban: logi [1:11269] NA NA NA NA NA NA ...
## $ variable.definitions: chr [1:96, 1:2] "country" "period" "period" "period" ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "variables" "values"
## $ migration :'data.frame': 70560 obs. of 7 variables:
## ..$ period : num [1:70560] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## ..$ sex : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
## ..$ age : num [1:70560] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ edu : Factor w/ 6 levels "e1","e2","e3",..: 1 1 1 1 1 1 1 1 1 1 ...
## ..$ origin : Factor w/ 72 levels "IN.AN_rural",..: 1 2 3 4 5 6 7 8 9 10 ...
## ..$ destination: Factor w/ 72 levels "IN.AN_rural",..: 71 71 71 71 71 71 71 71 71 71 ...
## ..$ mrate : logi [1:70560] NA NA NA NA NA NA ...
## $ edu.trans : num [1:6, 1:6] 0 0 0 0 0 0 1 0 0 0 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:6] "e1" "e2" "e3" "e4" ...
## .. ..$ : chr [1:6] "e1" "e2" "e3" "e4" ...
Besides the already known state.space, variable.definitions and edu.trans components, which differ from those of the first state space created above, st.sp2 also contains a migration component. Again, we want to have a closer look at all of the list components in turn.
As can be seen above, the state space now has a different structure: The variables period, sex and age are still there and are now joined by edu, but instead of one value column, there are now 70 columns with region/residence combinations (35 regions times 2 types of residence). The reason for choosing this wide table format is that Microsoft Excel can only handle approximately 1 million rows. We suspected that most MSDem users will use Excel to fill in the values, and thus thought it would be a good choice to have a higher number of columns, but save many rows in return.
For the India example, we see that our state space has 11269 rows and 75 columns. If we had chosen the long format, the number of rows would have been 70 times higher (one row per region/residence combination), namely 788830, a value relatively close to the maximum that Excel can handle. If there were some additional regions (or educational levels, or age groups etc.), we would have exceeded this limit.
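The trade-off between the wide and the long format can be made concrete with a short calculation and a toy conversion. The sketch below uses Excel’s documented worksheet limit of 1,048,576 rows; the small data frame wide and its column names A_rural and A_urban are hypothetical and only mimic the layout of the state space.

```r
n_rows_wide    <- 11269      # rows of st.sp2$state.space
n_value_cols   <- 35 * 2     # region/residence combinations
excel_max_rows <- 1048576    # row limit of an Excel worksheet

n_rows_long <- n_rows_wide * n_value_cols
n_rows_long                      # rows needed in long format
n_rows_long / excel_max_rows     # share of the Excel limit that would be used

# Toy wide-to-long conversion with base R (hypothetical column names)
wide <- data.frame(period = c(2010, 2015), var = "sexr",
                   A_rural = NA_real_, A_urban = NA_real_)
value_cols <- c("A_rural", "A_urban")
long <- cbind(
  wide[rep(seq_len(nrow(wide)), times = length(value_cols)), c("period", "var")],
  stack(wide[value_cols])        # columns "values" and "ind"
)
nrow(long)                       # rows of wide times number of value columns
```

In the toy example, two rows and two value columns turn into four long-format rows, which is the same multiplication that leads from 11269 to 788830 rows in the India case.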
Using this structure, we can regard each of the region/residence combinations as a separate value column: Instead of filling in just one value for each period/sex/age/edu pattern, we have to fill in 70. As an example, we consider the sex ratio of the newborns again. For the first state space, there was one line per period, each to be filled with one value. Now, there is still one line per period
st.sp2$state.space[st.sp2$state.space$var == "sexr", 1:10]
## period sex age edu var IN.AN_rural IN.AP_rural IN.AR_rural IN.AS_rural
## 10729 2010 <NA> NA <NA> sexr NA NA NA NA
## 10730 2015 <NA> NA <NA> sexr NA NA NA NA
## 10731 2020 <NA> NA <NA> sexr NA NA NA NA
## 10732 2025 <NA> NA <NA> sexr NA NA NA NA
## 10733 2030 <NA> NA <NA> sexr NA NA NA NA
## 10734 2035 <NA> NA <NA> sexr NA NA NA NA
## 10735 2040 <NA> NA <NA> sexr NA NA NA NA
## 10736 2045 <NA> NA <NA> sexr NA NA NA NA
## 10737 2050 <NA> NA <NA> sexr NA NA NA NA
## 10738 2055 <NA> NA <NA> sexr NA NA NA NA
## 10739 2060 <NA> NA <NA> sexr NA NA NA NA
## 10740 2065 <NA> NA <NA> sexr NA NA NA NA
## 10741 2070 <NA> NA <NA> sexr NA NA NA NA
## 10742 2075 <NA> NA <NA> sexr NA NA NA NA
## 10743 2080 <NA> NA <NA> sexr NA NA NA NA
## 10744 2085 <NA> NA <NA> sexr NA NA NA NA
## 10745 2090 <NA> NA <NA> sexr NA NA NA NA
## 10746 2095 <NA> NA <NA> sexr NA NA NA NA
## IN.BR_rural
## 10729 NA
## 10730 NA
## 10731 NA
## 10732 NA
## 10733 NA
## 10734 NA
## 10735 NA
## 10736 NA
## 10737 NA
## 10738 NA
## 10739 NA
## 10740 NA
## 10741 NA
## 10742 NA
## 10743 NA
## 10744 NA
## 10745 NA
## 10746 NA
but we have to provide 70 different sex ratio values in each line (note that only the first 10 columns are printed here).
The var entries of the state space haven’t changed, but since region, residence and education are now considered, all categories are present in the state space, i.e., users also have to provide values for reclasstr, gap, perural and eapr:
table(st.sp2$state.space$var)
##
## pop le0 mx ax asfr sexr reclasstr gap
## 252 216 4752 4752 756 18 1 1
## perural eapr
## 1 520
What is more, the number of rows that have to be filled in has changed, too: For example, there are now 216 lines for variable le0, i.e., the 36 of the first state space (see above) multiplied by the 6 educational levels. Since each of these lines has 70 value columns, the number of le0 values that have to be provided is 420 (= 70 * 6) times higher than in the “simple” state space created above.
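The factor of 420 and the total number of rows can be verified directly from the numbers printed above. The following lines are plain arithmetic on those numbers and do not call any package function.

```r
n_units <- 35 * 2    # region/residence value columns
n_edu   <- 6         # educational levels

le0_rows_simple <- 36                          # le0 rows in the first state space
le0_rows_full   <- le0_rows_simple * n_edu     # le0 rows in the India state space
le0_cells_full  <- le0_rows_full * n_units     # le0 values to be provided

le0_rows_full
le0_cells_full / le0_rows_simple               # factor relative to the simple model

# Row counts per variable as shown by table(st.sp2$state.space$var)
rows2 <- c(pop = 252, le0 = 216, mx = 4752, ax = 4752, asfr = 756,
           sexr = 18, reclasstr = 1, gap = 1, perural = 1, eapr = 520)
sum(rows2)                                     # total rows of the state space
```

The sum reproduces the 11269 observations reported by str(st.sp2).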
Like the state.space component, the migration list element also contains period, sex, age, and edu columns to define the population patterns, but there are three additional columns related to migration:
head(st.sp2$migration)
## period sex age edu origin destination mrate
## 1 2010 male 0 e1 IN.AN_rural India NA
## 2 2010 male 0 e1 IN.AP_rural India NA
## 3 2010 male 0 e1 IN.AR_rural India NA
## 4 2010 male 0 e1 IN.AS_rural India NA
## 5 2010 male 0 e1 IN.BR_rural India NA
## 6 2010 male 0 e1 IN.CH_rural India NA
origin
the geographical unit of origin. This may be the rest of the world/country or a subnational unit, i.e., a region, a place of residence, or a combination of these, depending on the type of migration (biregional or bilateral, see above) and the consideration of region and/or residence.
destination
the geographical unit of destination. The same rules as for origin apply.
mrate
the migration rate, i.e., the number of people emigrating from or immigrating to a certain geographical unit, given as a fraction of the geographical unit’s total population.
Since we didn’t change the default value of the migration argument when creating the state space, the migration is assumed to be biregional. Thus, people can only emigrate to the rest of the country or immigrate from the rest of the country; the exact origins of immigrants and destinations of emigrants are not specified.
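To illustrate the difference between the two migration types, the sketch below enumerates the origin/destination pairs for four hypothetical geographical units. It is not the code used by state.space(); the unit names are invented, the label "Country" for the rest of the country is an assumption, and the package’s actual migration table may be laid out differently.

```r
units <- c("A_rural", "A_urban", "B_rural", "B_urban")   # hypothetical units
rest  <- "Country"                                       # assumed label for the rest of the country

# Biregional: one outgoing and one incoming stream per unit
biregional <- rbind(
  data.frame(origin = units, destination = rest),
  data.frame(origin = rest,  destination = units)
)

# Bilateral: every ordered pair of distinct units
pairs     <- expand.grid(origin = units, destination = units,
                         stringsAsFactors = FALSE)
bilateral <- pairs[pairs$origin != pairs$destination, ]

nrow(biregional)   # 2 * n streams
nrow(bilateral)    # n * (n - 1) streams
```

With n units, the biregional setup needs 2 * n streams while the bilateral one needs n * (n - 1), which is why the amount of bilateral migration data grows quickly with the number of regions.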
The variable definition component now also contains information about region, residence, the consideration of reclassification (here: TRUE), and possible educational transitions. Furthermore, the type of migration has changed from no migration to biregional:
st.sp2$variable.definitions
## variables values
## [1,] "country" "India"
## [2,] "period" "2010"
## [3,] "period" "2015"
## [4,] "period" "2020"
## [5,] "period" "2025"
## [6,] "period" "2030"
## [7,] "period" "2035"
## [8,] "period" "2040"
## [9,] "period" "2045"
## [10,] "period" "2050"
## [11,] "period" "2055"
## [12,] "period" "2060"
## [13,] "period" "2065"
## [14,] "period" "2070"
## [15,] "period" "2075"
## [16,] "period" "2080"
## [17,] "period" "2085"
## [18,] "period" "2090"
## [19,] "period" "2095"
## [20,] "region" "IN.AN"
## [21,] "region" "IN.AP"
## [22,] "region" "IN.AR"
## [23,] "region" "IN.AS"
## [24,] "region" "IN.BR"
## [25,] "region" "IN.CH"
## [26,] "region" "IN.CT"
## [27,] "region" "IN.DD"
## [28,] "region" "IN.DL"
## [29,] "region" "IN.DN"
## [30,] "region" "IN.GA"
## [31,] "region" "IN.GJ"
## [32,] "region" "IN.HP"
## [33,] "region" "IN.HR"
## [34,] "region" "IN.JH"
## [35,] "region" "IN.JK"
## [36,] "region" "IN.KA"
## [37,] "region" "IN.KL"
## [38,] "region" "IN.LD"
## [39,] "region" "IN.MH"
## [40,] "region" "IN.ML"
## [41,] "region" "IN.MN"
## [42,] "region" "IN.MP"
## [43,] "region" "IN.MZ"
## [44,] "region" "IN.NL"
## [45,] "region" "IN.OR"
## [46,] "region" "IN.PB"
## [47,] "region" "IN.PY"
## [48,] "region" "IN.RJ"
## [49,] "region" "IN.SK"
## [50,] "region" "IN.TN"
## [51,] "region" "IN.TR"
## [52,] "region" "IN.UP"
## [53,] "region" "IN.UT"
## [54,] "region" "IN.WB"
## [55,] "residence" "rural"
## [56,] "residence" "urban"
## [57,] "sex" "male"
## [58,] "sex" "female"
## [59,] "age" "0"
## [60,] "age" "1"
## [61,] "age" "5"
## [62,] "age" "10"
## [63,] "age" "15"
## [64,] "age" "20"
## [65,] "age" "25"
## [66,] "age" "30"
## [67,] "age" "35"
## [68,] "age" "40"
## [69,] "age" "45"
## [70,] "age" "50"
## [71,] "age" "55"
## [72,] "age" "60"
## [73,] "age" "65"
## [74,] "age" "70"
## [75,] "age" "75"
## [76,] "age" "80"
## [77,] "age" "85"
## [78,] "age" "90"
## [79,] "age" "95"
## [80,] "age" "100"
## [81,] "edu" "e1"
## [82,] "edu" "e2"
## [83,] "edu" "e3"
## [84,] "edu" "e4"
## [85,] "edu" "e5"
## [86,] "edu" "e6"
## [87,] "reclass" "TRUE"
## [88,] "edu" "e12"
## [89,] "edu" "e23"
## [90,] "edu" "e24"
## [91,] "edu" "e34"
## [92,] "edu" "e35"
## [93,] "edu" "e45"
## [94,] "edu" "e46"
## [95,] "edu" "e56"
## [96,] "mig" "biregional"
Since education wasn’t considered in our first state space above, we merely got NULL when calling the corresponding list element. Now, with the consideration of 6 educational levels, we can have a look at the possible transitions between them:
st.sp2$edu.trans
## e1 e2 e3 e4 e5 e6
## e1 0 1 1 0 0 0
## e2 0 0 1 1 0 0
## e3 0 0 0 1 1 0
## e4 0 0 0 0 1 1
## e5 0 0 0 0 0 1
## e6 0 0 0 0 0 0
The rows indicate the “from” (= current), the columns the “to” (= attainable) educational levels. A 0 entry means that a transition is not possible (e.g., directly going from e1 to e6), 1 stands for possible transitions within a five-year period of the simulation. We see, e.g., that
people can move up by at most two levels within one period (from e1, people can go to e2 or e3, but not to e4, e5 or e6 directly)
moving down to a lower level is not possible (e.g., from e2 to e1)
The main diagonal (e1 to e1, e2 to e2, etc.) is not of interest here, since remaining at the same level is not considered a “transition”.
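The matrix printed above can be rebuilt in base R, which makes the underlying rule explicit: a transition is marked as possible if the target level is one or two levels above the current one. This is a sketch that reproduces the printed matrix; it is not the package’s own code, and the object names are hypothetical.

```r
lev <- paste0("e", 1:6)
m   <- matrix(0, nrow = 6, ncol = 6, dimnames = list(lev, lev))

# Rows are "from" levels, columns are "to" levels:
# allow moves of one or two levels upwards
m[(col(m) - row(m)) %in% 1:2] <- 1
m

# List the possible transitions as from/to pairs
idx <- which(m == 1, arr.ind = TRUE)
data.frame(from = lev[idx[, "row"]], to = lev[idx[, "col"]])
```

The parentheses around col(m) - row(m) are needed because %in% binds more tightly than the subtraction operator in R.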
The “empty” files (meaning that they only contain NA’s) that have been generated are written into a subfolder of the current working directory using R’s write.table() function. The files with the endings state_space.csv and mig.csv have to be filled in by the user, but not the file with the ending var_def.csv, since it is only there to provide some useful information about the current state space.
For our second state space, the files in the “input_data” subfolder of our working directory are:
Figure: Files created by the state space function
We see the naming convention of the files: Country followed by scenario, followed by the information contained in the file, with the parts separated by underscores. This convention is utilized later when we run the simulation. Opening one of the files using a spreadsheet program like Microsoft Excel, we see that the structure of the corresponding list element (see above) was maintained:
Figure: Structure of the empty migration file
It doesn’t matter whether the files are filled in using R, a spreadsheet program or an editor, but it is important to note that the simulation relies on the files being exactly as they were created by state.space(), i.e., users should change neither the names of the files nor the structure within the files! Thus, the empty files that were created should be overwritten once the data have been entered.
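For users who prefer to fill in the files with R, the workflow could look roughly like the sketch below: read the empty file, fill in the values, check that the structure is unchanged, and overwrite the file under the same name. The file name, the comma-separated layout, the toy content and the sex ratio values are all assumptions made for illustration, and the example writes to a temporary directory instead of the real “input_data” folder.

```r
country <- "India"
scen    <- "AGESR_Const"

# Assumed file name following the country_scenario_content convention
file_name <- paste(country, scen, "state_space.csv", sep = "_")
path      <- file.path(tempdir(), file_name)

# Toy stand-in for an empty file (hypothetical content and layout)
empty <- data.frame(period = c(2010, 2015), var = "sexr", value = NA)
write.csv(empty, path, row.names = FALSE)

# Read the empty file, fill in made-up values, and keep the structure intact
filled <- read.csv(path)
filled$value <- c(1.05, 1.05)
stopifnot(identical(names(filled), names(empty)),
          nrow(filled) == nrow(empty))

# Overwrite the empty file under exactly the same name
write.csv(filled, path, row.names = FALSE)
```

Checking column names and row counts before overwriting is a cheap safeguard against accidentally changing the structure that the simulation expects.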