---
title: "Expected kin counts by type of relative: A matrix implementation"
output:
html_document:
toc: true
toc_depth: 1
vignette: >
---
```{r, include=FALSE}
#devtools::load_all()
# install.packages("devtools")
#devtools::install_github("IvanWilli/DemoKin")
```
In this vignette, we'll demonstrate how `DemoKin` can be used to compute kinship networks for an average member of a given (female) population. Let us call her Focal: an average Swedish woman who has always lived in Sweden and whose family has never left the country.
Here, we'll show how `DemoKin` can be used to compute the number and age distribution of Focal's relatives under a range of assumptions, including living and deceased kin.
## 1. Kin counts with time-invariant rates
First, we compute kin counts in a **time-invariant** framework. We assume that Focal and all of her relatives experience the 2015 mortality and fertility rates throughout their entire lives (Caswell, 2019). The `DemoKin` package includes data from Sweden as an example: age-by-year matrices of survival probabilities (*swe_px*), survival ratios (*swe_Sx*), fertility rates (*swe_asfr*), and population numbers (*swe_pop*). You can see the data contained in `DemoKin` with `data(package="DemoKin")`. This data comes from the [Human Mortality Database](https://www.mortality.org/) and [Human Fertility Database](https://www.humanfertility.org/) (see `?DemoKin::get_HMDHFD`).
In order to implement the time-invariant models, the function `DemoKin::kin` expects a vector of sruvival ratios and another vector of fertility rates. In this example, we get the data for the year 2015, and run the matrix models:
```{r, message=FALSE, warning=FALSE}
library(DemoKin)
library(tidyverse)
library(knitr)
# Input data structure
# px = 1- qx, where qx = The probability that a person at exact age x will die within one year
View(swe_px)
View(swe_asfr)
# First, get vectors for a given year
swe_surv_2015 <- swe_px[,"2015"]
swe_asfr_2015 <- swe_asfr[,"2015"]
plot(swe_surv_2015)
plot(swe_asfr_2015)
sum(swe_asfr_2015) # TFR in 2015
# Time-invariant model assume that Focal and all of her relatives experience the 2015 mortality and fertility rates throughout their entire lives (Caswell, 2019).
# Run kinship models
swe_2015 <- kin(U = swe_surv_2015,
f = swe_asfr_2015,
time_invariant = TRUE)
#alternatively
swe_2015 <- kin(U = swe_px,
f = swe_asfr,
output_period = 2015,
time_invariant = TRUE)
?kin
# U: A vector (atomic) or matrix with probabilities (or survival ratios, or transition between age class in a more general perspective) with rows as ages (and columns as years in case of matrix, being the name of each col the year).
# Cannot use data.frame!
# f: same as U but for fertility
# default setting
# birth_female = 1/2.04; use birth_female = 1 if asfr input is one-sex
# pi: Mother´s age distribution of childbearing. Same as U but for childbearing distribution (sum to 1). Optional.
# If pi was not being specified, the kin function calculate it for you using the stable population assumption
```
### 1.1. Value (outputs)
`DemoKin::kin()` returns a list containing two data frames: `kin_full` and `kin_summary`.
`kin_full` contains expected kin counts by year (or cohort), age of Focal and age of kin. Note that the columns `year` and `cohort` are empty if the argument is `time_invariant = TRUE` in `kin` (as in this example).
```{r}
head(swe_2015$kin_full)
View(swe_2015$kin_full)
```
`kin_summary` is a ‘summary’ data frame derived from `kin_full`.
```{r}
# using focal's daughter as an example
# for an exact age of focal, there is a corresponding expected age distribution of kin
swe_2015_daughter <- swe_2015$kin_full %>%
arrange(age_focal,kin,age_kin) %>%
filter(kin=="d") %>%
rename_kin()
View(swe_2015_daughter)
# age distribution of daughter at focal's age of 30
# the area under the curve represents the expected number of daughter focal would have at age of 30
swe_2015_daughter %>%
filter(age_focal == 30) %>%
ggplot(aes(age_kin,living)) +
geom_line()+
labs(title = "age_focal = 30")
# age distribution of daughter at focal's age of 30
swe_2015_daughter %>%
filter(age_focal == 50) %>%
ggplot(aes(age_kin,living)) +
geom_line()+
labs(title = "age_focal = 50")
# total number of *daughter* by age of focal (population average)
plot(swe_2015_daughter %>%
group_by(age_focal) %>%
summarise(sum_daughter = sum(living)))
# kin_summary do this for us
# instead of showing age distribution of kin, it gives the count, mean, and standard deviation
head(swe_2015$kin_summary)
View(swe_2015$kin_summary)
```
To produce it, we sum over all ages of kin to produce a data frame of expected kin counts by year or cohort and age of Focal (but not by age of kin).
Consider this simplified example for living kin counts:
```{r}
# This is an example showing you how to get "kin_summary" from "kin_full"
kin_summary_example <-
swe_2015$kin_full %>%
select(year, cohort, kin, age_focal, age_kin, living, dead) %>%
group_by(year, cohort, kin, age_focal) %>%
summarise(count_living = sum(living))
head(kin_summary_example)
head(swe_2015$kin_summary)
```
### 1.2. Visualizing the distribution of kin
Let us now visualize the distribution of relatives over Focal's lifecourse using the summary data.frame `kin_summary`:
```{r, fig.height=6, fig.width=8}
swe_2015[["kin_summary"]] %>%
#filter(kin == "d" | kin == "m" | kin == "os") %>%
rename_kin() %>%
ggplot() +
geom_line(aes(age_focal, count_living)) +
theme_bw() +
labs(y = "Expected number of living relatives") +
facet_wrap(~kin)+
theme(strip.text.x = element_text(size = 20),
axis.text.x=element_text(size=20),
axis.text.y=element_text(size=20),
text=element_text(size = 20))
```
Here, each relative type is identified by a unique code. Note that `DemoKin` uses different codes than Caswell (2019); the equivalence between the two set of codes is given in the following table:
```{r, fig.height=6, fig.width=8, echo=FALSE}
library(knitr)
DemoKin::demokin_codes() %>%
kable
```
We can also visualize the age distribution of relatives when Focal is 35 years old (now, with full names to identify each relative type using the function `DemoKin::rename_kin()`):
```{r, fig.height=6, fig.width=8}
swe_2015[["kin_full"]] %>%
DemoKin::rename_kin() %>%
filter(age_focal == 35) %>%
ggplot() +
geom_line(aes(age_kin, living)) +
geom_vline(xintercept = 35, color=2) +
labs(y = "Expected number of living relatives") +
theme_bw() +
facet_wrap(~kin) +
theme(text=element_text(size=20))
```
The one-sex model implemented in `DemoKin` assumes that the given fertility input applies to both sexes.
Note that, if using survival rates ($S_x$) instead of probabilities ($p_x$), fertility vectors should account for female person-year exposure, using: $(\frac{f_x+f_{x+1}S_x}{2})\frac{L_0}{l_0}$ instead of only $fx$ (see Preston et.al, 2002).
The `kin` function also includes a summary output with the count of living kin, mean and standard deviation of kin age, by type of kin, for each Focal's age:
```{r, fig.height=6, fig.width=8}
swe_2015[["kin_summary"]] %>%
DemoKin::rename_kin() %>%
filter(age_focal == 35) %>%
select(kin, count_living, mean_age, sd_age) %>%
mutate_if(is.numeric, round, 2) %>%
kable()
```
Finally, we can visualize the estimated kin counts by type of kin using a network diagram. Following with the age 35:
```{r, fig.height=6, fig.width=8, dpi=900, message=FALSE, warning=FALSE}
swe_2015[["kin_summary"]] %>%
filter(age_focal == 35) %>%
select(kin, count = count_living) %>%
plot_diagram(rounding = 2)
```
## 2. Kin counts with time-variant rates
The demography of Sweden is, in reality, changing every year. This means that Focal and her relatives will have experienced changing mortality and fertility rates over time.
We account for this, by using the time-variant models introduced by Caswell and Song (2021).
Let's take a look at the resulting kin counts for a Focal born in 1960, limiting the output to the relative types given in the argument `output_kin`:
```{r, fig.height=6, fig.width=8}
swe_time_varying <-
kin(
U = swe_px,
f = swe_asfr,
#N = swe_pop,
time_invariant =FALSE,
output_cohort = c(1960,1980)
output_kin = c("d","gd","ggd","m","gm","ggm")
)
swe_time_varying$kin_summary %>%
DemoKin::rename_kin() %>%
ggplot(aes(age_focal,count_living,color=factor(cohort))) +
scale_y_continuous(name = "",labels = seq(0,3,.2),breaks = seq(0,3,.2))+
geom_line(color = 1)+
geom_vline(xintercept = 35, color=2)+
labs(y = "Expected number of living relatives") +
facet_wrap(~kin,scales = "free")+
theme_bw()
```
## 3. Kin deaths
Kin loss can have severe consequences for bereaved relatives. It can also affect the provision of care support and intergenerational transfers over the life course.
The function `kin` also includes information on the number of relatives lost by Focal during her life, stored in the column `count_cum_death`:
```{r, fig.height=6, fig.width=8, message=FALSE, warning=FALSE}
swe_time_varying$kin_summary %>%
DemoKin::rename_kin() %>%
ggplot() +
geom_line(aes(age_focal, count_cum_dead)) +
labs(y = "Expected number of deceased relatives") +
theme_bw() +
facet_wrap(~kin,scales="free")
```
Given these population-level measures, we can compute Focal's the mean age at the time of her relative's death. For a Focal aged 50 yo:
```{r}
swe_time_varying$kin_summary %>%
rename_kin() %>%
filter(age_focal == 50) %>%
select(kin,count_cum_dead,mean_age_lost) %>%
mutate_if(is.numeric, round, 2) %>%
kable()
```
## 4. Prevalences
Given the distribution of kin by age, we can compute the expected portion of living kin in some stage given a set of prevalences by age (e.g., a disease, employment, etc.). This is known as the Sullivan Method in the life-table literature. A matrix formulation for same results can be found in Caswell (2019), which can also be extended to a time-variant framework.
```{r, message=FALSE, warning=FALSE, fig.height=6, fig.width=10}
# let´s create some prevalence by age
swe_2015_prevalence <-
tibble(
age_kin = unique(swe_2015$kin_full$age_kin),
prev = .005 * exp(.05 * age_kin)
)
# plot(swe_2015_prevalence)
# join to kin count estimates and plot
swe_2015$kin_full %>%
left_join(swe_2015_prevalence) %>%
group_by(kin, age_focal) %>%
rename_kin() %>%
summarise(
prevalent = sum(living * prev),
no_prevalent = sum(living * (1-prev))
) %>%
pivot_longer(cols = prevalent:no_prevalent, names_to = "prevalence_state", values_to = "count") %>%
ggplot(aes(x=age_focal, y = count)) +
geom_area(aes(fill=prevalence_state)) +
facet_wrap(~kin) +
theme_bw()
```
## 5. Multi-state models
`DemoKin` allows the computation of kin structures in a multi-state framework, classifying individuals jointly by age and some other feature (e.g., stages of a disease). For this, we need mortality and fertility data for each possible stage and probabilities of changing state by age.
Let's consider the example of Slovakia given by Caswell (2021), where stages are parity states.
`DemoKin` includes the data to replicate this analysis for the year 1980:
- The data.frame `svk_fxs` is the fertility rate by age (rows) for each parity stage (columns). The first stage represents $parity=0$; the second stage, $parity=1$; and so on, until finally the sixth stage represents $parity\geq5$.
- The data.frame `svk_Hxs` has a similar structure but with $1$'s in the ages corresponding to newborns (the first age in our example).
- The data.frame `svk_pxs` has the same structure and represents survival probabilities.
- The list `svk_Uxs` has the same number of elements and ages (in this case 110, where $omega$ is 109). For each age, it contains a column-stochastic transition matrix with dimension for the state space. The entries are transition probabilities conditional on survival.
Following Caswell (2020), we can obtain the joint age-parity kin structure:
```{r}
# use birth_female=1 because fertility is for female only
demokin_svk1980_caswell2020 <-
kin_multi_stage(
U = svk_Uxs,
f = svk_fxs,
D = svk_pxs,
H = svk_Hxs,
birth_female=1
)
```
As an example, consider the age-parity distribution of aunts, when Focal is 20 and 60 yo (this is equivalent to Figure 4 in Caswell [2021]).
```{r, message=FALSE, warning=FALSE, fig.height=6, fig.width=10}
demokin_svk1980_caswell2020 %>%
filter(kin %in% c("oa","ya"), age_focal %in% c(20,60)) %>%
mutate(parity = as.integer(stage_kin)-1,
parity = case_when(parity == 5 ~ "5+", T ~ as.character(parity)),
parity = fct_rev(parity)) %>%
group_by(age_focal, age_kin, parity) %>%
summarise(count= sum(living)) %>%
ggplot() +
geom_bar(aes(x=age_kin, y = count, fill=parity), stat = "identity") +
geom_vline(aes(xintercept = age_focal), col=2) +
labs(y = "Number of aunts") +
theme_bw() +
facet_wrap(~age_focal, nrow = 2)
```
We can also see the portion of living daughters and mothers at different parity stages over Focal's lie-course (this is equivalent to Figure 9 and 10 in Caswell [2021]).
```{r, message=FALSE, warning=FALSE, fig.height=6, fig.width=10}
demokin_svk1980_caswell2020 %>%
filter(kin %in% c("d","m")) %>%
mutate(parity = as.integer(stage_kin)-1,
parity = case_when(parity == 5 ~ "5+", T ~ as.character(parity)),
parity = fct_rev(parity)) %>%
group_by(age_focal, kin, parity) %>%
summarise(count= sum(living)) %>%
DemoKin::rename_kin() %>%
ggplot() +
geom_bar(aes(x=age_focal, y = count, fill=parity), stat = "identity") +
labs(y = "Kin count") +
theme_bw() +
facet_wrap(~kin, nrow = 2)
```
## References
Caswell, H. (2019). The formal demography of kinhip: A matrix formulation. Demographic Research 41:679–712. doi:10.4054/DemRes.2019.41.24.
Caswell, H. (2020). The formal demography of kinship II: Multistate models, parity, and sibship. Demographic Research, 42, 1097–1146.
Caswell, H., & Song, X. (2021). The formal demography of kinhip III: kinhip dynamics with time-varying demographic rates. Demographic Research, 45, 517–546.
Preston, S., Heuveline, P., & Guillot, M. (2000). Demography: Measuring and Modeling Population Processes. Wiley.