Part 9 NSUM Methods Summary

  • The NSUM approach is based on the idea that for all individuals, the probability that someone they know is in a given subpopulation is the size of that subpopulation divided by the overall population size. The estimate of the subpopulation size is then based on the share of the subpopulation in the respondent’s network. For example, if a respondent knows \(100\) people total and knows \(2\) intravenous drug users, then it is inferred that \(2\%\) of the total population are intravenous drug users. The number of people in a given subpopulation that the respondent knows is assumed to follow a binomial distribution.
  • However, the total number of people known by a respondent, called his or her degree or personal network size, also needs to be estimated. [why can’t we ask about this directly?] A person’s degree is estimated by asking the respondent about the number of contacts he or she has in several subpopulations of known size, such as twins, people named Nicole, or women over \(70\), using the same assumption that an individual should know roughly his or her degree times the proportion of people in a given subpopulation. [why would asking about the hidden population directly work?]
  • …Does not take account of the different propensities of people to know people in different groups, such as people’s tendency to know people like themselves; these are called barrier effects.
  • Transmission bias arises when a respondent does not count his or her contact as being in the group of interest, for example because the respondent does not know that the contact belongs to the group. This bias may be particularly large when a group is stigmatized, as is the case of most of the key affected populations in which we are interested.
  • Recall bias refers to the tendency for people to underestimate the number of people they know in larger groups because they forget some of these contacts, and to overestimate the number of people they know in small or unusual groups.

Killworth et al. (1998)

Basic setup:

  • Let \(y_{ik}\) be the number of people known by individual \(i\), \(i = 1, \dots, n\), in group \(k\), \(k = 1, \dots, K\)
  • Groups \(1, \dots, K - 1\) are of known size and group \(K\) is of unknown size (in general, there can be more than one group of unknown size)
  • \(N_{k}\) – size of group \(k\), and let \(N\) be the total population, which is assumed to be known
  • \(d_{i}\) – number of people that respondent \(i\) knows, also called his or her degree or personal network size
  • Core assumption: \(y_{ik} \sim \mathrm{B} (d_{i}, \frac{N_{k}}{N})\)
  • Main finding: \(\widehat{d_{i}} = N \frac{\sum_{k = 1}^{K - 1} y_{ik}}{\sum_{k = 1}^{K - 1} N_{k}}\) (sums run over the known-size groups only, since \(N_{K}\) is unknown), which then leads to \(\widehat{N_{K}} = N \frac{\sum_{i = 1}^{n} y_{iK}}{\sum_{i = 1}^{n} \widehat{d_{i}}}\)
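
The two estimators above can be sketched in a few lines of Python. This is only an illustration: the function name `killworth_estimates`, the toy counts, and the group sizes are made up for the example, and the unknown-size group is assumed to be stored in the last column of `y`.

```python
import numpy as np

def killworth_estimates(y, known_sizes, N):
    """Basic NSUM point estimates.

    y           : (n, K) array of reported contacts; the last column is the
                  group of unknown size (an assumption of this sketch).
    known_sizes : sizes N_1, ..., N_{K-1} of the known groups.
    N           : total population size.
    """
    y = np.asarray(y, dtype=float)
    known_sizes = np.asarray(known_sizes, dtype=float)
    # Degree estimate: contacts in known groups scaled by N / sum of known sizes.
    d_hat = N * y[:, :-1].sum(axis=1) / known_sizes.sum()
    # Size estimate: total contacts in the unknown group over total estimated degree.
    N_K_hat = N * y[:, -1].sum() / d_hat.sum()
    return d_hat, N_K_hat

# Toy data: 3 respondents, 2 known groups, 1 unknown group (last column).
y = np.array([[2, 1, 1],
              [4, 2, 3],
              [0, 3, 0]])
known_sizes = [20_000, 10_000]
N = 1_000_000

d_hat, N_K_hat = killworth_estimates(y, known_sizes, N)
print(d_hat)    # estimated degrees: 100, 200, 100
print(N_K_hat)  # estimated hidden population size: 10000
```

In the toy data the respondents report 4 hidden-group contacts out of an estimated 400 contacts in total, so the hidden group is estimated at 1% of the population.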

Maltiel et al. (2015)

Additional assumptions (on top of basic setup above):

  • The number of people respondent \(i\) knows is log-normally distributed: \(d_{i} \sim \textrm{Lognormal} (\mu, \sigma^{2})\) (Salganik et al. 2011)

  • Priors are \(\pi (N_{K}) \propto \frac{\mathbb{1}_{N_{K} < N}}{N_{K}}\); \(\mu \sim \mathrm{U} (3,8)\); \(\sigma \sim \mathrm{U} (\frac{1}{4}, 2)\)

  • Existence of barrier effect implies that \(\frac{N_{k}}{N}\) can be over or underestimated by respondent depending on the group \(k\), so instead of using this term in binomial distribution assume:

    • \(y_{ik} \sim \mathrm{B} (d_{i}, q_{ik})\) and \(q_{ik} \sim \textrm{Beta} (m_{k}, \rho_{k})\) (mean-sd parametrization), where \(m_{k} = \mathbb{E} [q_{ik}] \equiv \frac{N_{k}}{N}\)

    • priors are \(\pi (m_{k}) \propto \frac{1}{m_{k}}\) and \(\rho_{k} \sim \mathrm{U} (0,1)\)

  • Existence of transmission bias implies that for the group of interest \(K\) respondents might underestimate number of connections due to stigma. Thus they assume:

    • \(y_{ik} \sim \mathrm{B} (d_{i}, \tau_{k}\frac{N_{k}}{N})\), \(\tau_{K} \sim \textrm{Beta} (\eta_{K}, \nu_{K})\) and \(\forall k\neq K:\: \tau_{k} \equiv 1\)

    • standard priors on \(\eta_{K}\) and \(\nu_{K}\)

  • Recall bias usually implies that respondents under-report large groups and over-report small groups. Use adjustment method instead of implicit parametrisation to adjust for this bias.

  • Combining the barrier-effect and transmission-bias models gives:

    • \(y_{ik} \sim \mathrm{B} (d_{i}, \tau_{k} q_{ik})\),
    • \(d_{i} \sim \textrm{Lognormal} (\mu, \sigma^{2})\),
    • \(q_{ik} \sim \textrm{Beta} (m_{k}, \rho_{k})\),
    • \(\tau_{K} \sim \textrm{Beta} (\eta_{K}, \nu_{K})\),
    • \(\mu \sim \mathrm{U} (3,8);\; \sigma \sim \mathrm{U} (\frac{1}{4}, 2);\; \rho_{k} \sim \mathrm{U} (0,1)\)
    • \(\pi (N_{K}) \propto \frac{\mathbb{1}_{N_{K} < N}}{N_{K}} ;\; \pi (m_{k}) \propto \frac{1}{m_{k}}\)
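
To get a feel for the combined model, it can help to simulate data from it forward. The sketch below is not the authors' code. It assumes that \(\textrm{Beta}(m, \rho)\) with mean \(m\) and dispersion \(\rho\) maps to the usual shape parameters as \(a = m(1/\rho - 1)\), \(b = (1 - m)(1/\rho - 1)\), and that the same mapping applies to \(\tau_{K}\). The helper names `beta_mean_disp` and `simulate_nsum` and all hyperparameter values are hypothetical.

```python
import numpy as np

def beta_mean_disp(rng, mean, disp, size=None):
    """Draw Beta variates from a mean/dispersion pair.

    Assumed mapping (not confirmed by the notes):
    a = mean * (1/disp - 1), b = (1 - mean) * (1/disp - 1).
    """
    scale = 1.0 / np.asarray(disp, dtype=float) - 1.0
    return rng.beta(mean * scale, (1.0 - mean) * scale, size=size)

def simulate_nsum(rng, n, N, group_sizes, mu, sigma, rho, eta_K, nu_K):
    """Forward-simulate y_ik for n respondents; the last group is the hidden one."""
    group_sizes = np.asarray(group_sizes, dtype=float)
    K = group_sizes.size
    m = group_sizes / N                                   # m_k = N_k / N
    # Degrees: lognormal draws rounded to positive integers for the binomial.
    d = np.maximum(np.rint(rng.lognormal(mu, sigma, size=n)), 1).astype(int)
    # Barrier effects: respondent-specific probabilities q_ik around m_k.
    q = beta_mean_disp(rng, m[None, :], np.asarray(rho)[None, :], size=(n, K))
    # Transmission bias: tau_k = 1 for known groups, random for the hidden group.
    tau = np.ones(K)
    tau[-1] = beta_mean_disp(rng, eta_K, nu_K)
    y = rng.binomial(d[:, None], tau[None, :] * q)        # y_ik ~ B(d_i, tau_k q_ik)
    return y, d, tau

rng = np.random.default_rng(0)
y, d, tau = simulate_nsum(
    rng, n=500, N=1_000_000, group_sizes=[20_000, 10_000, 5_000],
    mu=5.0, sigma=0.5, rho=[0.1, 0.1, 0.1], eta_K=0.7, nu_K=0.1,
)
print(y.shape, d.mean(), tau)
```

Feeding the simulated `y` into the basic estimator is a quick way to see how much barrier effects and transmission bias distort it.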

Feehan and Salganik (2016)

  • Focus on in-reports and out-reports:

    • Two people are connected by a directed edge \(i \rightarrow j\) if person \(i\) would count person \(j\) as a member of the group of interest (e.g. drug injectors). Whenever \(i \rightarrow j\), we say that \(i\) makes an out-report about \(j\) and that \(j\) receives an in-report from \(i\).

    • \(K\) - target group, \(U\) - whole population, \(F\) - frame population (actually surveyed)

    • Requires a relative probability sample from the hidden population, with questions structured around groups of known size, e.g. “How many widowers do you know?” combined with “How many of these widowers are aware that you inject drugs?”

  • The total number of out-reports from \(i\) about group \(K\) is \(y_{iK}\); the total number of in-reports about \(i\) from the whole population is \(\nu_{iU}\), and from the frame population it is \(\nu_{iF}\)

  • Total out-reports = total in-reports: \(\sum_{i \in F} y_{iK} = \sum_{i \in U} \nu_{iF} \Leftrightarrow N_{K} = \frac{\sum_{i \in F} y_{iK}}{\sum_{i \in U} \nu_{iF} \big/ N_{K}}\)

  • Assume that the out-reports from people in the frame population only include people in the hidden population, then it must be the case that the visibility of everyone not in the hidden population is 0: \(\forall i \notin K:\: \nu_{iF} \equiv 0\)

    • then \(N_{K} = \frac{\sum_{i \in F} y_{iK}}{\sum_{i \in K} \nu_{iF} \big/ N_{K}} = \frac{y_{FK}}{ \overline{\nu_{KF}}}\)
  • Estimate \(y_{FK}\) using the Horvitz-Thompson (HT) estimator with known sampling probabilities from \(F\)

  • Estimate \(\overline{\nu_{KF}}\) using the probe alters \(\mathcal{A}\): groups of known size about which we collect relational data from the hidden population sample. Using these data, the estimator is \(\widehat{\overline{\nu_{KF}}} = \frac{N}{N_{\mathcal{A}}} \frac{\sum_{i \in s_{K}} \sum_{j} \tilde{\nu}_{i, A_{j}} \big/ (c \pi_{i} )}{\sum_{i \in s_{K}} 1 \big/ (c \pi_{i})}\), where \(\tilde{\nu}_{i, A_{j}}\) is the self-reported visibility of hidden population member \(i\) to the group of known size \(A_{j}\).

  • Use bootstrap methods to estimate the standard errors and confidence intervals

  • Very useful: the paper provides a comparison with the basic NSUM method (Killworth et al. 1998) and shows that \(N_{K} = \underbrace{\left( \frac{y_{FK}}{\overline{d}_{UF}} \right)}_{\text{standard NSUM estimand}} \times \frac{1}{\phi_{F} \delta_{F} \tau_{F}}\), where \(\phi_{F}\) is the frame ratio (average number of connections within \(F\) relative to connections from \(U\) to \(F\)), \(\delta_{F}\) is the degree ratio (average number of connections from \(K\) to \(F\) relative to connections within \(F\)), and \(\tau_{F}\) is the true positive rate (in-reports from \(F\) about \(K\) relative to edges connecting \(F\) and \(K\))

  • Appendix G of the paper also has very useful ideas on simulating network structure using a block approach, where the chance of a link between two individuals depends on their membership in \(F\) and \(K\) (there is no code, but it sounds straightforward to do)
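
A rough sketch of such a block simulation is below. It is not the Appendix G procedure itself: the link probabilities, the awareness rate, and the variable names are assumptions chosen for illustration. An undirected acquaintance network is drawn with link probabilities that depend on \(F\) and \(K\) membership, and person \(i\) reports \(j\) only if they are linked, \(j\) is in \(K\), and \(i\) is aware of it (so there are no false positives). The script then checks the identity \(N_{K} = y_{FK} / \overline{\nu_{KF}}\) and the decomposition with \(\phi_{F}\), \(\delta_{F}\), \(\tau_{F}\).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400
in_F = rng.random(N) < 0.8            # frame population membership
in_K = rng.random(N) < 0.1            # hidden population membership

# Four blocks defined by (F, K) membership; hypothetical link probabilities,
# higher within a block than between blocks.
block = 2 * in_F.astype(int) + in_K.astype(int)
P = np.full((4, 4), 0.03)
np.fill_diagonal(P, 0.06)
probs = P[block[:, None], block[None, :]]

# Symmetric acquaintance network without self-loops.
upper = np.triu(rng.random((N, N)) < probs, k=1)
A = upper | upper.T

# Reporting network: i reports j if they are linked, j is in K,
# and i is aware of j's membership (imperfect transmission).
aware = rng.random((N, N)) < 0.6
R = A & in_K[None, :] & aware

N_K = in_K.sum()
N_F = in_F.sum()

y_FK = R[in_F].sum()                          # total out-reports from F
v_KF_bar = R[in_F][:, in_K].sum() / N_K       # average visibility of K to F
print(y_FK / v_KF_bar, N_K)                   # generalized estimand vs truth

# Adjustment factors relating the basic estimand to N_K.
d_UF_bar = A[:, in_F].sum() / N               # avg connections from U to F
d_FF_bar = A[in_F][:, in_F].sum() / N_F       # avg connections within F
d_KF_bar = A[in_K][:, in_F].sum() / N_K       # avg connections from K to F
phi = d_FF_bar / d_UF_bar                     # frame ratio
delta = d_KF_bar / d_FF_bar                   # degree ratio
tau = v_KF_bar / d_KF_bar                     # true positive rate

basic = y_FK / d_UF_bar                       # standard NSUM estimand
adjusted = basic / (phi * delta * tau)
print(basic, adjusted)
```

Both identities hold exactly here because the full population is observed; with survey data each quantity would have to be estimated from samples.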

Service Multiplier

Feehan, Dennis M., and Matthew J. Salganik. 2016. “Generalizing the Network Scale-up Method: A New Estimator for the Size of Hidden Populations.” Sociological Methodology 46 (1): 153–86. https://doi.org/10.1177/0081175016665425.

Killworth, Peter D., Eugene C. Johnsen, Christopher McCarty, Gene Ann Shelley, and H. Russell Bernard. 1998. “A Social Network Approach to Estimating Seroprevalence in the United States.” Social Networks 20 (1): 23–50. https://doi.org/10.1016/S0378-8733(96)00305-X.

Maltiel, Rachael, Adrian E. Raftery, Tyler H. McCormick, and Aaron J. Baraff. 2015. “Estimating Population Size Using the Network Scale-up Method.” The Annals of Applied Statistics 9 (3): 1247–77. https://www.jstor.org/stable/43826420.

Salganik, Matthew J., Dimitri Fazito, Neilane Bertoni, Alexandre H. Abdo, Maeve B. Mello, and Francisco I. Bastos. 2011. “Assessing Network Scale-up Estimates for Groups Most at Risk of HIV/AIDS: Evidence from a Multiple-Method Study of Heavy Drug Users in Curitiba, Brazil.” American Journal of Epidemiology 174 (10): 1190–1196.