Semi$^+$-supervised learning under sample selection bias

Robatian, Damoon; Asgharian, Masoud

In time-to-event data analysis, the main object of interest is the time elapsed between the occurrence of two ordered events, say $E_1$ and $E_2$. Sampling from the incident population, i.e., subjects who have experienced the incidence of $E_1$ before being sampled, regardless of the occurrence of $E_2$, is the gold standard in follow-up studies. In practice, however, it is often more feasible to sample from the prevalent population, i.e., subjects who have already experienced $E_1$ but not yet $E_2$. It is well known that the prevalent sampling design induces sample selection bias. Moreover, time-to-event data are usually subject to censoring, which causes partial loss of information on a fraction of the subjects. Here, we discuss the inefficiency of conventional learning methods that ignore sample selection bias, and show how this problem can be avoided by properly incorporating the selection bias into the analysis. The arguments are backed by simulation studies.
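To make the prevalent-sampling bias concrete, here is a small hypothetical simulation. It is a sketch under stated assumptions, not the method of this report: onsets of $E_1$ arrive uniformly in calendar time (a stand-in for stationary incidence), the $E_1$-to-$E_2$ duration follows a Gamma(shape 2, scale 1) law with true mean 2, and durations are observed without censoring. Under these assumptions a subject with duration $d$ is still in the prevalent population at the sampling time with probability proportional to $d$, so the prevalent sample is length-biased and its naive mean drifts towards $E[X^2]/E[X] = 3$. The correction shown is the classical harmonic-mean estimator for length-biased data; the function names are illustrative only.

```python
import random
import statistics


def simulate_prevalent_cohort(n_onsets, window, seed=2020):
    """Hypothetical illustration of prevalent sampling (not the authors' code).

    Assumptions: E1 onsets are uniform on [0, window]; each E1-to-E2 duration
    is Gamma(shape=2, scale=1), so the population mean duration is 2.
    Subjects who have experienced E1 but not yet E2 at calendar time
    `window` form the prevalent sample.
    """
    rng = random.Random(seed)
    incident, prevalent = [], []
    for _ in range(n_onsets):
        onset = rng.uniform(0.0, window)
        duration = rng.gammavariate(2.0, 1.0)
        incident.append(duration)
        # E1 has occurred (onset <= window) and E2 has not yet occurred.
        if onset + duration > window:
            prevalent.append(duration)
    return incident, prevalent


def length_bias_corrected_mean(durations):
    """Harmonic-mean estimator of the population mean duration.

    Valid only under length-biased sampling with uncensored durations.
    """
    return len(durations) / sum(1.0 / d for d in durations)


incident, prevalent = simulate_prevalent_cohort(n_onsets=200_000, window=40.0)
naive = statistics.fmean(prevalent)
corrected = length_bias_corrected_mean(prevalent)

print(f"incident mean        : {statistics.fmean(incident):.3f}")  # target: E[X] = 2
print(f"naive prevalent mean : {naive:.3f}")  # length-biased: near E[X^2]/E[X] = 3
print(f"corrected mean       : {corrected:.3f}")  # harmonic correction: near 2
```

The naive average over the prevalent sample overstates the mean duration because long durations are more likely to be "caught" at the sampling time; reweighting each observation by $1/d$ undoes that selection. Handling censoring, which the abstract also mentions, would require a more elaborate estimator than this sketch.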

Published April 2020, 7 pages

Research Axes

Research application

Health

Document

G2023-EIW09.pdf (300 KB)

GERAD

G-2020-23-EIW09
