1 Introduction

The Internet plays an increasingly important role in the daily lives of many older adults [1]. For instance, in the European Union, the share of adults aged 55+ using the Internet every day has risen from 23% in 2010 to 37% in 2017 [2]. This increase is complemented by major changes in the activities performed online [3]. The main activities in the past were surfing the web and e-mailing, but the range of activities has broadened to include social networking, shopping, and entertainment [4]. Internet use offers potential to mitigate some of the consequences of aging. Results of longitudinal studies suggest that being online can help reduce the feeling of loneliness [5], prevent isolation through facilitating social ties across distances [6], and help in maintaining cognitive functioning [7] and psychological well-being [8]. Further, online services can assist older adults in leisure activities [9] and health-related information needs [10]. Therefore, digital divide research has shifted from identifying barriers to physical Internet access (first-level digital divide) to understanding why some older adults perform various online activities and why others do not (second-level digital divide).

A frequently used approach in digital divide research collects quantitative data from a sample of older adults and then examines the role of individual characteristics in explaining whether one belongs to the group of users or nonusers, respectively. This examination can be accomplished by logistic regression analysis, which estimates the parameters of a logistic function for a binary (dichotomous) dependent variable. Accordingly, users will be coded as 1 and nonusers will be coded as 0. The model’s explanatory quality can then be defined as the percentage of observations mapped onto the correct group. This approach has been adopted for explaining older adults’ general Internet use [11,12,13] as well as specific online activities such as e-mailing, general information search, health-related information search, banking, and shopping [14, 15].

Comparing the results of previous studies is difficult because of different sets of explanatory variables being tested as well as heterogeneity in the analysis and reporting of results. Many studies do not discuss the explanatory quality of their regression models. Moreover, many studies solely base their findings on thresholds for levels of statistical significance rather than reporting exact p-values complemented by confidence intervals for the strength of associations. These deficits undermine the ability to compare, appraise, and integrate the results of previous studies, which is essential for building cumulative knowledge from single quantitative studies. Against this backdrop, our research aims at enhancing the understanding of online activities among older adults by demonstrating how contextualized information can aid the interpretation of results from logistic regression in this domain.

Specifically, we address the following research question: What is the role of socio-demographic characteristics in explaining whether older adults perform various online activities? To answer this question, we first examine the current state of reporting logistic regression in the literature on the second-level digital divide in older adults. This examination relies upon reporting guidelines for logistic regression analysis, which have been articulated for research in the tradition of null-hypothesis significance testing (NHST) in fields such as biology [16] and epidemiology [17]. To show how these guidelines can be applied, we present an empirical study that assesses the role of gender, age, education, subjective health, and living arrangement in explaining eight online activities, grouped into informational, social, and instrumental activities. This study analyzes primary survey data, which we collected in the summer of 2017 from older adults (65+) in Germany (N = 1,079).

2 Current State of Reporting Logistic Regression Analysis

Understanding Internet use in older adults is a multi-disciplinary field of research. We focus on empirical studies that used logistic regression analysis to examine how individual characteristics impact the probability of belonging to the group of users vis-à-vis nonusers. We first define a set of reporting criteria, which we derived from recommendations for reporting results of NHST and adapted to the domain.

Correlations Among Independent Variables (IV):

High correlations between two or more IVs, a phenomenon referred to as multicollinearity, can lead to unstable estimates of coefficients; correlations should therefore be inspected prior to building a regression model. High correlations might suggest that one IV is redundant because it can be explained by the other IV. Multicollinearity can be detected by calculating the Variance Inflation Factor (VIF) for each IV, for which cut-offs have been proposed [18]. There can be reasons to retain IVs with a high VIF, for instance, if the affected variable is a control and the variable hypothesized to be a factor does not itself have a high VIF.
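To illustrate the VIF check, the following minimal sketch computes VIFs as the diagonal of the inverse correlation matrix of the IVs, which is one standard way of obtaining them. The function names and the toy data are hypothetical, and the sketch is not the procedure of any particular statistics package.

```python
def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting.

    Fails with ZeroDivisionError if the matrix is singular,
    i.e., if two IVs are perfectly collinear.
    """
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """columns: dict mapping IV name -> list of values; returns name -> VIF."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical toy data, not taken from any study.
ivs = {
    "age": [66, 70, 75, 80, 85, 90],
    "health": [5, 4, 4, 3, 2, 2],
    "gender": [0, 1, 0, 1, 0, 1],
}
print(vif(ivs))
```

Each resulting value can then be compared with whichever cut-off the researcher adopts.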

Explained Variance:

In linear regression models, R-squared (R2) is a familiar measure of explanatory power, defined as the share of variance in the dependent variable (DV) explained by the model. Therefore, this measure can range between zero and one, with higher values suggesting better explanations. For logistic regression, different counterparts have been proposed such as Nagelkerke’s pseudo R2. Interpreting such measures should take into account that values of pseudo R2 are often low and, thus, readers familiar with linear regression might be skeptical about the model’s explanatory power [19].
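As an illustration of one such counterpart, the sketch below computes Nagelkerke's pseudo R2 from observed outcomes and predicted probabilities, by rescaling the Cox and Snell measure with its maximum attainable value. The function names and example values are hypothetical, and predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def log_likelihood(observed, predicted_prob):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1 - p)
               for y, p in zip(observed, predicted_prob))


def nagelkerke_r2(observed, predicted_prob):
    n = len(observed)
    base_rate = sum(observed) / n  # an intercept-only model predicts the base rate
    ll_null = log_likelihood(observed, [base_rate] * n)
    ll_model = log_likelihood(observed, predicted_prob)
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell


# Hypothetical outcomes (1 = user, 0 = nonuser) and fitted probabilities.
y = [1, 1, 1, 0, 0, 1, 0, 1]
p = [0.9, 0.8, 0.7, 0.2, 0.3, 0.6, 0.4, 0.7]
print(round(nagelkerke_r2(y, p), 3))
```

A model that predicts only the base rate yields a value of zero, whereas values approach one as the predicted probabilities approach the observed outcomes.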

Percentage Correct:

Given the difficulties in interpreting pseudo-R2 measures, the explanatory quality can be directly assessed by a comparison of observed values (i.e., participants’ answers) to predicted values (i.e., calculated by the logistic function) [19]. Overall, the percentage correct should be (much) greater than 50%, thus exceeding the performance of a random classifier. Moreover, percentages correct can be calculated for each response category such that we know how well the model predicts membership in the user and nonuser categories.
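The comparison of observed and predicted values can be sketched as follows. The classification cutoff of 0.5, the function name, and the example data are assumptions made for illustration, and both response categories are assumed to be present in the data.

```python
def percentage_correct(observed, predicted_prob, cutoff=0.5):
    """Return percentages correct for users, nonusers, and in total.

    observed: list of 0/1 values (1 = user, 0 = nonuser)
    predicted_prob: predicted probabilities of being a user
    cutoff: probability at or above which an observation is classified as user
    """
    predicted = [1 if p >= cutoff else 0 for p in predicted_prob]
    pairs = list(zip(observed, predicted))

    def pct(subset):
        subset = list(subset)
        return 100 * sum(o == p for o, p in subset) / len(subset)

    return {
        "users": pct(pair for pair in pairs if pair[0] == 1),
        "nonusers": pct(pair for pair in pairs if pair[0] == 0),
        "total": pct(pairs),
    }


# Hypothetical data: four users followed by six nonusers.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.3, 0.2, 0.1, 0.3, 0.45, 0.6, 0.05]
print(percentage_correct(observed, probs))
```

In this toy example, the total percentage correct (70%) conceals that only half of the users are identified, which is why the per-category figures matter.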

Exact p-values:

Much of NHST builds on the p-value measure by assigning statistical significance to a test result (e.g., about an association) if the p-value falls below a threshold, typically p < .05. The role of p-values has been a subject of debate since their emergence, and misconceptions and misinterpretations can still be observed [20, 21]. In short, a p-value does not indicate the strength of an association, it does not describe the probability of the null hypothesis being true, and a smaller p-value does not provide stronger support for the association [22]. Moreover, p-values are contingent upon the sample size such that even marginal associations will become “statistically significant” in large samples. Although alternative approaches to the reliance upon p-values have been proposed, interpretation of test results can be facilitated by reporting exact p-values instead of using common but somewhat arbitrary thresholds.

Confidence Intervals (CI):

While logistic regression provides a measure for the strength of the association between IV and DV, the resulting odds ratio (OR) is only an estimate for the sample under study. Therefore, it is very unlikely that a given OR will be the same for another sample. ORs can be contextualized by reporting their confidence interval. For instance, a 95% confidence interval is defined as the range of values that will cover the true value of OR for 95% of samples that can be drawn. This information can aid in interpreting the results with respect to the domain [19]. A relatively wide CI could suggest that the practical meaning of an association can be very different (ranging from negligible to important). If the CI is rather narrow, we might have more clarity about the degree to which differences in the IV (e.g., gender) will impact the probability of belonging to the user (or nonuser) group.

For our review of previous research, we considered studies that met the following requirements: DV is Internet use; socio-demographic characteristics included in the set of IVs; cross-sectional data collected within the past ten years; logistic regression analysis; results published in a refereed journal. Eventually, we identified eleven articles from the literature. Table 1 presents sample sizes, years of data collection, and the fulfillment of reporting criteria that we discussed above.

Table 1. Sample characteristics and statistics reported in previous studies.

None of the studies reports on correlations among IVs or assesses whether multicollinearity was present. Five studies use pseudo-R2 to measure the explanatory power, with values ranging from .20 [23], .25 [13], .34 [24], and .39/.42 [11] to .64 [25]. Percentage correct is available from only one study: the model presented by Friemel et al. correctly predicted group membership for about 84.2% of the observations (DV: general Internet use) [25]. While five studies report exact p-values, six studies omit this information but categorize the level of statistical significance by thresholds such as p < .05, p < .01, and p < .001. Confidence intervals are available from six studies, while three studies report the standard error (SE) for each OR. In principle, CIs can be calculated from SEs but this calculation must be undertaken by the reader. Moreover, SEs cannot be directly interpreted with respect to the domain, which is unlike CIs.
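The calculation left to the reader can be sketched as follows. We assume here that the reported SE refers to the logistic regression coefficient (the natural logarithm of the OR) and that a Wald-type interval based on the normal approximation is intended; whether this holds for a given study must be checked. The function name and example values are hypothetical.

```python
import math


def or_confidence_interval(odds_ratio, se_log_or, z=1.96):
    """Wald-type confidence interval for an odds ratio.

    odds_ratio: reported OR, i.e., exp(coefficient)
    se_log_or: standard error of the coefficient (log-odds scale); assumption
    z: normal quantile, 1.96 for a 95% interval
    """
    coefficient = math.log(odds_ratio)
    lower = math.exp(coefficient - z * se_log_or)
    upper = math.exp(coefficient + z * se_log_or)
    return lower, upper


# Hypothetical values, not taken from any of the reviewed studies.
lower, upper = or_confidence_interval(odds_ratio=2.0, se_log_or=0.2)
print(round(lower, 2), round(upper, 2))
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the OR itself, which is one reason why SEs are harder to interpret directly.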

Overall, previous studies on the second-level digital divide in older adults exhibit few common reporting practices. The fulfillment of reporting recommendations is still low. This finding similarly applies to measures of explanatory quality, exact p-values, and indicators for the strength and reliability of associations. Given that percentage correct is an intuitive measure, it is surprising that only one of eleven studies reports it to highlight the quality of explanation. In summary, our discussion of previous studies suggests that the provision of contextualized information from logistic regression analysis is still immature. Therefore, we seek to demonstrate how this information can facilitate the interpretation of regression results when we examine a broad set of online activities in older adults.

3 Method

We approach older adults’ online activities through the lens of van Dijk’s resources and appropriation theory of the diffusion, acceptance and adoption of new technologies [29, 30]. This theory describes a mechanism that relies upon categorical inequalities between groups of people in society. These inequalities lead to unequal distribution of resources, which then causes unequal access to technologies. The main categorical inequalities relate to socio-demographic characteristics such as age, gender, and health. Each inequality can be described by at least one dominating group versus one subordinated group, e.g., younger versus older and men versus women. The quantity and quality of resources available to an individual will be determined by their belonging to certain groups. In the context of online activities, resources include having a device with Internet access (material), cognitive abilities (mental), and support in use (social). Because the theory explains individual differences in the use of digital technologies, it is adequate for studying Internet use in general [25, 31] and differentiated online activities in particular [32]. We focus on a core set of groups in society defined by differences in gender (men vs. women), age (young vs. old), education (high vs. low), health (good vs. poor), and living arrangement (living together vs. alone). The present study (1) considers socio-demographic factors but not psychometric factors and (2) examines Internet use vs. nonuse but not frequency of Internet use. For these two reasons, the present study differs from our previous study on differentiated Internet use in older adults [33].

3.1 Participants

We analyzed data from a survey that we conducted in the summer of 2017. This survey targeted all older adults (65+) living in three districts of a city with around 260,000 inhabitants in Germany. The city administration and a municipal provider of geriatric care were involved in the conduct of the survey. A questionnaire was sent out by mail to 6,170 individuals registered in one of the three districts. Considering that 100 addresses turned out to be invalid, the 1,302 responses received account for a response rate of 21.5%. This rate is similar to prior surveys that used posted self-administered questionnaires targeted at older adults [34]. For the present study, we defined a subsample including participants who answered all questions on online activities and socio-demographic variables (N = 1,079; no missing values). With respect to gender and age distributions, our subsample did not differ from the city’s population of older adults [35]. Our subsample exhibited a greater share of participants holding a college or university degree (14.9% vs. 6.5%) and a smaller share of participants reporting no high school education (1.2% vs. 6.1%).

3.2 Measurements

Online Activity:

Our dependent variables were based on questions about the frequency of various online activities. In the present study, we considered three informational, three social, and two instrumental online activities. Informational online activities were defined as follows: “searching for information on the Internet (e.g., using Google)”, “using the Internet to inform myself about events in the city”, and “viewing pictures and videos”. Social online activities were listed as follows: “writing e-mails”, “sending pictures and videos”, and “writing comments and reviews”. Instrumental online activities included “using banking services on the Internet (online banking)”, and “purchasing on the Internet (online shopping)”. Because the survey administered a five-point ordinal frequency scale (daily, weekly, monthly, fewer, never), we recoded the responses into a dichotomous variable, with 0 representing “never”, and 1 otherwise. Therefore, each of our eight dependent variables differentiated users (who perform an online activity) and nonusers (who do not perform that online activity).
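The recoding step can be sketched as follows. The scale labels follow the description above, while the function name and the example responses are hypothetical.

```python
# Five-point ordinal frequency scale as described in the text.
FREQUENCY_LEVELS = ("daily", "weekly", "monthly", "fewer", "never")


def to_user_flag(response):
    """Recode a frequency response into 1 (user) or 0 (nonuser)."""
    if response not in FREQUENCY_LEVELS:
        raise ValueError(f"unexpected response: {response!r}")
    return 0 if response == "never" else 1


# Hypothetical responses to one activity item.
answers = ["daily", "never", "monthly", "fewer", "never"]
print([to_user_flag(a) for a in answers])
```

Rejecting unexpected values makes it explicit that missing or invalid answers are not silently counted as nonuse.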

Socio-Demographic Variables:

Gender was coded as female (0) or male (1). Age was calculated from the year of birth. Education had three levels: low for primary and lower secondary education, medium for upper secondary education and vocational training, and high for academic education. The values were derived from a question that offered nine answer options specific to the education system of Germany. Subjective health was a self-reported measure about one’s individual health [36]. We administered a five-point rating scale (1 = very bad, 2 = rather bad, 3 = moderate, 4 = rather good, and 5 = very good). Living together was coded as a dichotomous variable, with 1 for those who were living in a household of two or more persons, and 0 otherwise.

3.3 Data Analysis Plan

Our study set out to examine the association between older adults’ socio-demographic characteristics and their performance of various online activities. For this purpose, we first conducted descriptive analyses and then assessed correlations between study variables (for which we used Spearman’s correlation test because none of the metric variables was normally distributed). We examined the role of socio-demographic variables in explaining online activities by using logistic regression. We tested the assumptions of logistic regression and found no deviations. For each regression model, we report explained variance and the percentages correct (per category and total). For the independent variables, we present the exact p-value, OR, and 95% CI. The OR states how the odds of belonging to the user category change for a one-unit increase in the independent variable (OR > 1 for an increase, OR < 1 for a decrease). All analyses were conducted using IBM SPSS Statistics 25. The significance level was 5%.
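Because an OR acts multiplicatively on the odds, the corresponding change in probability depends on the starting probability. The following minimal sketch illustrates this point; the function name, the OR, and the baseline probabilities are hypothetical and not taken from our data.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability that results from multiplying the baseline odds by an OR."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# The same hypothetical OR of 2.0 applied to three baseline probabilities.
for baseline in (0.10, 0.50, 0.90):
    print(baseline, round(apply_odds_ratio(baseline, 2.0), 3))
```

The same OR thus implies different absolute changes in probability at different baselines, so percentage changes derived from an OR refer to odds rather than to probabilities.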

4 Results

4.1 Data Screening

Table 2 presents descriptive statistics. Our sample was balanced in terms of gender. The average age was 75.28 years (SD = 7.07). Every second respondent had education at a medium level, and academic education was reported by every seventh participant. Good health was indicated by one-half (M = 3.48, SD = 0.89, on a 1–5 scale). More than two-thirds lived together. The most prevalent online activities were the three informational activities and writing e-mails (each reported by more than half of the participants), followed by sending pictures/videos (43.7%), shopping (37.4%), banking (23.8%), and writing comments/reviews (22.5%).

Table 2. Socio-demographics and online activities (N = 1,079).

In the next step of our data screening, we assessed correlations among the study variables (Table 3). Correlations between the five socio-demographic variables were very weak to weak (absolute coefficients between 0.07 and 0.29). Subsequently, we found that multicollinearity was not present in our data, as shown by VIFs ranging between 1.10 and 1.16, which is below a conservative cut-off of 2.5 [37]. Table 3 shows that online search was positively correlated with being male and younger, reporting higher education and better health, and living together (p < .001, weak to average strengths of correlations). The results were similar for the other online activities (not tabulated). Finally, correlations between the eight online activities ranged from weak to strong, providing support for our differentiation of online activities and their categorization into informational, social, and instrumental activities.

Table 3. Correlations of socio-demographic variables and information search (N = 1,079).

4.2 Logistic Regression Analyses

Tables 4, 5 and 6 show the results of our logistic regression analyses, grouped into informational, social, and instrumental online activities. To first demonstrate how the results can be interpreted, we refer to information search (Table 4).

Table 4. Logistic regression analyses for informational online activities (N = 1,079).
Table 5. Logistic regression analyses for social online activities (N = 1,079).
Table 6. Logistic regression analyses for instrumental online activities (N = 1,079).

The odds of information search were 129% greater for men than for women (OR = 2.29). Each additional year of age reduced the odds by 11% (OR = 0.89). Education was associated with use, with odds 46% higher for medium levels and 636% higher for high levels, respectively. We note that the 95% CI for high education was rather wide, which suggests that the association with high education is strong but that its strength may vary considerably in replications of our study. Each one-unit higher rating of subjective health increased the odds by 39%. Participants who lived together with someone had 42% higher odds of information search. The model correctly identified four out of five users and three out of five nonusers, respectively. These results amount to an overall percentage correct of 71.9%, while the model explained 36.6% of the variance in information search. Considering all three informational online activities, the pattern of results was very similar, except for living together, which was not associated with informing oneself about city events (p = .169).

With respect to social online activities, Table 5 shows that participants who were men and younger, indicated high levels of education, and reported better health were more likely to belong to the user group. Contrary to the results for informational online activities, medium levels of education were not associated with social online activities, and the odds ratios for high levels were also smaller. While living together was associated with sending pictures/videos, the 95% CI was rather wide (1.01–1.91), and the p-value was only slightly below the level of significance (p = .046). Therefore, the OR must be interpreted with great caution and related to the large size of our sample. The most interesting result from Table 5 is the low percentage correct of users in the model for writing comments/reviews (17.7%). In other words, while the model was very effective in identifying nonusers (96.2%), its accuracy for users was worse than even chance.

Table 6 shows that banking and shopping were associated with being male and younger as well as reporting high levels of education and better health. In addition, shopping was associated with living together. However, the model for banking failed in the identification of users (percentage correct of 23.0%), and the model for shopping performed only slightly better than even chance (51.2%).

5 Discussion

5.1 Summary of Findings

This study examined the role of socio-demographic characteristics in explaining whether older adults belong to the group of Internet users or nonusers, respectively. Using logistic regression analysis, we found that those who were men and younger, possessed higher education, and perceived better health had higher odds for various informational, social, and instrumental online activities. Living together with someone was associated with higher odds of four out of eight online activities. Across all online activities, we observed nuanced differences in the strengths of associations. We also found that two regression models failed to achieve sufficient explanatory quality as shown by low percentages correct for the user group. Our research highlights the need to include contextualized information when reporting results of logistic regression. In summary, we provide evidence that the adoption of reporting guidelines can help avoid pitfalls in developing logistic regression models and, ultimately, interpreting models of the second-level digital divide in older adults.

5.2 Limitations

Our study has the following limitations. First, our logistic regression analyses used cross-sectional data, which does not allow us to make causal inferences about the phenomena. Therefore, longitudinal studies including intervention studies are required to assess cohort effects and provide stronger support for the tested associations. Second, while we considered eight online activities covering a broad range of activities, our study did not examine participation in social networking sites. Third, our convenience sample was drawn from a specific city in Germany; hence, the results may not necessarily be generalized to older adults living in rural areas or other regions.

5.3 Implications

Our study results have several implications for future research. First, researchers can apply the reporting criteria that we adapted to the domain of digital divide in older adults. We suggest always reporting percentages correct (for each group), effect sizes, exact p-values, and confidence intervals, and testing for multicollinearity, because this information is essential for assessing the validity, reliability, and usefulness of logistic regression analysis. Much less emphasis should be put on thresholds for p-values; instead, test results should always be related to sample sizes, statistical power, and the practical significance of results. For this purpose, effect size measures should be translated back to the digital divide domain. As our review of the extant literature shows, heterogeneity in reporting is still prevalent. The proposed actions by researchers can be implemented immediately to promote uniformity and comparability of research results.

Second, our study results demonstrate that barriers to the accumulation of knowledge about the digital divide in older adults continue to exist. While the literature review by Hunsaker and Hargittai identified barriers due to diversity in the measurement of explanatory variables and Internet use [3], we focus on the statistical analysis and reporting. Providing contextualized information when conducting logistic regression analysis can facilitate the accumulation of knowledge. The pace of the required changes can be accelerated within the academic ecosystem by journals that develop reporting standards, and editors and reviewers who reward the adoption of such standards. This new practice will enable the comparison and integration of results from single studies, and in the long run, allow for meta-analysis. Overall, we believe that these efforts will help in building and validating digital divide theory, which can then provide a rationale for the development of interventions and policies targeted at the digital inclusion of older adults.

Third, our testing of propositions derived from van Dijk’s resources and appropriation theory provides the foundation for examining further online activities. For instance, social networks and messaging services attain increasing relevance for the group of older adults [5, 38], and thus the role of socio-demographics in explaining their use and nonuse requires investigation. Moreover, additional categorical inequalities in society should be examined for a broader range of online activities including the use of specific online services.

Finally, our findings are also relevant for practice as they help identify subgroups who do not perform online activities. For instance, developers and providers can tailor their services to the needs of specific groups of older adults, e.g., by designing responsive and barrier-free interfaces that adapt to individual capabilities in cognition, vision, and motor function [39, 40]. In addition, understanding the predictors of online activities can assist decision-makers in devising legislation and interventions aimed at older adults. To ensure digital inclusion of older adults, they must be considered as an important customer group for online activities and services. Thus, factors that mitigate or promote social exclusion from the digital world should be taken into account [41].