Research:Surviving new editor
- n (activation edits) = 1 edit
- m (survival edits) = 1 edit
- activation period = 1 day
- trial period = 30 days (~ one month)
- survival period = 30 days (~ one month)
SET @activation_period = 1; /* One day */
SET @n = 1; /* One activation edit */
SET @trial_period = 30; /* 30 days */
SET @survival_period = 30; /* 30 days*/
SET @m = 1; /* One survival edit */
SET @start_date = "20140101"; /* January 1st, 2014 after midnight */
SET @end_date = "20140201"; /* February 1st, 2014 before midnight */
SELECT
    user_id,
    user_name,
    user_registration,
    /* At least @n edits during the activation period */
    COALESCE(SUM(activation_edits), 0) >= @n AS activated,
    /* Activated, and at least @m edits during the survival period */
    COALESCE(SUM(activation_edits), 0) >= @n AND
    COALESCE(SUM(surviving_edits), 0) >= @m AS surviving,
    /* The survival period has not yet ended, so survival cannot be judged */
    (
        UNIX_TIMESTAMP(NOW()) <
        UNIX_TIMESTAMP(DATE_ADD(user_registration, INTERVAL @trial_period+@survival_period DAY))
    ) AS censored
FROM (
    /* Edits to pages that still exist */
    SELECT
        user_id,
        user_name,
        user_registration,
        SUM(
            rev_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S")
        ) AS activation_edits,
        SUM(
            rev_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period+@survival_period DAY), "%Y%m%d%H%i%S")
        ) AS surviving_edits
    FROM user
    LEFT JOIN revision ON
        user_id = rev_user AND
        (
            rev_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S") OR
            rev_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period+@survival_period DAY), "%Y%m%d%H%i%S")
        )
    WHERE user_registration BETWEEN @start_date AND @end_date
    GROUP BY user_id, user_name, user_registration
    UNION ALL
    /* Edits to pages that have since been deleted */
    SELECT
        user_id,
        user_name,
        user_registration,
        SUM(
            ar_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S")
        ) AS activation_edits,
        SUM(
            ar_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period+@survival_period DAY), "%Y%m%d%H%i%S")
        ) AS surviving_edits
    FROM user
    LEFT JOIN archive ON
        user_id = ar_user AND
        (
            ar_timestamp BETWEEN
                user_registration AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%S") OR
            ar_timestamp BETWEEN
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%S") AND
                DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period+@survival_period DAY), "%Y%m%d%H%i%S")
        )
    WHERE user_registration BETWEEN @start_date AND @end_date
    GROUP BY user_id, user_name, user_registration
) split_edit_counts
GROUP BY user_id, user_name, user_registration;
Surviving new editor is a standardized user class used to measure the number of first-time editors in a wiki project who continue to edit for a substantial period of time. It's used as a proxy for editor retention.
Discussion
The activation period
The activation period selects the users whose retention is measured:
- setting the activation edit threshold n to 0 measures the retention (or rather, the delayed activation) of all newly registered users, regardless of when they started editing.
- setting n to a value other than 0 restricts the measurement of retention to the subset of users who edited within a given activation period after registration.
- setting the activation parameters to match the proposed definition of a new editor measures the retention of new editors; in that case, surviving new editors are effectively a proper subset of new editors.
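The effect of the activation threshold can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the query used on the wiki databases: the function name `is_activated`, its parameters, and the sample dates are all hypothetical.

```python
from datetime import datetime, timedelta


def is_activated(registration, edit_times, n=1, activation_days=1):
    """Return True if at least n edits fall inside the activation period."""
    window_end = registration + timedelta(days=activation_days)
    activation_edits = sum(registration <= t <= window_end for t in edit_times)
    return activation_edits >= n


# Hypothetical user who registered on 1 January but first edited 19 days later.
registered = datetime(2014, 1, 1, 12, 0, 0)
late_starter = [datetime(2014, 1, 20, 9, 30, 0)]

# With n = 0 every registered user is included, so the metric captures
# delayed activation as well as retention.
print(is_activated(registered, late_starter, n=0))

# With n = 1 and a one-day activation period this user is excluded.
print(is_activated(registered, late_starter, n=1))
```

With these assumed inputs, the same user is counted under n = 0 and excluded under n = 1, which is the difference between measuring all registered users and measuring new editors only.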
The trial period
During the trial period, new editors are presumed to be testing out Wikipedia and Wikipedians are testing out the editor. This is the time when non-retained editors tend to leave Wikipedia and when retained editors decide to stick around. The longer the duration of this period, the longer an editor will need to remain active in order to be counted.
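To see how the trial period shifts the point at which activity is checked, the following sketch computes the start and end of the survival window for two trial lengths. The helper `survival_window` and its defaults are assumptions made for illustration; they mirror the 30-day parameters of the query but are not part of it.

```python
from datetime import datetime, timedelta


def survival_window(registration, trial_days=30, survival_days=30):
    """Return (start, end) of the survival period for a registration time."""
    start = registration + timedelta(days=trial_days)
    end = start + timedelta(days=survival_days)
    return start, end


registered = datetime(2014, 1, 1)

# A longer trial period pushes the whole survival window further from registration.
for trial_days in (30, 90):
    start, end = survival_window(registered, trial_days=trial_days)
    print(trial_days, start.date(), end.date())
```

An editor registered on 1 January 2014 must still be editing from 31 January with a 30-day trial period, but from 1 April with a 90-day one.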
The survival period
During the survival period, new editors who are retained are expected to show some activity to indicate their survival. The longer the duration of the survival period, the more likely we are to notice some activity from editors who are less consistently active. Longer survival periods are also more likely to catch users who left Wikipedia and later reactivated their accounts.
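A small sketch can make the trade-off concrete. It counts edits inside the survival window and also checks whether the window has finished, which is what the `censored` column of the query reports. The function names and the sample editor are hypothetical; only the window arithmetic follows the definitions above.

```python
from datetime import datetime, timedelta


def count_survival_edits(registration, edit_times, trial_days=30, survival_days=30):
    """Count edits that fall inside the survival period."""
    start = registration + timedelta(days=trial_days)
    end = start + timedelta(days=survival_days)
    return sum(start <= t <= end for t in edit_times)


def is_censored(registration, now, trial_days=30, survival_days=30):
    """True while the survival period is still running, so survival is undecided."""
    return now < registration + timedelta(days=trial_days + survival_days)


# Hypothetical sporadic editor: one edit shortly after registering, one on day 75.
registered = datetime(2014, 1, 1)
edits = [registered + timedelta(hours=2), registered + timedelta(days=75)]
m = 1  # survival edit threshold

short_window = count_survival_edits(registered, edits, survival_days=30) >= m
long_window = count_survival_edits(registered, edits, survival_days=60) >= m
print(short_window, long_window)

# Observed on 15 February 2014, the 30 + 30 day window has not closed yet.
print(is_censored(registered, datetime(2014, 2, 15)))
```

The day-75 edit falls outside a 30-day survival period but inside a 60-day one, so this editor only counts as surviving under the longer window.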
Analysis
Wikis
German
English
Sensitivity
Trial period duration
The figure "Trial period factor" plots the factor relationship between the number of users who edit after 3 months (the baseline, drawn as a horizontal line) and the number of users who edit after 1, 2, 4, 5 and 6 months. Both enwiki and dewiki show a slight trend: the number of users surviving a trial period of 1 or 2 months is changing relative to the number surviving 3 months or more. The trend is not extreme and therefore might not matter, but it does suggest that even users who survive 1-2 months are becoming less likely to survive 3.
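One plausible reading of the factor is the count of surviving users for a given trial period divided by the count for the 3-month baseline. The sketch below computes it that way; the direction of the ratio, the function name `survival_factors`, and the counts are all assumptions, and the numbers are invented for illustration rather than taken from enwiki or dewiki.

```python
def survival_factors(survivors_by_months, baseline_months=3):
    """Divide each survivor count by the count at the baseline trial period."""
    baseline = survivors_by_months[baseline_months]
    return {months: count / baseline for months, count in survivors_by_months.items()}


# Invented counts of surviving users per trial period length (in months).
counts = {1: 1500, 2: 1200, 3: 1000, 4: 900, 5: 800, 6: 750}

factors = survival_factors(counts)
for months in sorted(factors):
    print(months, factors[months])
```

Under this reading the baseline always has a factor of 1, and a trend over time in the factors for 1 or 2 months would indicate a change in how many early survivors go on to reach 3 months.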
Survival period duration
The figure "Survival period factor" plots the factor relationship between the number of users who edit within a 3-month window (the baseline, drawn as a horizontal line) and the number of users who edit within 1, 2, 4, 5 and 6 month windows. For the survival period duration, we don't see any meaningful change over time.
Usage
- New editors' first session and retention: used to compare the survival of new editors over time and as the dependent variable in a logistic regression.
- Research:Teahouse long term new editor retention