As part of the Desktop Improvements project, the Web team developed a new feature, the sticky header, to make commonly used tools (talk pages, history pages, language switching, etc.) easier for readers and editors to access.
Previously, these tools were only available at the top of the page.
Figure 1: Old Feature
The new functionality will allow access to these tools throughout the page via a “sticky” header. It will reduce the time readers and editors spend scrolling up and down the page for navigational purposes.
Figure 2: New Feature: Header on the Top of the Page
Figure 3: New Feature: Sticky Header Shown When the User Scrolls Back
The Web team ran the AB test from Jan 5, 2022, 8:34 PM to Mon, Jan 31, 2022, 8:24 PM to assess the impact of deploying the sticky header. The test ran for desktop logged-in users on 22 pilot wikis. Logged-in users who were assigned to the treatment group with the new skin version (`skinversion=2`) saw the sticky header, while users in the control group saw the old feature.
The primary goal of the AB test was to test the hypothesis that the sticky header decreases the need to scroll to the top of the page. It also aimed to answer the following two questions:
What is the clickthrough rate (per pageview or per session) of each item on the sticky header?
What is the ratio of clicks of sticky header items to the corresponding items at the top of the page?
The AB test was run on a per wiki basis on logged-in users. Users included in the test were randomly assigned to either the control group (sticky header disabled) or treatment group (sticky header enabled) based on their user ID.
Data on test enrollment was collected in the `mediawiki_web_ab_test_enrollment` event logging table. Data on scroll-back actions was collected in the `mediawiki_web_ui_scroll` event logging table. Data on click actions was collected in the `DesktopWebUIActionsTracking` event logging table.
Even though users were assigned to groups based on user ID, we did not collect user IDs or other user data. Therefore, the metrics we defined are session-based. We compared the number of scroll-backs to the top of the page per unique session between the control and treatment groups.
We also looked into the metric by link type. The link types include: `personal dropdown`, `watchlist`, `mytalk`, `sandbox`, `preferences`, `beta features`, `my contributions`, `language dropdown`, `history`, `user page`, `talk tab`, and `logout`.
In addition, we compared the number of clicks on the sticky header with the number of clicks on the corresponding items at the top of the page.
Hypothesis to test: the sticky header decreases the need to scroll to the top of the page
Data exploration and findings:
The sessions in the AB test are not distributed evenly between control and treatment groups on all wikis; the same is true for pageviews. To eliminate the impact of this uneven distribution, the following analysis uses clicks per pageview or per session as the measurement metric instead of the raw number of clicks.
Only 21.22% of sessions that had scroll-back actions showed click intention. Our analysis focuses only on users who scrolled back to click the header links, since we believe they are the user group this feature will benefit.
587 sessions in the `stikyHeaderEnabled` group had the old skin version (`skinversion=1`) and therefore did not see the new feature. Our analysis excludes them from the treatment group (`stikyHeaderEnabled`).
809 sessions were assigned to both control and treatment groups. We excluded them from the analysis.
7 wikis had fewer than 10 total sessions during the experiment timeframe, so we excluded them from the analysis: Moroccan Arabic Wikipedia (arywiki), Wikimedia Foundation (foundationwiki), French Wikiquote (frwikiquote), Polish Wikinews (plwikinews), Portuguese Wikiversity (ptwikiversity), Venetian Wikipedia (vecwiki), and Vietnamese Wikibooks (viwikibooks).
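The exclusion rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout `(wiki, session_id, group, skin_version)`, the group labels `"control"` and `"treatment"`, and the function name `filter_sessions` are all hypothetical, and the report does not say whether the 10-session threshold was applied before or after the other exclusions (the sketch applies it after).

```python
from collections import defaultdict


def filter_sessions(records, min_sessions_per_wiki=10):
    """Apply the three exclusion rules to hypothetical session records.

    records: list of (wiki, session_id, group, skin_version) tuples.
    """
    # Collect every group each session was seen in.
    groups_by_session = defaultdict(set)
    for wiki, session_id, group, _skin in records:
        groups_by_session[(wiki, session_id)].add(group)

    kept = [
        r for r in records
        # Drop sessions assigned to both control and treatment.
        if len(groups_by_session[(r[0], r[1])]) == 1
        # Drop treatment sessions that still had the old skin version.
        and not (r[2] == "treatment" and r[3] == 1)
    ]

    # Count the remaining unique sessions per wiki (assumed ordering).
    sessions_per_wiki = defaultdict(set)
    for wiki, session_id, _group, _skin in kept:
        sessions_per_wiki[wiki].add(session_id)

    # Drop wikis with fewer than the minimum number of sessions.
    return [
        r for r in kept
        if len(sessions_per_wiki[r[0]]) >= min_sessions_per_wiki
    ]
```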
Figure 4 shows the trend in the number of daily scroll-backs per daily unique session for logged-in users who clicked header links on each wiki. 12 wikis were enrolled in the AB test for 25 days; 10 wikis were enrolled for 11 days.
Figure 4: Trend of scrolls per session
Metric definition: We define the measurement metric as the number of scroll-backs to the top of the page divided by the number of unique sessions. Initially we considered dividing the number of scroll-backs by the number of pageviews. However, we discovered that 100 sessions had more than 1,000 daily pageviews. These sessions probably belong to spiders or automated users, but we cannot confirm this because we did not collect user-level data.
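As a minimal sketch of this metric, the Python function below divides the total number of scroll-backs by the number of unique sessions in each group. The row layout `(group, session_id, scroll_count)` and the function name are assumptions made for illustration; they are not the schema of the event logging tables.

```python
def scrolls_per_session(rows):
    """Return {group: total scroll-backs / unique sessions}.

    rows: iterable of (group, session_id, scroll_count) tuples. A session
    may appear in several rows (for example, one row per day).
    """
    total_scrolls = {}
    unique_sessions = {}
    for group, session_id, scroll_count in rows:
        total_scrolls[group] = total_scrolls.get(group, 0) + scroll_count
        unique_sessions.setdefault(group, set()).add(session_id)
    return {
        group: total_scrolls[group] / len(unique_sessions[group])
        for group in total_scrolls
    }
```

Using sessions rather than pageviews as the denominator keeps a handful of sessions with very high pageview counts from dominating the metric.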
Figure 5: Number of scrolls out of number of unique sessions by each wiki, only for logged-in users who clicked links
Figure 6: Percent change in scrolls per session in the AB test for the sticky header
Average percent change:
[1] -14.99867
T-test
Predictors | Estimates | CI | Statistic | p | df |
---|---|---|---|---|---|
x | 2.97 | 1.30 – Inf | 3.13 | 0.004 | 14.00 |
##
## Paired t-test
##
## data: x and y
## t = 3.135, df = 14, p-value = 0.003653
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 1.301816 Inf
## sample estimates:
## mean of the differences
## 2.97102
Conclusion: Because the p-value is below 0.05 and the confidence interval lies entirely above 0, we can reject the null hypothesis in favor of the alternative. We have sufficient statistical evidence to conclude that scrolls per session in the treatment group are significantly lower than in the control group.
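To make the test statistic concrete, here is a small Python sketch of the paired t statistic, followed by a consistency check on the reported numbers. It assumes that `x` holds the per-wiki control values and `y` the per-wiki treatment values, which the output does not state. The critical value 1.7613 is the standard one-sided 95% t quantile for 14 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(x, y):
    """Return (t, df, mean difference, standard error) for paired samples."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1, mean_diff, std_err


# Consistency check using only the values printed in the report.
reported_mean_diff = 2.97102
reported_t = 3.135
T_CRIT_95_DF14 = 1.7613  # one-sided 95% quantile of Student's t, df = 14

implied_std_err = reported_mean_diff / reported_t
# Lower bound of the one-sided interval; close to the reported 1.301816.
lower_bound = reported_mean_diff - T_CRIT_95_DF14 * implied_std_err
```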
There was an average 15.0% decrease[^1] in scrolls per session by logged-in users on the 15 pilot wikis in the treatment group compared with the control group, which is in line with our expectations. Two wikis, Wikimedia Incubator (43.73%) and Turkish Wikipedia (12.75%), show an overall increase in scrolls per session in the treatment group. Because we did not record any user-level information, such as the number of unique users, their user groups, or the distribution of edit counts, we cannot further investigate what makes these two wikis different from the others.
[^1]: Calculated by taking the average of the percent changes observed on each early adopter wiki.
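The averaging described in the footnote can be sketched as follows. The function name and the dictionary inputs (scrolls per session keyed by wiki) are hypothetical. Each wiki carries equal weight regardless of its traffic, so the result can differ from a percent change computed on pooled data.

```python
def average_percent_change(control, treatment):
    """Average the per-wiki percent change in scrolls per session.

    control, treatment: dicts mapping wiki name -> scrolls per session.
    Returns (average percent change, per-wiki percent changes).
    """
    changes = {
        wiki: 100 * (treatment[wiki] - control[wiki]) / control[wiki]
        for wiki in control
    }
    return sum(changes.values()) / len(changes), changes
```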
Figure 7 and Figure 8 show the clickthrough rate per pageview and the clickthrough rate per session of each item on the sticky header.
Figure 7: Clicks per pageview by each link
Figure 8: Clicks per session by each link
`Personal dropdown` is the most clicked link per session because it is the menu through which most of the other links are reached. Median clicks per pageview: 0.0313765. Median clicks per session: 0.391411.
`Language dropdown` is the second most clicked link per session. Median clicks per pageview: 0.0062078. Median clicks per session: 0.081435.
Other links have a low clickthrough rate. Median clicks per pageview: 6.867e-04. Median clicks per session: 0.0102015.
An engineer has confirmed a bug in click event tracking that results in the low clickthrough rate (T304366).
The figures below show the number of clicks on the sticky header compared with the corresponding items at the top of the page, by link type.
The table below shows, for each link type, clicks on the sticky header as a percentage of total clicks on both the sticky header and the corresponding item at the top of the page.
Link Type | Clicks on Top Header | Clicks on Sticky Header | Sticky Header Click Rate % |
---|---|---|---|
beta features | 817 | 103 | 11.20 |
history | 81670 | 1704 | 2.04 |
language dropdown | 0 | 29999 | 100.00 |
logout | 4765 | 228 | 4.57 |
my contributions | 23786 | 971 | 3.92 |
mytalk | 2152 | 104 | 4.61 |
personal dropdown | 257269 | 20824 | 7.49 |
preferences | 2116 | 105 | 4.73 |
sandbox | 5142 | 227 | 4.23 |
talk tab | 34497 | 2078 | 5.68 |
user page | 49 | 857 | 94.59 |
watchlist | 20383 | 1080 | 5.03 |
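The last column can be reproduced from the two click counts. The Python sketch below uses a hypothetical helper name, `sticky_click_rate`, and rounds to two decimals to match the table.

```python
def sticky_click_rate(top_clicks, sticky_clicks):
    """Sticky header clicks as a percentage of all clicks on the link."""
    total = top_clicks + sticky_clicks
    if total == 0:
        return None  # no clicks recorded in either location
    return round(100 * sticky_clicks / total, 2)


# Rows taken from the table above.
beta_features = sticky_click_rate(817, 103)
user_page = sticky_click_rate(49, 857)
language_dropdown = sticky_click_rate(0, 29999)
```

A rate of 100% for `language dropdown` follows directly from the zero count recorded for the top header.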
The sticky header `user page` link has the highest share of clicks relative to the top header: 94.59% of clicks on `user page` come from the sticky header.
The other links on the sticky header receive a much smaller share of clicks than their top-header counterparts, ranging from 2.04% to 11.2%.
`Language dropdown` did not capture any clicks on the top header. We need to check whether the instrumentation is enabled.
Support engineers in verifying the following event logging issues:
The bug of missing click events on tool links, which caused the low clickthrough rate (T304366).
`Language dropdown` does not capture any clicks on the top header; check whether the instrumentation is enabled.
Some sessions are assigned to both control and treatment groups.
In addition to the t-test, try other statistical modeling methods, such as hierarchical regression, to test the hypothesis.
Code credit
Megan Neisler’s table and figure formatting code in https://github.com/wikimedia-research/Desktop-Improvements-Search-Move-Analysis-2020/blob/main/search-location-move-AB-test-report.Rmd
Mikhail Popov’s wmfdata: R package https://github.com/wikimedia/wmfdata-r