A new academic paper about Voat. Let's take a look: “I can’t keep it up anymore.” The Voat.co dataset
Posted: Wed Feb 02, 2022 4:28 am
https://arxiv.org/pdf/2201.05933.pdf
Amin Mekacher¹, Antonis Papasavva²
¹ City, University of London; ² University College London
Abstract
Voat was a news aggregator website that shut down on December 25, 2020. The site had a troubled history and was known for hosting various banned subreddits. This paper presents a dataset with over 2.3M submissions and 16.2M comments posted by 113K users in 7.1K subverses (Voat's equivalent of subreddits). Our dataset covers the whole lifetime of Voat, from its development period starting on November 8, 2013, through its founding in April 2014, up until the day it shut down (December 25, 2020).
This work presents the largest and most complete publicly available Voat dataset, to the best of our knowledge. We also present a preliminary analysis covering posting activity and daily user and subverse registration on the platform, so that researchers interested in our dataset know what to expect. Our data may prove helpful to studies of false news dissemination: analyzing the links users share on the platform, we find that many communities rely on alternative news outlets, like Breitbart and GatewayPundit, for their daily discussions. Last, we perform network analysis on user interactions, finding that many users prefer not to interact with subverses outside their narrative interests, which could be helpful to researchers focusing on polarization and echo chambers.
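For anyone wanting to try analyses in the spirit of the abstract on the released data, here is a minimal Python sketch of two simple stand-ins: counting which domains are linked in a subverse, and measuring how much the user bases of two subverses overlap (Jaccard similarity, not the paper's full network analysis). It assumes a hypothetical flat record layout with `user`, `subverse` and `url` fields; the actual dataset's schema may differ, and this is not the authors' method.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def top_domains(posts, subverse, n=5):
    """Return the n most-linked domains in one subverse.

    `posts` is an iterable of dicts with HYPOTHETICAL keys
    "user", "subverse" and "url" (url may be None for text posts).
    """
    counts = Counter()
    for post in posts:
        if post["subverse"] != subverse or not post.get("url"):
            continue
        host = urlparse(post["url"]).netloc.lower()
        # Treat www.example.com and example.com as the same outlet.
        if host.startswith("www."):
            host = host[4:]
        if host:
            counts[host] += 1
    return counts.most_common(n)


def user_overlap(posts, a, b):
    """Jaccard overlap between the users active in subverses a and b.

    0.0 means no shared users; 1.0 means identical user bases.
    """
    users = defaultdict(set)
    for post in posts:
        users[post["subverse"]].add(post["user"])
    ua, ub = users[a], users[b]
    union = ua | ub
    return len(ua & ub) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy records for illustration only; not taken from the Voat dataset.
    sample = [
        {"user": "u1", "subverse": "news", "url": "https://www.example.com/a"},
        {"user": "u2", "subverse": "news", "url": "https://news.example.org/b"},
        {"user": "u1", "subverse": "news", "url": "https://example.com/c"},
        {"user": "u3", "subverse": "gaming", "url": None},
        {"user": "u1", "subverse": "gaming", "url": None},
    ]
    print(top_domains(sample, "news"))
    print(user_overlap(sample, "news", "gaming"))
```

A low overlap between two subverses is one crude signal that their users keep to separate communities; a proper study would build the full user–subverse interaction graph instead.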
Also, since Voat was one of the platforms many Reddit users migrated to after a ban, we are confident that our dataset will motivate and assist researchers studying deplatforming. In addition, many hateful and conspiratorial communities seem to be very popular on Voat, which makes our work valuable for researchers focusing on toxicity, conspiracy theories, cross-platform studies of social networks, and natural language processing.
8 Conclusion
In this work, we present and release a Voat dataset comprising more than 2.38M submissions and 16.2M comments posted by 113K users in over 7K Voat subverses. We combine data collected from the Voat API and archives released by IAWM to complete the dataset to the best of our ability. Voat shut down on December 25, 2020, and its data are now otherwise inaccessible. We also perform a preliminary analysis of the released dataset so researchers interested in it know what to expect.
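Combining an API crawl with archived copies raises the question of what to do when both sources contain the same post. One simple approach, sketched here under assumptions, is to key every record by its post ID, start from the archive copy, and let non-missing API fields override it field by field. The `id` field name and the "API wins" preference are hypothetical choices for illustration; the paper's own merging procedure may differ.

```python
from typing import Dict, Iterable, List


def merge_sources(api_records: Iterable[dict],
                  archive_records: Iterable[dict]) -> List[dict]:
    """Merge two record streams, de-duplicating on a HYPOTHETICAL "id" field.

    Archive records are loaded first; for posts present in both sources,
    every API field that is not None overrides the archive value, while
    fields the API lacks (or reports as None) are kept from the archive.
    """
    merged: Dict[str, dict] = {}
    for rec in archive_records:
        merged[rec["id"]] = dict(rec)
    for rec in api_records:
        base = merged.get(rec["id"], {})
        update = {k: v for k, v in rec.items() if v is not None}
        merged[rec["id"]] = {**base, **update}
    # Sort by id (lexicographically) so the output order is deterministic.
    return [merged[k] for k in sorted(merged)]
```

Field-level filling, rather than replacing whole records, lets the archive patch gaps in otherwise newer API data, which matches the goal of completing the dataset as far as possible.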
Overall, we hope this work further motivates and assists researchers focusing on deplatforming and on how users organize mass migrations to other platforms. In addition, our dataset could help answer numerous questions about how ‘free-speech’ sites operated, e.g., do moderators ban users who express opinions other than the ones aligned with the narratives of a subverse? How do other users vote on such content, and how toxic are they towards it? Do sites like these incentivize users to form echo chambers? What kind of content do users in these communities consume? Our dataset could also assist multi-platform studies that examine the similarities and differences between communities. Last, since Voat was a bastion of free speech, we are confident that our dataset could assist researchers in training natural language processing algorithms and in detecting hate speech, fake news dissemination, conspiracy theories, etc. Finally, beyond quantitative work, we hope that the data can also be used in qualitative work studying specific events, social theories, and communities.
Acknowledgments. This work was partially funded by the UK EPSRC grant EP/S022503/1 that supports the UCL Centre for Doctoral Training in Cybersecurity.