Ruqqus public dataset
Posted: Fri Oct 15, 2021 9:57 pm
Public notes
For submissions, the first line of the Markdown plaintext (item_hpt) is the title, in the form '[title](url)'. The rest is the body of the submission.
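A minimal sketch of splitting a submission's item_hpt into title, URL, and body, assuming the first line really is a single Markdown link and the body follows after a newline. The function name `split_submission` and the regular expression are illustrative, not part of the dataset.

```python
import re

# Assumed shape of the first line: "[title](url)" with no whitespace in the URL.
TITLE_LINE = re.compile(r"^\[(?P<title>.*?)\]\((?P<url>\S*)\)\s*$")


def split_submission(item_hpt):
    """Return (title, url, body); title and url are None if line 1 is not a link."""
    first_line, _, rest = item_hpt.partition("\n")
    match = TITLE_LINE.match(first_line)
    if match is None:
        # Not in the expected form: treat the whole text as body.
        return None, None, item_hpt
    return match.group("title"), match.group("url"), rest.lstrip("\n")


title, url, body = split_submission(
    "[Hello world](https://example.com/post/1)\n\nBody text"
)
```

Records whose first line does not match are returned unchanged as body, so nothing is silently dropped.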
The attribute 'item_id_alias' is unique only within each content type (submission, mod list, mod log, etc.), not across types.
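Since item_id_alias can repeat across content types, one way to index records from several types together is to key them on the pair (type, item_id_alias). This is a sketch only; the helper `build_index` and the type labels used as dictionary keys are assumptions for illustration.

```python
def build_index(records_by_type):
    """Index records by (content_type, item_id_alias).

    records_by_type maps a type label (hypothetical, e.g. "submissions")
    to a list of record dicts that each carry an "item_id_alias" field.
    """
    index = {}
    for content_type, records in records_by_type.items():
        for record in records:
            index[(content_type, record["item_id_alias"])] = record
    return index


sample = {
    "submissions": [{"item_id_alias": 1, "author": "alice"}],
    "modlogs": [{"item_id_alias": 1, "author": "bob"}],
}
index = build_index(sample)
# Both records survive even though they share item_id_alias 1.
```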
The modlog returned by the Ruqqus API has 31396 entries. The modlog on record has 71174 entries. The official modlog is therefore missing 39778 entries (56%).
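The figures above can be checked with a few lines of arithmetic, using only the two counts stated in this post:

```python
on_record = 71174  # modlog entries in this dataset
via_api = 31396    # modlog entries in the official modlog

missing = on_record - via_api
share = missing / on_record

print(missing, f"{share:.0%}")  # 39778 56%
```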
This release is mainly composed of plaintext in Markdown format. It includes the dataset from the pre-purge period. Whatever the admins had purged should mostly be there (I hope). Ruqqus cut the cord far earlier than promised, making its site unusable. Furthermore, it appears to have purged more than 50% of the modlog records, which makes the latest dataset questionable. As a result, additional info will not be included. It is moot at this point.
Record count
submissions: 550271
mods: 8367
modlogs: 71174 (31396 on official record)
users: 12363 (excluding commenters)
Attributes
submissions: item_id_alias, item_url, author, pub_date, guild, upvote, downvote, score, comment_count, item_hpt_ver, item_hpt
mods: item_id_alias, guild, author, pub_date, mod_type, permissions, item_hpt_ver, item_hpt
modlogs: item_id_alias, item_url, author, pub_date_est, guild, item_hpt_ver, item_hpt
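As a rough sanity check when reading the files, a record can be compared against the attribute lists above. This sketch assumes each record is a JSON object keyed by those attribute names, which this post does not confirm; `EXPECTED_FIELDS` and `missing_fields` are hypothetical names.

```python
# Attribute names copied from the lists above; the keys are illustrative labels.
EXPECTED_FIELDS = {
    "submissions": {
        "item_id_alias", "item_url", "author", "pub_date", "guild", "upvote",
        "downvote", "score", "comment_count", "item_hpt_ver", "item_hpt",
    },
    "mods": {
        "item_id_alias", "guild", "author", "pub_date", "mod_type",
        "permissions", "item_hpt_ver", "item_hpt",
    },
    "modlogs": {
        "item_id_alias", "item_url", "author", "pub_date_est", "guild",
        "item_hpt_ver", "item_hpt",
    },
}


def missing_fields(content_type, record):
    """Return the expected attribute names that are absent from a record dict."""
    return EXPECTED_FIELDS[content_type] - set(record)


# Modlogs carry pub_date_est rather than pub_date, so this record is incomplete.
gaps = missing_fields("modlogs", {"item_id_alias": 7, "pub_date": "2021-01-01"})
```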
Files ... 10-14.json ... 10-14.json ... 10-14.json
@SearchVoat, I hope this could be made searchable on SVF. Thank you!!
Edit: it is possible that the modlogs became hidden when guilds were set to private or banned. It would not be efficient to delete modlogs individually.