
Ruqqus public dataset

Posted: Fri Oct 15, 2021 9:57 pm
by MadWorld
Public notes

In submissions, the first line of the markdown plaintext (item_hpt), formatted as '[title](url)', is the submission title. The rest is the submission body.
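To pull the title and body apart when working with the dump, something like the following should do. This is a minimal sketch, not part of the release. It assumes the first line is exactly '[title](url)' with nothing before it, and it does not handle URLs that themselves contain ")". The function name split_submission is hypothetical.

```python
import re

# Assumption: line 1 of item_hpt is exactly "[title](url)".
# Titles may contain brackets; URLs containing ")" are NOT handled.
TITLE_LINE = re.compile(r"^\[(?P<title>.*)\]\((?P<url>[^)]*)\)\s*$")


def split_submission(item_hpt: str) -> tuple[str, str, str]:
    """Split a submission's markdown plaintext into (title, url, body)."""
    first_line, _, body = item_hpt.partition("\n")
    match = TITLE_LINE.match(first_line)
    if match is None:
        raise ValueError(f"unexpected title line: {first_line!r}")
    # Drop the blank line(s) that usually separate the title from the body.
    return match.group("title"), match.group("url"), body.lstrip("\n")
```

For example, split_submission("[Hello](https://example.com/post/1)\n\nBody text") gives ("Hello", "https://example.com/post/1", "Body text").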

The attribute 'item_id_alias' is scoped to each content type (submission, mod list, modlog, etc.) and is not unique across types.
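Because the same alias can show up in more than one type, merging the files needs a key that includes the type. A minimal sketch, assuming you tag each record with a type label yourself (for example, from the file it came from); the labels and the function name index_by_type are hypothetical. It also assumes aliases are unique within a single type and raises if that does not hold.

```python
from typing import Iterable


def index_by_type(records: Iterable[tuple[str, dict]]) -> dict[tuple[str, str], dict]:
    """Index records by (content_type, item_id_alias).

    item_id_alias alone can collide across content types, so the
    caller-supplied type label is part of the key.
    """
    index: dict[tuple[str, str], dict] = {}
    for content_type, record in records:
        key = (content_type, str(record["item_id_alias"]))
        # Assumption: aliases are unique within one type.
        if key in index:
            raise ValueError(f"duplicate key within one type: {key}")
        index[key] = record
    return index
```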

The modlog returned by the Ruqqus API has 31396 entries. The modlog on record has 71174 entries. The official modlog is missing 39778 entries (56%).
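The 56% figure is the missing count divided by the on-record total, not by the official count. A quick check of the arithmetic (missing_share is just an illustrative helper name):

```python
def missing_share(on_record: int, official: int) -> tuple[int, float]:
    """Return (missing entries, missing fraction of the on-record total)."""
    missing = on_record - official
    return missing, missing / on_record


missing, share = missing_share(on_record=71174, official=31396)
print(f"{missing} missing ({share:.0%})")  # prints: 39778 missing (56%)
```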

This release consists mainly of the plaintext in markdown format. It includes the dataset from the pre-purge period, so whatever the admins purged should mostly be there (I hope). Ruqqus cut the cord much earlier than promised, making its site unusable. Furthermore, it appears to have purged more than 50% of the modlog records, which makes the latest dataset questionable. As a result, additional info will not be included. It is moot at this point.

Record count
submissions: 550271
mods: 8367
modlogs: 71174 (31396 on official record)
users: 12363 (excluding commenters)

Attributes
submissions: item_id_alias, item_url, author, pub_date, guild, upvote, downvote, score, comment_count, item_hpt_ver, item_hpt
mods: item_id_alias, guild, author, pub_date, mod_type, permissions, item_hpt_ver, item_hpt
modlogs: item_id_alias, item_url, author, pub_date_est, guild, item_hpt_ver, item_hpt
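A sanity check against the attribute lists above might look like this. The field sets are copied from the lists. The file layout is NOT stated in the post, so load_records assumes each file is either a single JSON array of objects or one JSON object per line; adjust if the actual files differ. Function names are hypothetical.

```python
import json
from pathlib import Path

# Field lists copied from the "Attributes" section.
EXPECTED_FIELDS = {
    "submissions": {"item_id_alias", "item_url", "author", "pub_date", "guild",
                    "upvote", "downvote", "score", "comment_count",
                    "item_hpt_ver", "item_hpt"},
    "mods": {"item_id_alias", "guild", "author", "pub_date", "mod_type",
             "permissions", "item_hpt_ver", "item_hpt"},
    "modlogs": {"item_id_alias", "item_url", "author", "pub_date_est", "guild",
                "item_hpt_ver", "item_hpt"},
}


def load_records(path: Path) -> list[dict]:
    """Load a dump file (assumed: one JSON array, or one JSON object per line)."""
    text = path.read_text(encoding="utf-8")
    if text.lstrip().startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def missing_fields(record: dict, kind: str) -> set[str]:
    """Return the expected attributes that a record of the given kind lacks."""
    return EXPECTED_FIELDS[kind] - set(record)
```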

Files
https://archive.org/download/ruqqus-pub ... 10-14.json
https://archive.org/download/ruqqus-pub ... 10-14.json
https://archive.org/download/ruqqus-pub ... 10-14.json

@SearchVoat, I hope this could be made searchable on SVF. Thank you!!

Edit: it is possible that modlogs became hidden when guilds were set to private or banned. Deleting modlog entries individually would not be efficient.

Re: Ruqqus public dataset

Posted: Sat Oct 16, 2021 1:18 am
by antiliberalsociety
MadWorld wrote: Fri Oct 15, 2021 9:57 pm Public notes [...]
Great work! They're trying to shoah their seedy past.

Re: Ruqqus public dataset

Posted: Sat Oct 16, 2021 1:23 am
by MadWorld
antiliberalsociety wrote: Sat Oct 16, 2021 1:18 am Great work! They're trying to shoah their seedy past.
Their site is barely functional at this point. You can try accessing its API at https://api.ruqqus.com, but after 3 to 5 requests it starts throwing errors. I think it only works because very few people know about that subdomain.
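With an endpoint that starts erroring after a handful of requests, a scraper has to back off and retry. This is a generic sketch, not what was used to build the dataset. fetch is a hypothetical zero-argument callable that performs one request, and the retry count and delays are guesses rather than documented limits.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fetch: Callable[[], T], retries: int = 5,
                      base_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying with exponential backoff on any exception."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            # Give up and re-raise after the final attempt.
            if attempt == retries - 1:
                raise
            # Wait base_delay, 2x, 4x, ... seconds before retrying.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("retries must be at least 1")
```

The sleep parameter is injectable so the backoff schedule can be tested without actually waiting.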

Re: Ruqqus public dataset

Posted: Sat Oct 16, 2021 1:40 am
by antiliberalsociety
MadWorld wrote: Sat Oct 16, 2021 1:23 am
antiliberalsociety wrote: Sat Oct 16, 2021 1:18 am Great work! They're trying to shoah their seedy past.
Their site is barely functional at this point. You could try to access its https://api.ruqqus.com. But as soon as 3 to 5 requests were made, it would start throwing errors. I think it only works, because very few people know that subdomain.
The irony
JoeMcCarthy · 3 days ago · Edited 3 days ago
They can always go to your site and get censorship. So there's that. But the question is: why would they want to?

You're also a suspicious character quite frankly. You spent God knows how much time at Discussions digging through months of Faust Alexander's posts with the specific intent of smearing him. You put the kind of effort into it that indicates you have an awful lot of time on your hands. Or you are paid to do this kind of stuff. Either way - not a good look. I mean, I never paid all that much attention to you. You get a lot of attention as it is - which you obviously enjoy. But that move got my attention. Because of what it indicates about you. And I don't recommend joining sites run by possible glowie types.
https://api.ruqqus.com/+Ruqqus/post/doe ... oing/15p4w

Re: Ruqqus public dataset

Posted: Fri Nov 05, 2021 6:36 pm
by MadWorld
Last update on the Ruqqus stats.

Records in the Ruqqus dataset retrieved via the API

submission count: 500619 (at least 50K entries not shown via api)
comment count: 1636417
submission count by guild: https://files.catbox.moe/xi5qxt.txt
comment count by guild: https://files.catbox.moe/w3j6f7.txt
i.ruqqus.com: 573569 (100GB+)
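Per-guild tallies like the ones linked above can be produced from any record list that has a guild attribute. A minimal sketch; the layout of the linked catbox files is not known, so this does not try to reproduce their format, and records without a guild are bucketed under a made-up "(none)" label.

```python
from collections import Counter
from typing import Iterable


def count_by_guild(records: Iterable[dict]) -> Counter:
    """Count records per guild; records without a guild go under "(none)"."""
    return Counter(record.get("guild") or "(none)" for record in records)
```

count_by_guild(records).most_common() then lists guilds from largest to smallest.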

Note on the API dataset: at least one guild (+general submissions) was not available via the API, but was visible on the web page. It was supposedly in the unfiltered setting, yet some data remained hidden. As @antiliberalsociety noticed, they appear to have already unplugged the api.ruqqus.com subdomain. Surprisingly, some entries from HitlerWasRight showed up on the API.
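Entries that are visible on the web but hidden from the API can be found with a set difference over IDs. A sketch under two assumptions: both inputs contain submissions only (item_id_alias is not unique across types), and the web-scraped and API records share the same ID field, which is the hypothetical id_field parameter here.

```python
from typing import Iterable


def hidden_from_api(web_records: Iterable[dict], api_records: Iterable[dict],
                    id_field: str = "item_id_alias") -> list[str]:
    """Return sorted IDs present in the web records but absent from the API."""
    api_ids = {str(r[id_field]) for r in api_records}
    return sorted({str(r[id_field]) for r in web_records} - api_ids)
```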

541 HitlerWasRight submissions.
4306 HitlerWasRight comments.

The only things still working right now are its static home page and the media on the i.ruqqus.com subdomain. I can make the JSON data available, but the media on i.ruqqus.com is probably not worth uploading.

Edit: fixed links.

Re: Ruqqus public dataset

Posted: Fri Nov 05, 2021 10:40 pm
by antiliberalsociety
MadWorld wrote: Fri Nov 05, 2021 6:36 pm Last update on stats of ruqqus. [...]
They really created a monster


Re: Ruqqus public dataset

Posted: Fri Nov 05, 2021 10:50 pm
by SearchVoat

Re: Ruqqus public dataset

Posted: Fri Nov 05, 2021 11:22 pm
by MadWorld
antiliberalsociety wrote: Fri Nov 05, 2021 10:40 pm A HitlerWasRight Production
:lol: :lol: :lol:

Re: Ruqqus public dataset

Posted: Fri Nov 05, 2021 11:23 pm
by MadWorld
o7 Thank you for doing this!!!

Re: Ruqqus public dataset

Posted: Sat Nov 06, 2021 1:12 am
by antiliberalsociety
I LOVE YOU!!

No homo

If you could get the comments to go with the posts...
