
Re: Ruqqus public dataset

Posted: Mon Dec 13, 2021 1:17 am
by CrystalVulpine
MadWorld wrote: Mon Nov 08, 2021 9:29 pm file: comments.fx.7z 6.2 GB compressed to 275 MB.

Comments with duplicates removed, sorted by base36 comment id. The api data included quite a bit of info, such as user's stat and guild's setting. You could create a template out of ruqqus's static page and plug in the info available. :lol: It would be hilarious to see a near-identical page view on SearchVoat page.

You could even use "searchvoat.co/ruqqus/[original url without domain name]" to view SearchVoat's version of data.

Thank you, @SearchVoat!! We love you!! (also no homo :lol: )

Edit: id of parent comment is also available for level value greater than 1.
Hey @MadWorld. Thank you for saving ruqqus's data.

Would you mind uploading the submission data in the original format like you did with the comments? This version omits some data and the values are changed around, and I'd like to have the original data straight from the API if possible.
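
For anyone working with the comment dump, the "duplicates removed, sorted by base36 comment id" step from the quoted post could be sketched roughly like this. It is only an illustration: the `id` field name and the list-of-dicts layout are assumptions, not the confirmed format of comments.fx.7z.

```python
def dedupe_and_sort(comments):
    """Keep one record per comment id, then sort by the numeric value of the base36 id.

    `comments` is assumed to be an iterable of dicts with a string "id" field
    (hypothetical layout, not confirmed by the dataset).
    """
    unique = {}
    for comment in comments:
        # The first record seen for an id is kept; later duplicates are dropped.
        unique.setdefault(comment["id"], comment)
    # int(text, 36) parses base36, so "z" (35) sorts before "10" (36).
    return sorted(unique.values(), key=lambda c: int(c["id"], 36))
```

Sorting on the parsed number rather than the raw string matters because a plain string sort would put "10" before "a".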

Re: Ruqqus public dataset

Posted: Mon Dec 13, 2021 5:54 am
by antiliberalsociety
CrystalVulpine wrote: Mon Dec 13, 2021 1:17 am
MadWorld wrote: Mon Nov 08, 2021 9:29 pm file: comments.fx.7z 6.2 GB compressed to 275 MB.

Comments with duplicates removed, sorted by base36 comment id. The api data included quite a bit of info, such as user's stat and guild's setting. You could create a template out of ruqqus's static page and plug in the info available. :lol: It would be hilarious to see a near-identical page view on SearchVoat page.

You could even use "searchvoat.co/ruqqus/[original url without domain name]" to view SearchVoat's version of data.

Thank you, @SearchVoat!! We love you!! (also no homo :lol: )

Edit: id of parent comment is also available for level value greater than 1.
Hey @MadWorld. Thank you for saving ruqqus's data.

Would you mind uploading the submission data in the original format like you did with the comments? This version omits some data and the values are changed around, and I'd like to have the original data straight from the API if possible.
Didn't you help code Ruqqus?

And greetings from JosephGoebbels! 😃

Re: Ruqqus public dataset

Posted: Mon Dec 13, 2021 5:42 pm
by CrystalVulpine
antiliberalsociety wrote: Mon Dec 13, 2021 5:54 am
CrystalVulpine wrote: Mon Dec 13, 2021 1:17 am
MadWorld wrote: Mon Nov 08, 2021 9:29 pm file: comments.fx.7z 6.2 GB compressed to 275 MB.

Comments with duplicates removed, sorted by base36 comment id. The api data included quite a bit of info, such as user's stat and guild's setting. You could create a template out of ruqqus's static page and plug in the info available. :lol: It would be hilarious to see a near-identical page view on SearchVoat page.

You could even use "searchvoat.co/ruqqus/[original url without domain name]" to view SearchVoat's version of data.

Thank you, @SearchVoat!! We love you!! (also no homo :lol: )

Edit: id of parent comment is also available for level value greater than 1.
Hey @MadWorld. Thank you for saving ruqqus's data.

Would you mind uploading the submission data in the original format like you did with the comments? This version omits some data and the values are changed around, and I'd like to have the original data straight from the API if possible.
Didn't you help code Ruqqus?

And greetings from JosephGoebbels! 😃
@antiliberalsociety yeah I did code some bits.

Greetings!

Re: Ruqqus public dataset

Posted: Mon Dec 13, 2021 7:19 pm
by MadWorld
CrystalVulpine wrote: Mon Dec 13, 2021 1:17 am
MadWorld wrote: Mon Nov 08, 2021 9:29 pm file: comments.fx.7z 6.2 GB compressed to 275 MB.

Comments with duplicates removed, sorted by base36 comment id. The api data included quite a bit of info, such as user's stat and guild's setting. You could create a template out of ruqqus's static page and plug in the info available. :lol: It would be hilarious to see a near-identical page view on SearchVoat page.

You could even use "searchvoat.co/ruqqus/[original url without domain name]" to view SearchVoat's version of data.

Thank you, @SearchVoat!! We love you!! (also no homo :lol: )

Edit: id of parent comment is also available for level value greater than 1.
Hey @MadWorld. Thank you for saving ruqqus's data.

Would you mind uploading the submission data in the original format like you did with the comments? This version omits some data and the values are changed around, and I'd like to have the original data straight from the API if possible.
The initial upload was in a converted format, made without API access. Here are the remaining files fetched through the API.

Submission file
submissions.f1.2021-10-30.txt.sort.2021-11-10.7z (79MB)

MISC files
guilds.f1.2021-10-14.txt.sort (9.2MB)
guilds.modlogs.f1.2021-10-14.txt.sort (4.1MB)
guilds.mods.f1.2021-10-14.txt.sort (3.3MB)

:lol: Have fun playing around!!

Re: Ruqqus public dataset

Posted: Mon Dec 13, 2021 7:45 pm
by antiliberalsociety
CrystalVulpine wrote: Mon Dec 13, 2021 5:42 pm
antiliberalsociety wrote: Mon Dec 13, 2021 5:54 am
CrystalVulpine wrote: Mon Dec 13, 2021 1:17 am

Hey @MadWorld. Thank you for saving ruqqus's data.

Would you mind uploading the submission data in the original format like you did with the comments? This version omits some data and the values are changed around, and I'd like to have the original data straight from the API if possible.
Didn't you help code Ruqqus?

And greetings from JosephGoebbels! 😃
@antiliberalsociety yeah I did code some bits.

Greetings!
Are you still in contact with carpathianfaggot or the other admins?

Re: Ruqqus public dataset

Posted: Mon Dec 13, 2021 11:12 pm
by CrystalVulpine
MadWorld wrote: Mon Dec 13, 2021 7:19 pm
CrystalVulpine wrote: Mon Dec 13, 2021 1:17 am
MadWorld wrote: Mon Nov 08, 2021 9:29 pm file: comments.fx.7z 6.2 GB compressed to 275 MB.

Comments with duplicates removed, sorted by base36 comment id. The api data included quite a bit of info, such as user's stat and guild's setting. You could create a template out of ruqqus's static page and plug in the info available. :lol: It would be hilarious to see a near-identical page view on SearchVoat page.

You could even use "searchvoat.co/ruqqus/[original url without domain name]" to view SearchVoat's version of data.

Thank you, @SearchVoat!! We love you!! (also no homo :lol: )

Edit: id of parent comment is also available for level value greater than 1.
Hey @MadWorld. Thank you for saving ruqqus's data.

Would you mind uploading the submission data in the original format like you did with the comments? This version omits some data and the values are changed around, and I'd like to have the original data straight from the API if possible.
The initial upload was in a converted format, made without API access. Here are the remaining files fetched through the API.

Submission file
submissions.f1.2021-10-30.txt.sort.2021-11-10.7z (79MB)

MISC files
guilds.f1.2021-10-14.txt.sort (9.2MB)
guilds.modlogs.f1.2021-10-14.txt.sort (4.1MB)
guilds.mods.f1.2021-10-14.txt.sort (3.3MB)

:lol: Have fun playing around!!
@MadWorld Thank you so much! I'll definitely use this.

So you got the initially uploaded version by crawling the HTML I assume? I guess you said the API didn't show everything, that's strange.

Re: Ruqqus public dataset

Posted: Mon Dec 13, 2021 11:17 pm
by CrystalVulpine
antiliberalsociety wrote: Mon Dec 13, 2021 7:45 pm
CrystalVulpine wrote: Mon Dec 13, 2021 5:42 pm
antiliberalsociety wrote: Mon Dec 13, 2021 5:54 am

Didn't you help code Ruqqus?

And greetings from JosephGoebbels! 😃
@antiliberalsociety yeah I did code some bits.

Greetings!
Are you still in contact with carpathianfaggot or the other admins?
@antiliberalsociety I've never been in contact with carp, he just bullied me a lot. He's on rdrama.net now which I don't use. captainmeta4 has me blocked on Discord, and I still have kek but he hasn't responded to anything lately. The only users I still talk to are sfrohne and Lukginzis.

I'm currently most active on saidit, my username is Vulptex.

Re: Ruqqus public dataset

Posted: Tue Dec 14, 2021 12:01 am
by MadWorld
CrystalVulpine wrote: Mon Dec 13, 2021 11:12 pm
MadWorld wrote: Mon Dec 13, 2021 7:19 pm
CrystalVulpine wrote: Mon Dec 13, 2021 1:17 am

Hey @MadWorld. Thank you for saving ruqqus's data.

Would you mind uploading the submission data in the original format like you did with the comments? This version omits some data and the values are changed around, and I'd like to have the original data straight from the API if possible.
The initial upload was in a converted format, made without API access. Here are the remaining files fetched through the API.

Submission file
submissions.f1.2021-10-30.txt.sort.2021-11-10.7z (79MB)

MISC files
guilds.f1.2021-10-14.txt.sort (9.2MB)
guilds.modlogs.f1.2021-10-14.txt.sort (4.1MB)
guilds.mods.f1.2021-10-14.txt.sort (3.3MB)

:lol: Have fun playing around!!
@MadWorld Thank you so much! I'll definitely use this.

So you got the initially uploaded version by crawling the HTML I assume? I guess you said the API didn't show everything, that's strange.
Yes, it was converted from HTML. The crawl was started after @antiliberalsociety expressed his interest in preserving the data. The crawled version includes a pre-purge version of the data.

After ruqqus went into "read-only" mode, there were inconsistencies and large gaps in the pagination. The comments were fetched in parts so that the parts overlapped the gaps. The submissions had fewer pagination inconsistencies. But at least one guild that I was aware of (+general) was not available on the API. We can expect the API data to be incomplete, due to the changes the admins had made.
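
Fetching in overlapping parts and merging them afterwards could look roughly like the sketch below. It is not the actual script used for the dataset; the `id` field and the idea of checking that consecutive parts share at least one id are assumptions made for illustration.

```python
def merge_chunks(chunks):
    """Merge separately fetched parts into one list without duplicates.

    `chunks` is assumed to be a list of lists of dicts, each dict with an "id"
    field (hypothetical layout). Returns (merged_records, unlinked), where
    `unlinked` holds the indexes of parts that share no id with the part
    fetched just before them, i.e. places where a gap may remain.
    """
    merged = {}
    unlinked = []
    previous_ids = None
    for index, chunk in enumerate(chunks):
        ids = {record["id"] for record in chunk}
        if previous_ids is not None and not (ids & previous_ids):
            unlinked.append(index)  # no overlap with the previous part
        for record in chunk:
            merged.setdefault(record["id"], record)  # keep the first copy seen
        previous_ids = ids
    return list(merged.values()), unlinked
```

A part that shows up in `unlinked` does not prove data is missing, it only marks where the overlap could not be confirmed.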

Re: Ruqqus public dataset

Posted: Tue Dec 14, 2021 12:35 am
by antiliberalsociety
CrystalVulpine wrote: Mon Dec 13, 2021 11:17 pm
antiliberalsociety wrote: Mon Dec 13, 2021 7:45 pm
CrystalVulpine wrote: Mon Dec 13, 2021 5:42 pm

@antiliberalsociety yeah I did code some bits.

Greetings!
Are you still in contact with carpathianfaggot or the other admins?
@antiliberalsociety I've never been in contact with carp, he just bullied me a lot. He's on rdrama.net now which I don't use. captainmeta4 has me blocked on Discord, and I still have kek but he hasn't responded to anything lately. The only users I still talk to are sfrohne and Lukginzis.

I'm currently most active on saidit, my username is Vulptex.
Pitty, I was hoping someone could convey a message to them from the infamous JosephGoebbels. I wish to thank them for the opportunity to redpill more normies in 6 months than I did in 4 years on voat.co. The temporary free speech platform they provided allowed me to educate more people about the dark side of Jewish culture than any other.

The HitlerWasRight sub was so successful, it even got a mention on carp's little goodbye message on the main page. But it doesn't stop there, I plan to credit them completely in my next project. Ruqqus was the birthplace of HitlerWasRight, CommunismIsJewish, and several others, and it's grown into a new beast altogether. This was illustrated in my last project:

phpBB [video]


My next one will give full credit to them, without them - the HitlerWasRight movement might never have happened 😊 In addition, when they resorted to the inevitable censorship to stifle the truth as I warned people about, they unwittingly proved me, Hitler, and Goebbels right! That helped clinch anyone still on the fence.

If you happen to run into one of their crew, please do pass my message along! 😎

Re: Ruqqus public dataset

Posted: Tue Dec 14, 2021 1:09 am
by CrystalVulpine
MadWorld wrote: Tue Dec 14, 2021 12:01 am
CrystalVulpine wrote: Mon Dec 13, 2021 11:12 pm
MadWorld wrote: Mon Dec 13, 2021 7:19 pm

The initial upload was in a converted format, made without API access. Here are the remaining files fetched through the API.

Submission file
submissions.f1.2021-10-30.txt.sort.2021-11-10.7z (79MB)

MISC files
guilds.f1.2021-10-14.txt.sort (9.2MB)
guilds.modlogs.f1.2021-10-14.txt.sort (4.1MB)
guilds.mods.f1.2021-10-14.txt.sort (3.3MB)

:lol: Have fun playing around!!
@MadWorld Thank you so much! I'll definitely use this.

So you got the initially uploaded version by crawling the HTML I assume? I guess you said the API didn't show everything, that's strange.
Yes, it was converted from HTML. The crawl was started after @antiliberalsociety expressed his interest in preserving the data. The crawled version includes a pre-purge version of the data.

After ruqqus went into "read-only" mode, there were inconsistencies and large gaps in the pagination. The comments were fetched in parts so that the parts overlapped the gaps. The submissions had fewer pagination inconsistencies. But at least one guild that I was aware of (+general) was not available on the API. We can expect the API data to be incomplete, due to the changes the admins had made.
@MadWorld I have 13 posts in +general in the API version, so it must've been available. But you're right that it's missing some data, because I only have 498 posts in the API data whereas the HTML-crawled data includes 528, meaning the API excluded 30 of them. Maybe that's from posting in banned guilds, but I don't remember posting in banned guilds that often.

I have to wonder if they created the pagination gaps on purpose to destroy as much data as possible, since they also shut down way earlier than they said they would.

Update: Since each comment includes the original post within itself, most of the missing posts could be copied from there. So the only ones that have to be reconstructed from the HTML crawl are posts that were skipped by the API and have no comments (or were purged).
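
A rough sketch of that recovery step, in case anyone wants to repeat it. The field names (`post` for the embedded submission, `id` for its identifier) are guesses for illustration, not the confirmed layout of the API dump.

```python
def recover_posts(api_posts, comments):
    """Collect posts that are embedded in comments but absent from the API submission data.

    `api_posts` and `comments` are assumed to be lists of dicts; each comment
    is assumed to carry its parent submission under a "post" key
    (hypothetical field names).
    """
    known_ids = {post["id"] for post in api_posts}
    recovered = {}
    for comment in comments:
        post = comment.get("post")
        if post and post["id"] not in known_ids:
            # Many comments embed the same post; keep a single copy of it.
            recovered.setdefault(post["id"], post)
    return list(recovered.values())
```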

Update 2: I recovered 23 extra posts from the comments. 21 of them were in +general, the 2 that weren't were both in +FreeForum, and one of those 2 was originally in +HiddenWebGems but kicked to +general and yanked to +FreeForum. So it appears that +general was accessible from the API, but extremely spotty (and maybe +FreeForum too, but it's only 2 posts so it could be a coincidence). 7 posts are still missing, I'll run another comparison to see what those look like.

Update 3: Bad news. There were 13 extra posts in the HTML data instead of 7, meaning at least 6 extra posts were present in the API submission and comment data but not the HTML-crawled data. So unfortunately the HTML-crawled data is also incomplete. This means the API actually missed at least 36, giving me a total of at least 534 posts. But there could be a few more that are missing from all 3 sets of data.

Update 4: I noticed that precisely 6 of the 13 extra posts in the HTML data were in +general. I doubt that means anything though.

Update 5: This was a false alarm. If you take +general out of the equation, you get 503 (haha!) posts by me from the HTML crawl, and 485 from the API, 487 after including the posts embedded in comments. After adding the 7 in the HTML crawl, there are only 494.

This led me to recheck my data. I was using an older file of my posts from the HTML data, which also had any posts I had commented on added to it. To avoid grabbing other people's posts, I put a condition in my script to only compare posts if I was the author; however, after the first 528 lines, where other people's posts began, the file also included several duplicates of some of my own posts. I re-exported them, and using only the original 528, there were only 7 extra in the HTML data that weren't in the API data, as expected. So no, as of now the HTML crawl does not appear to have missed anything publicly accessible. 4 of the posts only available in the HTML data were in +general.
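
For reference, a comparison that avoids the duplicate problem described above could be sketched like this. The `id` and `author` field names are assumptions for illustration, and this is not the script actually used; collecting ids into sets means a post that appears twice in a file is still counted once.

```python
def own_post_ids(records, author):
    """Return the set of post ids written by `author` (duplicates collapse in the set)."""
    return {r["id"] for r in records if r.get("author") == author}


def compare_sources(html_records, api_records, author):
    """Return (ids only in the HTML crawl, ids only in the API data) for one author.

    Both inputs are assumed to be lists of dicts with "id" and "author"
    fields (hypothetical layout).
    """
    html_ids = own_post_ids(html_records, author)
    api_ids = own_post_ids(api_records, author)
    return html_ids - api_ids, api_ids - html_ids
```

With the numbers from this post, the first set would be expected to hold 7 ids once the posts embedded in comments are added to the API side, and the second set would be empty.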