Lemmy should find a way to allow people to use the site while accessing it from Tor Browser.
If the concern is that people will use Tor Browser to post malicious files or links, then Tor users should still be able to post on Lemmy, just without images or links.
Given the current state of some countries and attacks on certain minorities, including Latinos and trans people in the USA, it isn’t safe to post certain views on Lemmy when Tor Browser is blocked, including at registration.
Even with a VPN and even with a browser that resists fingerprinting, people make mistakes without Tor Browser. Data centers sell information about the timing of packets (netflows) to multiple parties, and VPNs offer less protection than people believe. This information can be correlated with netflows from ISPs. All of this information can potentially be correlated with real identities without Tor Browser.
In the USA, the current government is increasingly adopting radical tactics to cater to the religious fundamentalist poor and the ultra-wealthy. Without Tor Browser, it is not safe to be critical of these policies on Lemmy.
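To make the correlation point concrete, here is a toy sketch of how packet timings seen on the ISP side could be matched against timings seen at a data center. Everything in it is an assumption for illustration: the function name `match_fraction`, the fixed `delay`, the `tolerance` window, and the sample timestamps are all made up, and real traffic analysis is far more sophisticated than this.

```python
from bisect import bisect_left

def match_fraction(entry_times, exit_times, delay=0.05, tolerance=0.01):
    """Return the fraction of entry-side packets that have an exit-side
    packet within `tolerance` seconds of the expected `delay`.

    Hypothetical toy model: timestamps are in seconds, and the tunnel is
    assumed to add a roughly constant delay.
    """
    if not entry_times:
        return 0.0
    exit_sorted = sorted(exit_times)
    matched = 0
    for t in entry_times:
        target = t + delay
        # First exit-side timestamp that is not earlier than the window start.
        i = bisect_left(exit_sorted, target - tolerance)
        if i < len(exit_sorted) and exit_sorted[i] <= target + tolerance:
            matched += 1
    return matched / len(entry_times)

# Made-up timestamps: what an ISP might log for one user,
# and what a data center might log for two different flows.
isp_flow = [0.0, 1.3, 1.35, 4.2]
same_user_flow = [t + 0.05 for t in isp_flow]
unrelated_flow = [10.0, 20.0, 30.0, 40.0]

print(match_fraction(isp_flow, same_user_flow))
print(match_fraction(isp_flow, unrelated_flow))
```

The point of the sketch is only that timing alone, with no packet contents, can be enough to link two flows when both sides of a connection are observable.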


Much of AI training happens on the entire public web, including here.
On the rest of the internet, your posts might get caught in the net, but it’s not as sure as when it’s written in the terms of service. In fact, smaller instances like lemmy.zip tend to take action against crawling, as they can’t afford the load. Of course, your posts could be crawled via any other instance, but again, it’s less of a certainty here.
No need to crawl. Set up an instance and let ActivityPub do the crawling for you.
The core that lemmy is based on has all the functionality of a crawler built right into it.
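As a rough illustration of the point above, this is the kind of thing a federated instance receives without crawling anything: an ActivityPub `Create` activity pushed to it by another server. The sketch below only parses such an activity and pulls out publicly addressed text. The helper names and the `example.com` / `example.org` URLs are hypothetical, and this is not Lemmy's actual code, just a minimal sketch of the idea.

```python
import json

# The special ActivityStreams collection that marks content as public.
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

def as_list(value):
    """ActivityStreams allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def extract_public_text(activity_json):
    """Return the text of a publicly addressed Create activity, else None."""
    activity = json.loads(activity_json)
    if activity.get("type") != "Create":
        return None
    audience = as_list(activity.get("to")) + as_list(activity.get("cc"))
    if PUBLIC not in audience:
        return None
    obj = activity.get("object")
    # The object may be just a URL reference; only handle embedded objects.
    if not isinstance(obj, dict):
        return None
    return obj.get("content")

# Hypothetical activity, shaped like what a remote server might deliver.
sample = json.dumps({
    "type": "Create",
    "actor": "https://example.com/u/alice",
    "to": [PUBLIC],
    "object": {
        "type": "Note",
        "attributedTo": "https://example.com/u/alice",
        "content": "Hello threadiverse",
    },
})

print(extract_public_text(sample))
```

Anyone running an instance that federates with others gets a stream of these delivered to it, which is why no separate crawler is needed.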
I remember reading a post somewhere on the threadiverse that contained a list of all the websites one AI crawler was using for training, and there were several Lemmy instances on it. It would even be possible to set up an instance solely for crawling.
Privacy is like security in that it’s impossible to make it 100% bulletproof. It’s all about minimizing your risk. I deliberately used the word “actively” in my OP. I don’t think most crawlers and ad companies even care to do all that, especially when there is so much data available to them through all the big platforms.