In the space of 1 week, a second open-source Chinese AI model equals the best models that investors are pouring tens of billions of dollars into.

Lugh@futurology.today · 29 days ago

hedgehog@ttrpg.network · 26 days ago

Realistically, no LLM large enough to be competitive will be able to remain open-source, even if it started out that way (and, as you point out, most that claim to be never actually were), because so much training data is needed.

Often the training data can’t be redistributed in the first place. Even when it can be, making it publicly available makes it much more likely that someone will request the takedown of some data in the set (even if the data was licensed, a copyright holder might claim that whoever submitted it to the set wasn’t permitted to do so). At that point, unless the takedown request is refused or the model itself is retrained on the reduced set (which would be quite expensive), the published data is no longer sufficient to reproduce the model.