A new AI model, DarkBERT, has been trained on dark web data written by hackers and cybercriminals. Following the success of OpenAI’s ChatGPT, Microsoft’s Bing Chat, and Google Bard, researchers have created a model with a much darker twist.
DarkBERT differs from its predecessors in that it was trained exclusively on text from the dark web, written by hackers, cybercriminals, and scammers. South Korean researchers detailed in a paper how they crawled the Tor network and filtered the raw pages to build a dark web corpus for DarkBERT’s training. Thanks to that unconventional training data, DarkBERT has already outperformed more general language models such as BERT and RoBERTa on tasks involving dark web text.
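For a sense of what that filtering step can look like in practice, here is a minimal Python sketch that drops near-empty pages and exact duplicates from a crawled collection. The function name, thresholds, and structure are illustrative assumptions, not the researchers’ actual pipeline.

```python
# Illustrative sketch only: one way to filter raw crawled pages into a training corpus.
import hashlib

def filter_pages(pages, min_chars=500):
    """Drop near-empty pages and exact duplicates from a list of raw page texts."""
    seen_hashes = set()
    kept = []
    for text in pages:
        cleaned = " ".join(text.split())      # normalize whitespace
        if len(cleaned) < min_chars:          # skip pages with too little text
            continue
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if digest in seen_hashes:             # skip exact duplicates
            continue
        seen_hashes.add(digest)
        kept.append(cleaned)
    return kept

# Example: corpus = filter_pages(raw_page_texts)
```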
DarkBERT is based on RoBERTa, the “Robustly Optimized BERT Pretraining Approach” developed by Facebook researchers in 2019. RoBERTa improves upon Google’s BERT and produces state-of-the-art results on the General Language Understanding Evaluation (GLUE) benchmark. The South Korean researchers showed that RoBERTa can be pushed further still by continuing its pretraining on dark web text, producing DarkBERT.
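For readers curious what “continuing RoBERTa’s pretraining” on new text involves, the sketch below uses the Hugging Face transformers and datasets libraries to run the standard masked language modeling objective over a plain-text corpus. The corpus file name and hyperparameters are illustrative assumptions, not the training setup described in the paper.

```python
# Minimal sketch: domain-adaptive pretraining of RoBERTa with masked language modeling.
# Assumes a plain-text corpus at "corpus.txt" (one document per line).
from transformers import (RobertaTokenizerFast, RobertaForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Load and tokenize the raw text corpus.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

# Randomly mask 15% of tokens, the same MLM objective RoBERTa was originally trained with.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-roberta",
                           per_device_train_batch_size=8,
                           num_train_epochs=1,
                           learning_rate=5e-5),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```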
Fortunately, DarkBERT is not intended for public release, though the researchers are accepting requests for academic use. Even so, its existence will likely help law enforcement and researchers gain a better understanding of the dark web. As with any AI chatbot, caution should be exercised to avoid malware infections and data breaches: users should make sure they are visiting the official websites of popular AI chatbots and avoid clicking links in suspicious emails or ads.
To enhance security while experimenting with AI chatbots, it is recommended to use reliable antivirus software for PCs, Macs, and smartphones. This will provide an extra layer of protection against malware that may be associated with AI chatbot links.
DarkBERT points to a possible future of AI models specialized through training on narrow, domain-specific data. Given its early success, similar models trained in unconventional ways are likely to follow.