🗓️ 3 March 2025
⏱️ 7 minutes
0:00.0 | Hello and welcome to the Monday, March 3, 2025 edition of the SANS Internet Storm Center's Stormcast. |
0:08.1 | My name is Johannes Ullrich, and today I'm recording from Baltimore, Maryland. |
0:13.1 | Well, let's start today with some stories about AI training data. |
0:17.8 | And the first one here comes from Truffle Security. |
0:21.0 | Truffle Security, of course, is the company behind TruffleHog, |
0:24.0 | the very frequently used, well-respected tool that allows you to identify API keys |
0:30.7 | and other secrets that you may leak in Git or other repositories and such. |
0:37.3 | So Truffle Security took a big database of AI training data that's being offered by Common Crawl. |
0:48.3 | Common Crawl has been going out and spidering the web for many years now, and has something like 400 terabytes of data that they're |
0:57.3 | offering. And, well, it shouldn't really be a surprise, because it's the same thing that we had with |
1:02.0 | Google and other web spiders that offer their data publicly, that it now becomes, well, |
1:08.6 | rather straightforward to find things like API keys that people leaked on their websites. |
1:15.6 | A little bit tricky here is that this data is also historic data. I believe they have been doing this for the last 10 years or so. |
1:21.6 | So it is not just current data. Now, sites like Google, they offer some historic data, but usually focus more on current data. |
1:31.3 | They found roughly 12,000 of what Truffle Security considers live keys, which means that they work according to |
1:41.3 | TruffleHog. TruffleHog has a little sort of test feature that allows you to make sure that these |
1:47.6 | are not just simple sample or expired credentials that are being used here. |
1:51.9 | They point out in their paper that this number of roughly 12,000 secrets is, of course, |
1:58.8 | just an estimate. |
1:59.9 | There are some that they missed just because they were |
2:02.2 | not formatted correctly. And then, of course, it's always a little bit tricky to figure out if they're |
... |
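Editor's illustration: the transcript describes finding key-shaped strings in crawled pages and then checking that they are live rather than sample or expired credentials. A minimal Python sketch of that two-step idea follows. It is not TruffleHog's implementation. The pattern covers only the widely documented shape of an AWS access key ID, and `find_candidates`, `live_keys`, and the `is_live` callback are hypothetical names introduced for this example. A real check would have to ask the issuing service whether the credential works.

```python
import re
from typing import Callable, Iterable

# Assumption: only AWS access key IDs are matched ("AKIA" + 16 uppercase
# letters or digits). Real scanners ship many provider-specific detectors.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")

# Placeholder key used in AWS documentation; a sample, not a leak.
KNOWN_SAMPLES = {"AKIAIOSFODNN7EXAMPLE"}


def find_candidates(text: str) -> list[str]:
    """Return unique key-shaped strings in first-seen order."""
    seen: dict[str, None] = {}
    for match in AWS_KEY_ID.finditer(text):
        seen.setdefault(match.group(1), None)
    return list(seen)


def live_keys(pages: Iterable[str], is_live: Callable[[str], bool]) -> list[str]:
    """Keep candidates that are not known samples and pass the liveness check.

    `is_live` is a hypothetical stand-in for a provider-specific check.
    """
    found: list[str] = []
    for page in pages:
        for key in find_candidates(page):
            if key in KNOWN_SAMPLES or key in found:
                continue  # skip documentation samples and repeats
            if is_live(key):
                found.append(key)
    return found
```

Pattern matching alone over-counts, because documentation samples and long-expired keys look the same as working ones. That is why the liveness step is kept separate and pluggable here.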
Disclaimer: The podcast and artwork embedded on this page are from SANS ISC Handlers, and are the property of its owner and not affiliated with or endorsed by Tapesearch.
Generated transcripts are the property of SANS ISC Handlers and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.