Spidering and Modeling Email Data

Course: Capstone: Retrieving, Processing, and Visualizing Data with Python, Chapter 4: Spidering and Modeling Email Data

About this chapter:

In our second required assignment, we will retrieve and process email data from the Sakai open source project. Video lectures will walk you through the process of retrieving, cleaning up, and modeling the data.

This chapter includes 5 parts:

Now actually, if you look at the README in gmane.zip, it tells you how you can get a head start by getting the first 675 megabytes with one statement, and then you can fill in the details. This is almost ten years of data from the Sakai developer list, and over that time people even changed their email addresses. If you want to actually make sense of this data, we clean it up by running a process that reads the data completely, wipes out the old output, and then writes a whole new copy.
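The cleanup pass described above (read everything, wipe the old output, write it fresh, and fold changed addresses together) can be sketched roughly as follows. This is a minimal illustration, not the course's actual script: the function name `rebuild_index`, the `Messages` table layout, and the `mapping` dictionary of old-to-new addresses are all assumptions made for the example.

```python
import sqlite3

def rebuild_index(raw_conn, index_conn, mapping):
    """Rebuild the cleaned-up table from scratch out of the raw table.

    mapping is a hypothetical dict of old address -> current address.
    """
    cur = index_conn.cursor()
    # Wipe out whatever an earlier run produced and start over.
    cur.execute("DROP TABLE IF EXISTS Messages")
    cur.execute("CREATE TABLE Messages (id INTEGER PRIMARY KEY, sender TEXT)")
    # Read the raw data completely, normalizing each sender as we go.
    for msg_id, sender in raw_conn.execute("SELECT id, sender FROM Messages"):
        sender = sender.strip().lower()
        sender = mapping.get(sender, sender)  # fold old addresses into new ones
        cur.execute("INSERT INTO Messages (id, sender) VALUES (?, ?)",
                    (msg_id, sender))
    index_conn.commit()
```

Because the output table is dropped and rebuilt on every run, the cleanup can be repeated with an improved mapping without touching the raw data.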

The one that I have is a nice copy of all this data on a server that's accessible worldwide and won't crash. If you can't parse a message, then we're going to tolerate five bad email addresses in a row, and after that we're going to quit. So if we take a look at the database, and we go into the gmane folder, any time you see the content.sqlite-journal file, that means it needed to run a COMMIT.
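The two ideas in this excerpt, giving up after several bad messages in a row and committing so that work is not left pending, can be sketched as a small loop. This is only an illustration under stated assumptions: `spider`, `fetch`, `max_failures`, and `commit_every` are hypothetical names, `fetch` stands in for whatever retrieves and parses one message (returning `None` on failure), and the exact cutoff used in the course code may differ.

```python
import sqlite3

def spider(fetch, conn, start=1, max_failures=5, commit_every=50):
    """Store messages until max_failures fetches in a row have failed.

    fetch(msg_id) is a hypothetical callable returning the message text,
    or None when the message is missing or cannot be parsed.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS Messages "
                "(id INTEGER PRIMARY KEY, body TEXT)")
    failures = 0
    stored = 0
    msg_id = start
    while failures < max_failures:
        text = fetch(msg_id)
        if text is None:
            failures += 1      # one more bad message in a row
        else:
            failures = 0       # a good message resets the streak
            cur.execute("INSERT OR REPLACE INTO Messages (id, body) "
                        "VALUES (?, ?)", (msg_id, text))
            stored += 1
            if stored % commit_every == 0:
                conn.commit()  # flush pending work to disk periodically
        msg_id += 1
    conn.commit()              # final commit so nothing is left pending
    return stored
```

With an on-disk database in SQLite's default rollback-journal mode, a `-journal` file sits next to the database while a write transaction is open and goes away once the transaction is committed.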

The work that we're doing right now is building a spider and visualization tool for email data that originally came from the website gmane. So remember, I split each message in half, into the headers and the body, and I made this as raw as I possibly could because, as you saw, I had to spend so much time in the gmane step just getting the data successfully retrieved. And so I pulled this in really quickly: I read all this stuff from the DNS mapping, doing little more than stripping it and making it lowercase.
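Splitting a raw message into headers and body usually comes down to finding the first blank line, which is where the header block ends in standard email formatting. A minimal sketch, assuming the message arrives as one string (`split_message` is a hypothetical helper, not a function from the course code):

```python
def split_message(raw):
    """Return (headers, body) for one raw email, or None if it has no blank line."""
    raw = raw.replace("\r\n", "\n")  # normalize line endings first
    pos = raw.find("\n\n")           # the first blank line ends the headers
    if pos == -1:
        return None                  # malformed: treat as a bad message
    return raw[:pos], raw[pos + 2:]
```

Only the first blank line is used as the separator, so any blank lines inside the body are left alone.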

We're here in Baltimore, Maryland with yet another installment of office hours for Internet History, Technology, and Security as well as Programming for Everybody. Hi, I'm Wusi, taking Introduction to Python. I have no idea of programming but I hope I'll get something out of this class.

One of the things we've learned from the Snowden documents is that cryptography, broadly applied, gives the NSA trouble at least at scale. Now what cryptography does is that it forces the attacker, whether the NSA, or the Chinese government, or cyber criminals, or whoever, to have a priorities list. You know, recently we learned about vulnerabilities in the key agreement protocols that are used to secure a lot of the VPNs and Internet connections, right?
