Worked Example: twspider.py (Chapter 15)

Course: Using Databases with Python / Chapter: Basic Structured Query Language / Lesson 7

Brief description

We're gonna use urllib and urllib.error, plus twurl, which is code that augments the URL to do all the OAuth calculation. We're gonna make a database, and we have to import ssl because of the way Python doesn't trust any certificates, no matter how good they are. Now remember that you've got to edit the hidden.py file to make this work because we are talking to the Twitter API.

English text of the lesson

Hello everybody and welcome to Python for Everybody. We are going to do some code walkthrough, and if you want to follow along with the code, you can download the sample code from Python for Everybody. The code that we’re gonna play with is the Twitter spider code, which talks to both the Twitter API and a database. We’re going to run code that hits the Twitter API, much like we did in a previous chapter, and retrieves the data, but this time we’re gonna remember the data so we don’t have to retrieve it again. We’re going to keep track of people’s friends. What we’re doing here is sort of illicitly pulling down, slowly but surely, subject to our rate limit, who our friends are.

So let’s take a look. We’re gonna use urllib and urllib.error, plus twurl, which is the code that augments the URL to do all the OAuth calculation. We’re gonna get JSON data back. We’re gonna make a database, and we have to import ssl because of the way Python doesn’t trust any certificates, no matter how good they are. So this is our URL to talk to the Twitter API.

We’re going to make a database. And again, the way SQLite works is that if spider.sqlite doesn’t exist, it creates it. And we get ourselves a cursor. And we’re gonna do a CREATE TABLE. This IF NOT EXISTS is something only some SQLs have, but SQLite3 does this: CREATE TABLE only if it doesn’t already exist. Unlike the tracks example, I want to be able to start this over and over and not lose data. This is a spidering process, and we’ll see a lot of these. We want a restartable process where we use a database. So we’re starting with nothing, and there’s no spider.sqlite file. It creates this table, which holds the name of the person, whether we’ve retrieved them or not, and how many friends this person has that we know of in our database. Now this little bit is to deal with the SSL certificate errors.
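The setup described so far can be sketched roughly as follows. This is a minimal sketch, not the course file itself: the table name Twitter and the three columns come from the walkthrough, but the column types, the helper names open_spider_db and make_unverified_context, and the temporary directory are assumptions made for illustration. Note that turning off certificate checks, as the walkthrough does, is only reasonable for a throwaway demo.

```python
import os
import sqlite3
import ssl
import tempfile


def open_spider_db(path):
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)  # SQLite creates the file if it is missing
    # IF NOT EXISTS makes this safe to run on every start: an existing
    # table, and the rows already in it, are left untouched.
    conn.execute('''CREATE TABLE IF NOT EXISTS Twitter
                    (name TEXT, retrieved INTEGER, friends INTEGER)''')
    conn.commit()
    return conn


def make_unverified_context():
    """Build an SSL context that skips certificate verification (demo only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


# Hypothetical location; the walkthrough uses spider.sqlite in the current folder.
db_path = os.path.join(tempfile.mkdtemp(), 'spider.sqlite')

# First run: the file and table are created, and one row is stored.
conn = open_spider_db(db_path)
conn.execute('INSERT INTO Twitter (name, retrieved, friends) VALUES (?, 0, 1)',
             ('drchuck',))
conn.commit()
conn.close()

# Second run: same call, and the earlier row is still there.
conn = open_spider_db(db_path)
rows_after_restart = conn.execute(
    'SELECT name, retrieved, friends FROM Twitter').fetchall()
conn.close()

ctx = make_unverified_context()
print(rows_after_restart)
```

Opening the database twice with the same call is what makes the process restartable: the second run finds the table already in place and keeps its rows.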
These certificates are totally fine, but Python doesn’t trust any certificates by default, which is frustrating, but whatever. So here we’re gonna have a loop. We’re gonna ask for a Twitter account. We have to type quit to quit. If we just hit enter, we’re going to read an unretrieved Twitter person from the database and then grab all that person’s friends. So we’re going to do a fetchone, and that’s going to get the name of the first person, the sub zero. Fetchone means get one row from the database, and sub zero means the first column of that row. If we had selected more things than name here, sub zero would be the first of those. And if this fails, then we’ve retrieved all the Twitter accounts.

Then we’re gonna augment this Twitter URL using twurl. You can look at the twurl.py code. It requires the hidden.py file, which has your keys and secrets in it. You’ve got to get hidden.py updated. I’ve got it updated, but I’m not gonna show you because it has my keys and secrets in it. We’re only gonna take the first five, which means we’re probably not gonna find friends of friends of friends; it’s only, at most, five recent ones. We could run this with a much higher number so we get more friends.

We’ll show the URL while we retrieve it. We will do our urlopen. We’ll do a read, which gives us data in UTF-8, and then decode gives us data in Unicode, which is what we need inside of Python. We will ask for the headers from the connection: give me a dictionary of the headers. And the x-rate-limit-remaining header from the Twitter API tells us how many calls we have left before we’re told we can’t use this API anymore, because this is one of those rate-limited things. And then we’re gonna parse the data that we got from Twitter and get a, I think it’s a list. Yeah, it’s a list.
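The two ideas in this part, picking an unretrieved account with fetchone and turning the response bytes into Python data, might look like the sketch below. It makes no network call: the rows and the response bytes are made up, the helper name next_unretrieved is hypothetical, and the payload shape (a JSON object whose users entry holds the friends) is an assumption about what the API returns.

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Twitter (name TEXT, retrieved INTEGER, friends INTEGER)')
# Made-up rows: one account already retrieved, one still waiting.
cur.executemany('INSERT INTO Twitter VALUES (?, ?, ?)',
                [('drchuck', 1, 0), ('stephanie', 0, 1)])


def next_unretrieved(cur):
    """Return the name of one account with retrieved = 0, or None if none remain."""
    cur.execute('SELECT name FROM Twitter WHERE retrieved = 0 LIMIT 1')
    row = cur.fetchone()   # a single row as a tuple, or None when nothing matches
    if row is None:
        return None
    return row[0]          # "sub zero": the first (and only) selected column


picked = next_unretrieved(cur)

# Once every row is marked retrieved, there is nothing left to pick.
cur.execute('UPDATE Twitter SET retrieved = 1')
none_left = next_unretrieved(cur)

# Made-up response body: read() yields UTF-8 bytes, decode() yields a str.
raw = b'{"users": [{"screen_name": "caf\xc3\xa9_fan"}]}'
text = raw.decode()        # UTF-8 is the default encoding for decode()
js = json.loads(text)
names = [u['screen_name'] for u in js['users']]

print(picked, none_left, names)
```

Checking for None explicitly is a stand-in for the "if this fails" case in the walkthrough: indexing the result of fetchone when no row matched would raise an error instead.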
And then we could dump this, if you want; you can uncomment that in here. What we’ve just done is retrieve this person’s friends, by screen name. So the first thing we want to do is update the database and change retrieved from zero to one, because we’re gonna use this column to know which accounts are unretrieved. Retrieved being one means we’ve already retrieved it, and for this account, we have.

Then we’re going to parse the result. This is similar to the Twitter code we did previously in the web services chapter. We’re gonna go through all the users, find their screen name, and print the screen name out. OK? These users are the friends of this person, and for each one we’re gonna select friends from Twitter where the name is the friend’s screen name. Then we do a cur.fetchone, and if we get a row back, it tells us how many friends this particular screen name has. If we find the screen name in there, we’re gonna do an UPDATE statement and add one to their friend count, how many friends they have. And then we keep track; this count here is not in the database, it’s just so I can print it out at the end. If there is no record for this particular friend, we’re gonna insert them as a new row: here’s the new person that we just saw, that’s their name, retrieved is set to zero, and they have one friend, OK? And then we’re gonna commit the transaction, and then we’re gonna close this at the end. OK?

So, let’s go ahead and run this. The first time, it’s gonna create an empty database. So, I want to say python3 twspider.
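The bookkeeping described above (mark the account as retrieved, then update or insert each friend) could be sketched like this. It is an illustration under assumptions, not the course file: the function name record_friends and the sample screen names are hypothetical, and the friend list is passed in directly rather than fetched from the API.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Twitter (name TEXT, retrieved INTEGER, friends INTEGER)')
cur.execute("INSERT INTO Twitter VALUES ('drchuck', 0, 0)")


def record_friends(cur, acct, friend_names):
    """Mark acct as retrieved, then bump or insert a row for each friend."""
    cur.execute('UPDATE Twitter SET retrieved = 1 WHERE name = ?', (acct,))
    countnew = 0   # rows inserted this round (kept only for reporting)
    countold = 0   # rows that already existed and were incremented
    for friend in friend_names:
        cur.execute('SELECT friends FROM Twitter WHERE name = ? LIMIT 1',
                    (friend,))
        row = cur.fetchone()
        if row is not None:
            # Seen before: add one to the stored count.
            cur.execute('UPDATE Twitter SET friends = ? WHERE name = ?',
                        (row[0] + 1, friend))
            countold += 1
        else:
            # New account: not yet retrieved, seen once so far.
            cur.execute('''INSERT INTO Twitter (name, retrieved, friends)
                           VALUES (?, 0, 1)''', (friend,))
            countnew += 1
    return countnew, countold


first = record_friends(cur, 'drchuck', ['stephanie', 'tim'])
second = record_friends(cur, 'stephanie', ['tim', 'liveedutv'])
conn.commit()

table = cur.execute(
    'SELECT name, retrieved, friends FROM Twitter ORDER BY name').fetchall()
print(first, second)
print(table)
```

In this sketch the friends column ends up counting how many times an account has shown up in someone else's friend list, which is why tim reaches 2 after appearing in both lists.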
So, ls sqlite, nothing there; python3, oops, that’s ’cause I removed it, python3 twspider.py. OK. So I’m gonna start with a Twitter account, Dr. Chuck. So, it’s doing this retrieval, and don’t worry, showing the token and the signature is not dangerous because you don’t have the keys or the token, I mean the secrets and the token secrets, so don’t get too worried. I have 11 calls left, so I’ve got to hope this all works. One of my friends is Stephanie Teasley, and these are in reverse order. So let’s grab Stephanie and ask for Stephanie’s friends. So now we just retrieved Stephanie’s friends, and here are Stephanie’s most recent friends. And then, I can just hit enter, and it’ll randomly pick.

Now let’s see if I can look in the database. Let’s open this up. File, Open Database. Hope I don’t lock myself. Sometimes, it’s a little scary when you look at the database, and you’re just checking. So this is what my database looks like. We retrieved Stephanie, and this is how many people. So these are the friends of Stephanie and me, and I’m not in there. So we’ve retrieved Stephanie, which was a friend. So let’s go grab, oh, I don’t know, let’s grab Tim McKay. And get that one. Remaining 10, I don’t have too many of these. Tim McKay, right. So, there we go. Remaining nine. And if I do a refresh on this, then you see I’ve got some more folks.

If I hit enter here, it’ll pick one randomly based on retrieved being zero. So it won’t pick Stephanie or Tim, because their retrieved is one, but we have lots of other folks to pick randomly, and we’ll hit enter. So who did it pick? It picked screen name liveedutv, which is ironic because I am recording this on LiveEdu.tv right now. And so, we can keep hitting refresh, and away we go. I’m gonna stop now because I only have eight remaining. And so I’m gonna type quit. So that’s how it works.
Now remember that you’ve got to edit the hidden.py file to make this work because we are talking to the Twitter API. If you don’t edit that file, it won’t work for you, ok? So I hope you find this useful. Cheers.
