Worked Example: twfriends.py (Chapter 15)
Course: Using Databases with Python / Chapter: Many-to-Many Relationships in SQL / Lesson 5
Hello everybody, welcome to Python For Everybody.
We are doing some code walk-throughs.
If you want to follow along, you can download the source code from the Python for Everybody website, okay?
So the code we’re playing with today is twfriends.py. This is a step beyond the simple twspider: it is a restartable spider, but we’re going to data model things a little bit differently.
We’re going to have two tables, and we’re going to have a many-to-many relationship, except that it’s sort of a many-to-many relationship between a table and itself, which is okay. Twitter friends are a directional relationship. And so we start out here in twfriends.py.
Remember that the file hidden.py, I’ll show it to you but I’m not going to open it because I’ve got my keys and secrets in it.
So this hidden.py file, you’ve gotta edit that and you gotta go to apps.twitter.com and get your keys and put them in there otherwise these things won’t work.
But if you have Twitter, and you set your API keys up, and you put them in hidden.py, then all these things will work. It’s kind of fun, actually, and impressive, not hard to do, actually.
So, twurl, that’s my library that reads hidden.py and augments the URL and does all the OAuth stuff. Then json, and ssl, because Twitter doesn’t...
I mean, because Python doesn’t accept any certificates, even if they’re good certificates. So we kind of crush that. Here’s the friends list URL that we’re going to hit. We’re going to make a database, friends.sqlite.
Now, here we’re doing a CREATE TABLE IF NOT EXISTS. What this is really saying is: I want this to be a restartable process and I don’t want to lose the data. We’re starting out with no SQLite files at all.
And so this is going to create the database, and create these tables. But the second time we run it, we’re not going to recreate the tables.
We’re going to be able to restart this, because we’re going to run out of rate limit before we finish, and then we just have to wait however long the rate limit takes to reset. And we’ll watch the rate limit go down.
And so we’re going to have a people table, and we’re going to have a primary key and the name.
The name is going to be unique, and there’s whether or not we’ve retrieved it, and that’s kind of from a previous one. But then there’s the who-follows-who, the from id to the to id. And so this is the direction.
And we’re going to put a uniqueness constraint in, just like we do in many-to-manys, that basically says the combination of from id and to id has got to be unique.
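The two tables described here can be sketched as follows. This is a minimal illustration rather than the exact code from the walkthrough: the column names (`id`, `name`, `retrieved`, `from_id`, `to_id`) follow the narration, while the in-memory database and the helper name `create_tables` are assumptions made so the sketch can run on its own.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind (an assumption);
# the walkthrough uses a file named friends.sqlite instead.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

def create_tables(cur):
    # IF NOT EXISTS makes this safe to run on every start-up.
    cur.execute('''CREATE TABLE IF NOT EXISTS People
        (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')
    # The pair (from_id, to_id) must be unique; each column alone may repeat.
    cur.execute('''CREATE TABLE IF NOT EXISTS Follows
        (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))''')

create_tables(cur)
create_tables(cur)   # simulates a restart: nothing is dropped or recreated

# 1 -> 2 twice, then 1 -> 3: the duplicate pair is silently ignored.
for pair in [(1, 2), (1, 2), (1, 3)]:
    cur.execute('INSERT OR IGNORE INTO Follows (from_id, to_id) VALUES (?, ?)', pair)
conn.commit()

follow_rows = cur.execute(
    'SELECT from_id, to_id FROM Follows ORDER BY to_id').fetchall()
print(follow_rows)
```

Running the create step twice stands in for stopping and restarting the spider: the existing rows survive because the tables are only created when they are missing.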
We don’t allow ourselves to put in duplicates of the combination. So from id can be one in many records, and to id can be one in many records, but the pair one, one is only allowed once. And this is the crud we have to do to convince Python to accept the Twitter certificate.
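The certificate "crud" mentioned above is, in sketch form, an SSL context with verification switched off. The variable name `ctx` is an assumption, and turning verification off is only reasonable for a classroom demo like this one.

```python
import ssl

# Build a context that skips certificate checks, as the walkthrough describes.
ctx = ssl.create_default_context()
ctx.check_hostname = False        # must be turned off before changing verify_mode
ctx.verify_mode = ssl.CERT_NONE   # accept any certificate

# The context would then be handed to the request,
# for example urllib.request.urlopen(url, context=ctx).
print(ctx.check_hostname, ctx.verify_mode == ssl.CERT_NONE)
```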
And so this is similar to some of the other stuff that we’ve done. We’re going to enter a Twitter account or quit. And if we hit enter by itself, then we will actually go and retrieve a record that was not yet retrieved.
And now we’re actually pulling out two values, id and name, and so we will grab fetchone. It’s going to give us a two-tuple, basically, and we’re going to store that in id and account, because this is coming back as a two-tuple.
The first of which is the id from the database. LIMIT 1 means, well, I’m going to get one of these, or zero of these. If there are zero of these, that means there are no unretrieved Twitter accounts.
Retrieved equals zero: you’ll see in a second that all the new accounts we put in are ones we haven’t retrieved yet, and again, given our rate limit, we want to know which ones we’ve retrieved, okay?
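Picking the next unretrieved account might look like the sketch below. The helper name `next_unretrieved` and the sample rows are made up for illustration, and it checks for `None` directly instead of relying on an exception.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''CREATE TABLE People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')
# Made-up sample data: one retrieved account, two still waiting.
cur.executemany('INSERT INTO People (name, retrieved) VALUES (?, ?)',
                [('alice', 1), ('bob', 0), ('carol', 0)])

def next_unretrieved(cur):
    """Return (id, name) for one unretrieved account, or None if none is left."""
    cur.execute('SELECT id, name FROM People WHERE retrieved = 0 LIMIT 1')
    return cur.fetchone()   # fetchone() gives None when the query has no rows

row = next_unretrieved(cur)
if row is None:
    print('No unretrieved Twitter accounts found')
else:
    (person_id, acct) = row   # unpack the two-tuple
    print(person_id, acct)
```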
And so what we’re going to do next is check to see if the person that we just typed in (which means the length of the account is greater than zero) is already there, okay?
And we’re going to select ID from people where name equals, so that’s the one we just entered.
And we’re going to fetchone, and grab the first thing, because we only got one thing in the SELECT statement here, and if this person that we just asked to see is not in the table, that means this is going to fail.
We’re going to do an INSERT OR IGNORE. The IGNORE is kind of redundant, because we just checked to see if it’s there, but we’ll put that in just to be safe.
And we’re going to put the name in as the new account that we are looking at, and we’re indicating that retrieved is zero, so that we will know that we haven’t retrieved it yet. You’ll see that we’ll update that in a second. We commit it so that later selects will see this.
So you gotta do the commit. This later select wouldn’t see the one we just inserted. And we’re going to ask how many rows were affected. And if it is not equal to one, then we’re going to complain about the insert. Otherwise we are going to do this thing.
We are going to ask, hey, remember there was an ID up there? Right here, ID integer primary key and we did not insert this here, but we want to know what that ID is.
And every time I was showing you that in lectures, I was saying it’s really easy in Python to do this, and that’s worth saying.
This cursor did the insert, but one of the things that happens is after the insert, we’re going to grab the last row ID. Which is the primary key that was assigned by SQL, okay? And so that means that one way or another, coming through this code here in line 45.
One way or another, we’re either going to know the ID of the user that was there before or we just inserted one, and so we’re going to know the primary key of the current user. And you’ll see why we need that.
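The "one way or another we know the id" logic can be sketched as a small look-up-or-insert helper. The function name `person_id_for` is hypothetical, and raising an exception on a failed insert is a choice made for this sketch; the table layout follows the narration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''CREATE TABLE People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')

def person_id_for(conn, cur, name):
    """Return the primary key for name, inserting a new row if needed."""
    cur.execute('SELECT id FROM People WHERE name = ? LIMIT 1', (name,))
    row = cur.fetchone()
    if row is not None:
        return row[0]            # already in the table
    cur.execute('INSERT OR IGNORE INTO People (name, retrieved) VALUES (?, 0)',
                (name,))
    conn.commit()
    if cur.rowcount != 1:        # exactly one row should have been affected
        raise RuntimeError('Error inserting account: ' + name)
    return cur.lastrowid         # primary key the database just assigned

first = person_id_for(conn, cur, 'drchuck')    # inserts the row
second = person_id_for(conn, cur, 'drchuck')   # finds the existing row
print(first, second)
```

Both calls return the same key, which is what lets the rest of the loop use the id without caring whether the account was new.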
So id is the primary key of the current user that we entered right here, okay? And now what we’re going to do is the twurl augment, with OAuth and all the keys and secrets in hidden.py. And for the count we’re going to go, let’s count 1000.
Let’s go count, what the heck, let’s go up to 200 friends. No, let’s do 100, we’ll keep it that way. And then we’re going to retrieve it, and we’re retrieving the account; we’re not going to print the nasty URL out, though we could.
Then we’re going to open the URL, with the connection, and then we’re going to read that.
And then we’re going to get the UTF-8 data from this, and then we’re going to decode that and we’re going to have the Unicode data. So data is an internal Python string, with all that data representing all the wonderful characters.
And of course we’re going to ask urlopen to give us back the headers, as a dictionary, using this call, and we can see how many we have left, right? What’s the remaining rate limit we have, okay?
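The decode and headers steps can be tried without touching the network. In this sketch `raw_bytes` and `header_pairs` are made-up stand-ins for what the connection would return, and the header name `x-rate-limit-remaining` is an assumption about how the remaining count is labelled.

```python
# Stand-ins for the response body and headers (made-up values).
raw_bytes = '{"users": [{"screen_name": "café_fan"}]}'.encode('utf-8')
header_pairs = [('content-type', 'application/json'),
                ('x-rate-limit-remaining', '14')]

data = raw_bytes.decode()       # bytes -> str; decode() defaults to UTF-8
headers = dict(header_pairs)    # list of (name, value) pairs -> dictionary

print(type(data).__name__)
print('Remaining', headers['x-rate-limit-remaining'])
```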
And so then what we’re going to do is parse the data with json.loads. Wait, I need a continue in here. Continue, okay, save. If we fail to parse this data, we’ll print it out, right? That means that this json.loads died, which means it’s not syntactically correct JSON, basically. Who knows if we’re ever going to see that, but at least when it blows up it’ll print this data out.
We’ll have to catch it and then it’ll continue. Actually, I’ll make this a break because if that’s blown up that bad, we should quit. Now, we don’t, I don’t know what happens when this rate limit says you can’t have it.
But I do know that I expect when it’s successful that there will be a key of users in this outer dictionary that we’re going to get. And if this outer dictionary, if users is not in the parsed dictionary then I’m going to dump out this data. So that at least I can debug what happens when I’ve got some broken json.
So the difference between these two: this code is going to fail when the JSON is syntactically bad, meaning a curly brace isn’t right or whatever. This other code will trigger when I get good JSON but I don’t have a users key in it, okay?
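The two failure cases, bad syntax versus valid JSON with no users key, can be separated as in this sketch. The function name `parse_friends` and returning `None` on failure are assumptions for illustration; the walkthrough itself uses break and continue inside its loop.

```python
import json

def parse_friends(data):
    """Return the list under 'users', or None if the data is unusable."""
    try:
        js = json.loads(data)
    except json.JSONDecodeError:
        # Case 1: the text is not syntactically valid JSON.
        print('Unable to parse json')
        print(data)
        return None
    if not isinstance(js, dict) or 'users' not in js:
        # Case 2: valid JSON, but not the structure we expected.
        print('Incorrect JSON received')
        print(json.dumps(js, indent=4))
        return None
    return js['users']

broken = parse_friends('{"users": [')                    # case 1
no_users = parse_friends('{"errors": []}')               # case 2
good = parse_friends('{"users": [{"screen_name": "a"}]}')
print(good)
```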
So then once we’ve retrieved it, we’re pretty happy with it, we’re going to update for our account that we are retrieving. We’re going to set this as one of our retrieved accounts, okay?
And then what we’re going to do is write a loop that goes through all the friends of this particular user that we’re asking and get their screen name, print it out and then we’re going to check to see if this one is already in our people database.
Because this is a spider, we’re grabbing accounts. And so we’ll look up the friend id, do a fetchone, and grab the sub-zero thing, and see if that works. If this person’s not in there, this fetchone sub-zero is going to blow up, which means we’re going to drop down to the except code.
But if it does work we have friend id is there and they’re already in our database, right? They just weren’t retrieved. Okay?
And so now, if the friend id wasn’t there, we’re going to do an INSERT INTO, setting retrieved to zero, and then we’re going to commit, right?
Now, remember, rowcount is how many rows were affected by this last statement: cur.rowcount. And we’re going to die if that insert doesn’t work. This is unlikely, unless somehow we’ve run out of disk drive or something.
And we’re going to grab the friend id as the key, the last row that was inserted. We’re only going to insert one row, so it’s basically the primary key of the row that we just inserted.
So if you look at this code right here, it comes out the bottom one way or another with friend id set. Right?
So for friend id, either they’re already in our database or they’re not. And if we insert them, then we have it.
And so now this countnew and countold is just so I can print out a nice printout. Now we’re going to insert into the friend table, which is called the Follows table in this case: from id and to id.
Those are the two outward-pointing foreign keys. And we have the id of the account that we are retrieving the friends of, and then this particular friend.
And so we’re inserting the connection from this person to that person.
And then we commit it. We want to commit these again so that later selects, when the loop goes back up, get all of the data that’s going on, okay? So we do want to commit from time to time, and then we close the cursor at the very end.
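The friend loop described above can be sketched like this. The function name `store_friends`, the sample screen names, and the single commit at the end are assumptions for illustration; `countnew` and `countold` are the counters named in the walkthrough.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''CREATE TABLE People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')
cur.execute('''CREATE TABLE Follows
    (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))''')

def store_friends(conn, cur, from_id, users):
    """Record each friend in People if needed, then link from_id -> friend."""
    countnew = 0
    countold = 0
    for u in users:
        friend = u['screen_name']
        cur.execute('SELECT id FROM People WHERE name = ? LIMIT 1', (friend,))
        row = cur.fetchone()
        if row is not None:
            friend_id = row[0]          # already known, maybe not yet retrieved
            countold += 1
        else:
            cur.execute('''INSERT OR IGNORE INTO People (name, retrieved)
                VALUES (?, 0)''', (friend,))
            friend_id = cur.lastrowid   # key of the row just inserted
            countnew += 1
        # The directional link: from_id follows friend_id.
        cur.execute('''INSERT OR IGNORE INTO Follows (from_id, to_id)
            VALUES (?, ?)''', (from_id, friend_id))
    conn.commit()
    return countnew, countold

cur.execute("INSERT INTO People (name, retrieved) VALUES ('drchuck', 1)")
me = cur.lastrowid
users = [{'screen_name': 'friend_a'}, {'screen_name': 'friend_b'}]
first_pass = store_friends(conn, cur, me, users)
second_pass = store_friends(conn, cur, me, users)
print('New accounts=', first_pass[0], ' revisited=', second_pass[1])
```

The second pass finds both friends already present and adds no new Follows rows, because the uniqueness constraint makes the repeated link a no-op.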
Okay, so let’s run this and see what happens. Okay, so python twfriends.py. Of course, I am a refugee from Python 2, so I always forget to type python3.
Okay, so we are going to start. If we take a look right now, I’m going to start another tab over here, and ls -l *sqlite. Now that SQLite file is there, right?
And it’s actually made the table, so if you go up here. It ran all this stuff, create the tables yada, yada and we’re sitting right here at this line.
As a matter of fact, I think without causing too much trouble I can open that database, right here, and there is no data in the Follows table and there’s no data in the People table. It’s completely empty.
Okay. So we’re waiting for the first one, and I’ll go with mine, Dr. Chuck. So it’s retrieving the 100 friends and they all were brand new. They were all inserted. Right? And so now if I hit refresh we will see that Dr. Chuck is retrieved.
Who follows? So these are all the people I follow. One follows two.
So if we look here, we see that Dr. Chuck follows Stephanie Teasley. Because we grabbed the friends of Dr. Chuck, we’re going to have a record in Follows for every one of them, right? So these are all the people I follow, and we put them in. Okay?
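Rows like "one follows two" are easier to read with names attached. One way to do that, shown here as a sketch and not something run in the walkthrough, is to join Follows to People twice. The sample names and the aliases `follower` and `friend` are made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''CREATE TABLE People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')
cur.execute('''CREATE TABLE Follows
    (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))''')
# Made-up sample data: person 1 follows persons 2 and 3.
cur.executemany('INSERT INTO People (id, name, retrieved) VALUES (?, ?, ?)',
                [(1, 'person_one', 1), (2, 'person_two', 0),
                 (3, 'person_three', 0)])
cur.executemany('INSERT INTO Follows (from_id, to_id) VALUES (?, ?)',
                [(1, 2), (1, 3)])

# Join People once for each end of the directional relationship.
cur.execute('''SELECT follower.name, friend.name
    FROM Follows
    JOIN People AS follower ON Follows.from_id = follower.id
    JOIN People AS friend ON Follows.to_id = friend.id
    ORDER BY friend.id''')
pairs = cur.fetchall()
for follower_name, friend_name in pairs:
    print(follower_name, 'follows', friend_name)
```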
So we can go back and we can let’s see grab somebody. Let’s go grab Stephanie Teasley.
And let’s pull out her friends. So we grabbed 100 of her folks. I got 14 left. That’s my x rate limit. So I did Stephanie Teasley, so let’s go back here.
So you’ll notice there’s 101. There’s probably going to be. 182. That’s interesting. So we’ve retrieved Dr. Chuck and Stephanie Teasley. And let’s go take a look in the friends table, the follows table, okay.
So we have all of the people I follow, now all the people Stephanie follows. Okay, so there we go, so let’s go ahead and do somebody else. Let’s see, I think we both follow Tim McKay.
Where’s Tim McKay? Yeah, let’s follow Tim McKay. Let’s see who Tim follows, let’s see if we can get an overlap. We revisited some. Let’s see if we can see this in the Follows table.
Let’s see People. So we’ve got Dr. Chuck retrieved, and Tim McKay’s somewhere down here. It might take us a while before we get any really good overlaps. Let’s see. Let’s do a database call. Let’s do some SQL, and select, count.
Okay, so let’s just run this some more. It’s clearly working. Now, one thing I can do here is I can hit enter and it will just pick one randomly. So it grabbed liveedutv. And now I can see how many I got left.
We’ve got 12 left, and now I can hit enter again and it picks another one. That was the next one, I was kind of picking them in order, is it picking them in order? Let’s go to people. Yeah, it’s picking these. So we can see that it’s going to just do the first unretrieved person, who’s Nancy. Let’s let it retrieve Nancy.
So it grabbed Nancy, new. So we’re finding some. And this table’s getting really big. And so if we look at the people table we now have 455 people and we have 467 following records. And so there we go. Hit enter, it does another one. And away we go. So you get the idea. I can type quit to finish.
And just to give you a little interesting bit of code to show you how to do selects, I’m going to do this twjoin. No, you know, because we’re not talking about that, let me show you one thing: ls -l friends*sqlite.
So this database is there, and I can restart this process and run it again. And the database is still there, and so we can keep doing this. And so this data, it keeps extending. And so this is a restartable process. I can run it and then tell it to grab the next unretrieved one.
And so away we go, right? And so that’s part of it, so if I run out of my, I’ve got eight left. How many do I have left really? Let’s keep going. How many do I got left?
I got five left. Okay, wait. I guess we’ll just run it out. So I got four left. What I should do? I can’t change the code. Yes, I can change the code. I can stop the code and I can quit the code.
So what I’m going to do is change this code a little bit really quick, and I’m going to print the headers of rate limiting at the beginning.
And at the end. So now I can run it again; I changed the code. Hopefully I didn’t make a Python error. Tell it to go get another one, and so I got three left!
We’ll see what happens when I run out of rate limit. So we have one left. Hit Enter, hit Ctrl + K, OpenSource.Org, so we have zero left. That worked; now let’s see what happens. I don’t know what happens next. We blew up: too many requests, we got an HTTP Error 429. So that happened while going for Mark Cuban.
That was in line 48, so the right thing to do would be in line 48. We should really put this in a try/except block, because it gives us an error. Print, fiddlesticks.
How do I print the exception message? I am always forgetting. print('Failed to Retrieve'). Okay, so we’ll put that in. Now, if I run it... and then I have to put a break here, because that’s not good.
Break. Failed to retrieve. Now I gotta figure out... see, I never know how to print out the error message. That’s the weird thing about this stuff: I don’t ever remember enough.
I don’t remember the syntax, what I say here to print the error message out. So I’m going to go to Google and I’m going to say: print out the exception message in Python.
Python 3, hello. Okay, so let’s go find it here in the documentation. Is this it? Is this what I say? I just want to print out the message.
That’s it, except... let’s try this. So this is part of Python programming, for me at least, because I’m just not a genius expert at this stuff. This is one thing I like about Python: you can guess stuff, and sometimes you guess right.
So there we go, we got the error, we got the nice little error message.
And we see error 429, too many requests.
So that cleans that up nicely.
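The fix arrived at here, catching the exception and printing its message, can be sketched without using up any rate limit. The `retrieve` wrapper and the fake `rate_limited_opener` are hypothetical stand-ins for the real urlopen call; only the try/except shape comes from the walkthrough.

```python
import urllib.error

def retrieve(opener, url):
    """Call opener(url); on failure print the error message and return None."""
    try:
        return opener(url)
    except Exception as err:
        # Printing the exception object shows its message.
        print('Failed to Retrieve', err)
        return None

def rate_limited_opener(url):
    # Stand-in for urlopen once the rate limit is used up.
    raise urllib.error.HTTPError(url, 429, 'Too Many Requests', None, None)

result = retrieve(rate_limited_opener, 'https://example.invalid/friends')
```

In the walkthrough the handler is followed by a break, so the loop stops instead of hammering the API after the limit is reached.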
So we have run out of requests and on that, it is a good time to say thanks for listening and I hope that you found this valuable.