13.8 - Securing API Requests

Course: Using Python to Access Web Data / Chapter: JSON and the REST Architecture (Chapter 13) / Lesson 8

Summary

APIs like Google's are backed by servers, buildings, staff, researchers, and carefully cleaned data, so providers limit or monetize access. This lesson looks at request limits, Twitter's requirement that API requests be authorized with OAuth-signed URLs, reading rate-limit information from response headers, and pretty-printing JSON with json.dumps. It closes by recapping the path from formats like XML and JSON to distributed applications running on different computers connected via a network.

  • Study time: 11 minutes
  • Level: Hard

English Transcript

So one of the things about that Google API we just finished talking about is that it’s an amazing amount of free resource for us. Google has servers, buildings, staff, and researchers; they wrote the code in the first place, they accumulated all the data, and they did all the searching. They tested the data. They cleaned up the data, and we just sit there, write 40 lines of Python, and take advantage of it. There’s no such thing as a free lunch. The data behind these APIs is often very valuable, so these organizations either want to monetize it or limit their exposure. Sometimes Google is doing this for the greater good, to encourage more location-aware applications. But frankly, if you are going to build a location-aware application with a billion users, they probably want a cut of your deal, right? So they build these things, and then the rules change. If you look at Google (and this may or may not be current), that particular API we were using limits you to 2,500 requests per day. Later I will show you some code I wrote for when you have to geocode more than 2,500 things and need to spread the work over a couple of days. In a later application, geojson, there’s a whole zip file in the code section that shows you how to start and stop the process, so you can get, say, 2,000, wait 24 hours, get 2,000 more, and wait another 24 hours. So if you’ve got 10,000 to do, after a week you’ll have them all. It automatically starts and stops itself, and it uses a database to keep track of where it is. Or you can pay Google if you want; they have rates for that, and so on. So it’s not as simple as “wow, this is free.” In this case, as long as you don’t use it too much, it is free, which is why it’s a good starting point. Now, there are other APIs you might find interesting. Facebook has APIs. Twitter has APIs.
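The stop-and-resume idea described above (make a capped number of lookups per run, record progress in a database, and continue from there on the next run) can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not the code from the course's zip file: the `results` table layout, the `daily_limit` parameter, and the `lookup` callable are hypothetical stand-ins for a real geocoding call.

```python
import sqlite3
from typing import Callable, Iterable


def process_batch(
    conn: sqlite3.Connection,
    items: Iterable[str],
    lookup: Callable[[str], str],
    daily_limit: int = 2000,
) -> int:
    """Look up at most `daily_limit` items that have not been done yet.

    Results are stored in SQLite, so rerunning the function (for example,
    24 hours later) skips finished items and continues with the rest.
    Returns the number of lookups performed in this run.
    """
    # Hypothetical table: one row per item, holding its cached result.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (item TEXT PRIMARY KEY, result TEXT)"
    )
    done = 0
    for item in items:
        if done >= daily_limit:
            break  # quota used up for this run; resume next time
        row = conn.execute(
            "SELECT 1 FROM results WHERE item = ?", (item,)
        ).fetchone()
        if row is not None:
            continue  # already retrieved in an earlier run
        result = lookup(item)  # the expensive, rate-limited API call
        conn.execute(
            "INSERT INTO results (item, result) VALUES (?, ?)", (item, result)
        )
        conn.commit()  # save progress after every call in case we crash
        done += 1
    return done
```

Running something like `process_batch(sqlite3.connect("progress.sqlite"), addresses, my_geocode)` once a day (all three names hypothetical) would work through 10,000 addresses in five runs at 2,000 per run. Committing after each lookup trades a little speed for never losing a paid-for result.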
I’ll talk a little bit about the Twitter API. What’s different about the Twitter API is this: the Google API let you make 2,500 requests from your computer, but with the Twitter API you get zero requests unless you are authorized. So this is a situation where authorization is required. It’s kind of like what you do in your browser or on your phone, where you log in first and then you can use the application as much as you want. In this case, you log in and then use the API, though not necessarily as much as you want. Authorization and authentication are part of some APIs. Later I’ll give you a much more detailed walkthrough of how to do this, but for now I just want to give you a high-level overview. There is a URL you can go to at Twitter that creates keys and secrets, which are kind of like IDs and passwords, except for applications. Once you know how to authorize your URLs, there is documentation for the Twitter API that you can use. Here are the rules for the Twitter Tweets API and how to find it. Even though you might not be able to read it on the slide, it’s telling you that you hit a particular URL and you get a timeline: a user timeline, a favorites timeline, a retweets timeline, et cetera. So there are URL patterns you can hit to get different kinds of data from the Twitter API. Here’s a bit of code. I’m not going to go through it in tremendous detail; I’ll just call out some of the highlights, because I’m going to have a separate video where I go through it and actually run the code. So we are going to be using urllib. twurl is a bit of code I wrote that augments these URLs and deals with the authorization problem. Of course we need json, and we take the Twitter URL from the Twitter documentation.
The documentation says that’s where you go to get a friends list. Again, we have a while loop that asks for a Twitter account, and again, if you just hit Enter, we break out of the loop. The call that augments the Twitter URL does a whole bunch of stuff: it adds parameters like screen_name= and count= and does all the URL encoding, but it also adds the security information, which we’ll touch on in a second. I wrote this; it’s twurl.py, which we’ll look at shortly. Then you hit this URL, which includes the login information. We call urlopen, which gives us the connection. Then we call connection.read(), and as always, because the data is coming from the outside world, it’s probably UTF-8, so we have to decode it to bring it in. Now data holds a string that represents the curly braces, the square brackets, the commas, and all that stuff, right? It’s not the parsed JSON; it’s the string representation of the JSON. This next line is kind of interesting. A long time ago I told you that urllib eats the headers and you don’t get them, but in all these request/response cycles we’ve been getting headers all along. It turns out this is the line of code you use to get the headers. The data comes in; connection.read() gets you the body, and connection.getheaders(), a method on the connection, gets you the headers. The connection had the headers all along and kept them. Here we ask for those headers and turn them into a dictionary; if you look back at those headers, they’re easily represented as key-value pairs. What’s going to happen here is that Twitter uses a header to tell you how many more of these requests it will allow before it shuts you down, because the limit is counted over a time window.
The name of that header is x-rate-limit-remaining. It gives you a number like 12, 11, 10, 2, 1, 0, and if it’s 0, you’d better stop, right? Each time you retrieve data, they tell you how many more requests you can make, and for this particular call you only get something like 15 per window. So we print that out to see how many we have left. Then we parse the data, and we also dump it: json.dumps takes the parsed JSON structure and prints it in a pretty way, with an indent, curly braces, and so on. So that’s nice. Now look at what the JSON looks like, down here. It says we’ve got 14 of these requests remaining, and the response is an object that has users, which is an array, and each user is itself an object. So js['users'] is an array, and u loops through each of the users in this output. There’s the output, right? u iterates through this one, then this one, then this one, and so on. Each one contains the screen name, the real name, and so on. For each user, we print the screen name, and then we print the text of their status, only the first 50 characters. When we run this program, which I’ll do in another video, you see a person’s screen name and their last status, then the next person’s screen name, and so on. The loop goes through all the users and prints them out, so it’s not that hard. The bit with the Twitter URL is a little tricky, and we’ll look at that in a second. But reading this data and parsing it isn’t all that crazy. So that’s what that one does.
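The read, decode, header, and parse steps walked through above can be sketched as two small functions. This is a minimal sketch, not the lecture's own program: the helper names `read_response` and `summarize_friends` are hypothetical, and it assumes the response shape described in the lecture (an object whose `users` array holds objects with `screen_name` and a `status` containing `text`). URL signing is left out; `connection` is assumed to be what `urllib.request.urlopen` returned for an already-signed URL.

```python
import json


def read_response(connection):
    """Read an HTTP response; return (parsed JSON, headers, calls remaining)."""
    # read() gives the body as bytes; decode() turns UTF-8 bytes into a str.
    data = connection.read().decode()
    # getheaders() returns (name, value) pairs. Lower-casing the names is a
    # design choice here, so the lookup works however the server capitalizes.
    headers = {name.lower(): value for name, value in connection.getheaders()}
    remaining = headers.get("x-rate-limit-remaining")
    # json.loads turns the string representation into dicts and lists.
    js = json.loads(data)
    return js, headers, int(remaining) if remaining is not None else None


def summarize_friends(js, width=50):
    """Return (screen_name, first `width` characters of status) per user."""
    rows = []
    for u in js["users"]:
        # A user may have no recent status, so fall back to an empty string.
        text = u.get("status", {}).get("text", "")
        rows.append((u["screen_name"], text[:width]))
    return rows


if __name__ == "__main__":
    # Hypothetical sample shaped like the response described in the lecture.
    sample = {
        "users": [
            {"screen_name": "alice", "status": {"text": "Hello from the API"}},
            {"screen_name": "bob"},
        ]
    }
    print(json.dumps(sample, indent=4))  # pretty-print with an indent
    for name, text in summarize_friends(sample):
        print(name)
        print("  ", text)
```

With a real signed URL, a caller might write `js, headers, remaining = read_response(urllib.request.urlopen(url))` and stop making requests once `remaining` reaches 0.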
When you set up these keys, you have to go to the Twitter web site and get these values: a consumer key, a consumer secret, a token key, and a token secret. They’re just big, long, crazy strings. There’s a file called hidden.py in the sample code, and you have to cut and paste into it the four values you get from the Twitter web site after you’ve logged in. hidden.py is used by twurl to build the signed URLs, so you have to get these values right; if you don’t, your Twitter code isn’t going to work. I’ll show you how to do that in a recorded video. This uses a protocol called OAuth, and basically what we’re doing is signing URLs. So it’s not just a URL; it’s a URL plus a signature on the URL, and it’s signed in a way that only someone who knows the keys and the secrets can produce. So this is the Twitter URL code that I’ve written, and hidden.py is what you have to write. You put those four strings in there: the consumer key, the consumer secret, the token key, and the token secret. Then I call this oauth library, which is not your code; it handles the OAuth details. When it’s all said and done, we make a request, sign it, and convert it to a URL, and the URL looks like this. This part is the data we’re interested in: we’re asking for a count of 2 and a screen name of drchuck. Then there’s all this stuff like oauth_version, oauth_token, oauth_timestamp, and oauth_signature. All of this is the magic that makes it a signed URL, okay? Now, you don’t have to worry about it, because as long as you get hidden.py right and call twurl, you’ll get back this big long URL.
You hit that URL with urlopen, and Twitter reads these values and says, good numbers, good numbers, okay, here’s your data. If you don’t have these secrets right, you can’t see it. It’s also a way for Twitter to know whether it’s you requesting the data or me, because somewhere in here is who I am. Not screen_name=drchuck; that’s the data we’re actually asking for. It’s the consumer key and the token, together with a signature that only the matching secrets can produce, that say, hey, this is Chuck, having logged in, asking for API data. That is how Twitter knows how many requests you’ve made and how many I’ve made, so you get 15 and I get 15, and once I run out of my 15, I can’t make any more. So we’ve come a long way. We started with “oh, urllib can get data,” moved on to serialization and deserialization, to formats like XML and JSON, and to the notion of applications that are distributed, running on different computers connected via a network: the service-oriented architecture, with cut points where we agree on APIs, contracts, and so on. This is just enough to get you started on web services and service-oriented architectures.
