Worked Example: Geodata (Chapter 16)

Course: Using Databases with Python / Chapter: Databases and Visualization / Lesson 3

Brief description

So this data is read in by this program, geoload.py, and if you recall, this Google geodata has rate limits, it also has API keys, which we'll talk about in a bit, too. And then we're going to simply insert this new data that we just put in, and then we're going to commit it, and every tenth one, this is count mod 10, we're going to pause for five seconds, and we can hit control C here, and then we're going to do the geodump, okay? We're going to go grab the zero item in that array, and then we're going to go find geometry, and then location, and then lat and lng, for the latitude and longitude.

  • Reading time: 13 minutes
  • Level: very hard

English transcript of the lesson

Hello everybody, welcome to Python for Everybody. This is another worked code example; you can download the sample code ZIP file if you want to follow along. The code that we’re working on today is what I call the geodata code, and that is code that is going to pull some locations from this file. We’re using, or simulating, the Google Places API to look places up so we can visualize them on a map, and so this is the basic picture. If we take a look at this where.data file, it’s just a flat file that has a list of organizations. This actually was pulled from one of my MOOC surveys: we just let people type in where they went to school, and this is just a sample of them.

So this data is read in by this program, geoload.py, and if you recall, this Google geodata has rate limits; it also has API keys, which we’ll talk about in a bit, too. And so the idea is this is a restartable, spider-like process: we want to be able to run this, and have it blow up, and start it again, and not lose what we’ve got, right? So we’re now using a database as well as an API: in order to work around the rate limits of this API, we’re going to use the database with a restartable process, and then we’ll make some sense of the data, and then we’ll visualize it.

But in the short term, let’s start with the geoload.py code; take a look here. A lot of this hopefully by now is somewhat familiar to you: urllib, json, sqlite. I mentioned the Google APIs: these used to be free and did not require an API key, but increasingly they’re making you use API keys, especially for new ones. So you can go to Google APIs and get an API key, and you can put it in here, and it will be this big long thing that looks like that. And then, if you have an API key, you can use the Places API. And I’ve got a copy of a subset of it, not all of it, here at this URL.
As a matter of fact, you can just go to this URL in a browser, and it will show you a list of the data that it knows about, okay? And I made it so that it follows the same basic protocol, with the address equals, as the Google Places API. So this just changes where we retrieve the data from. The nice thing about my server is it’s got no rate limit, it’s really fast, and you’re not fighting with Google all the time, and it means that, perhaps, if you’re in a country where Google is not well supported, you can use my API. I mean, it’s really strange that somehow my API is more reliable and available than the Google one, but it’s true.

So we’re going to make a database, we’re going to do a CREATE TABLE IF NOT EXISTS, and we’ll have an address, and we’re really just caching the geographical data; we’re going to cache the JSON. One of the things we do when we build these processes is we tend to keep them simple and not do all the calculation and parsing of the JSON: just load it and get it in, and fill the data up in this database, so that’s what we’re going to do. Because Python doesn’t ship with any legitimate certificates, we have to sort of ignore certificate errors.

We’re going to open the file, and we’re going to loop through it, and pull out the address from the file, and we’re going to select the geodata where the address is that address. Let’s move this in a bit. The idea is if it’s already in the database, we don’t want to retrieve it again, so we do a fetchone and pull out that first thing, which will be the JSON right there. If we get that, we continue back up to the next address; otherwise we keep going. The pass just means don’t blow up: we do an except, and we just do a pass, which is like a no-op.
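A minimal sketch of the restartable cache described above, not the actual geoload.py source. The table name Locations comes from the lecture; the column names, the helper names open_cache, lookup_cached and load_addresses, and the injected fetch function are assumptions made for illustration. It also tests the fetchone result against None instead of using the try/except with pass that the lecture mentions.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the cache database; safe to call on every restart."""
    conn = sqlite3.connect(path)
    # IF NOT EXISTS keeps rows from earlier runs intact.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Locations (address TEXT, geodata TEXT)"
    )
    return conn


def lookup_cached(conn, address):
    """Return the cached JSON text for an address, or None if it is not stored."""
    cur = conn.execute(
        "SELECT geodata FROM Locations WHERE address = ?", (address,)
    )
    row = cur.fetchone()
    return row[0] if row is not None else None


def load_addresses(conn, addresses, fetch):
    """Store raw JSON for each new address; skip the ones already cached.

    fetch is a hypothetical callable that takes an address and returns
    the JSON text for it (in the lecture this is a web request).
    """
    fetched = 0
    for address in addresses:
        address = address.strip()
        if not address:
            continue
        if lookup_cached(conn, address) is not None:
            print("Found in database", address)
            continue
        data = fetch(address)
        conn.execute(
            "INSERT INTO Locations (address, geodata) VALUES (?, ?)",
            (address, data),
        )
        conn.commit()  # commit each row so an interrupt loses nothing
        fetched += 1
    return fetched
```

Running load_addresses a second time with the same list only retrieves the addresses that are not yet in the table, which is what makes the process safe to interrupt and restart.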
And we’re going to make a dictionary, because that’s what we do for the key-value pairs. Everything you’ve seen so far has used constants here, but because we may or may not have an API key, we build it up: query equals, and that’s the address, and then key equals, and that’s the API key. If you recall, urlencode adds the pluses, and ampersands, and all that nice stuff. We’re going to retrieve it, read it, and decode it, print out how much data we’ve got, and add to a count, and then we’re going to try to parse that JSON data, and print it if something goes wrong.

As we’ve seen, the top level of the JSON data from this geocoding API is an object, which we’ll see a little of in a bit. And it has a status field in it, and the status is OK if things went well. So if the status is not there, that means our JSON is not well formed, or not what we expected. If the status is not OK and not equal to ZERO_RESULTS, then we print out Failure To Retrieve, and then quit. And then we’re going to simply insert this new data that we just retrieved, and then we’re going to commit it, and every tenth one, this is count mod 10, we’re going to pause for five seconds, and we can hit Control-C here, and then we’re going to do the geodump, okay?

So let’s just run this. Let’s do an ls; we do have a geodata.sqlite from a previous test, so let’s get rid of it so we start with a fresh set of data, and run python geoload.py. Of course, I’m forever making the mistake of forgetting python3. So you can see that it’s running. It’s adding the query, and in this case I don’t have an API key, and it’s putting the pluses in; this part here with all the pluses, that’s the URL encoding. And you notice that it’s pausing a bit.
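As a rough illustration of the two steps just described, building the query string from a dictionary and checking the status field, here is a small sketch. The SERVICE_URL value is a placeholder, not the real endpoint, and the function names build_url and status_ok are made up for this example; the parameter names query and key and the status values OK and ZERO_RESULTS follow the lecture.

```python
import json
import urllib.parse

# Placeholder endpoint for illustration only; substitute the real service URL.
SERVICE_URL = "https://example.com/json?"


def build_url(address, api_key=None):
    """Build the request URL; the key parameter is added only if we have one."""
    parms = {"query": address}
    if api_key is not None:
        parms["key"] = api_key
    # urlencode turns spaces into pluses and escapes characters such as commas.
    return SERVICE_URL + urllib.parse.urlencode(parms)


def status_ok(text):
    """Return True if text is a JSON object whose status is OK or ZERO_RESULTS."""
    try:
        js = json.loads(text)
    except json.JSONDecodeError:
        return False  # not well-formed JSON
    if not isinstance(js, dict) or "status" not in js:
        return False  # not the shape we expected
    return js["status"] in ("OK", "ZERO_RESULTS")


print(build_url("Ann Arbor, MI"))
# prints https://example.com/json?query=Ann+Arbor%2C+MI
```

Building the dictionary first and adding the key conditionally means the same code path works with or without an API key.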
Now, depending on how fast your net connection is, this may or may not go so fast, but this is not that much data, it’s only like 2,000 or 3,000 characters, and so it’s working and talking to my server. And the interesting thing here is I can blow this up. I’m going to hit Control-C; in Linux you’d hit Control-C, and in Windows, I think, you’d hit Control-Z, depending on what shell you’re working in. But I’m going to hit Control-C, and you see I sort of blew it up, right? And that causes a traceback, a KeyboardInterrupt traceback. If I do an ls -l, you can see that now this geodata.sqlite file is there.

Now, in the name of restarting, I will start this again, and you will see that it checks and skips: it runs this code right here, where it grabs the address and finds it in the database. So you’ll see it say Found in database, really quick, chop, chop, chop, and it’ll go really fast, and then it’ll go back to catching up where it left off. So for all those up there, it did not actually re-retrieve them, because it already knew about those things. So now it’s catching up and doing some more, and doing some more, and then I’ll hit Control-C. It has a little counter in here so that, basically, if it hits 200, it stops, and you have to restart it. You could obviously change this code; you could make it so it didn’t sleep, though it doesn’t hurt to sleep for like a second after every 100 or so. If you want, you could change that code.

And now, let’s just hit Control-C and blow it up, and do an ls -l. And there is another bit of code; it’s always good to write these really simple things. So now we’re going to import sqlite and json. We’re going to connect ourselves up to the database, and we’re going to open the file we’re going to write, and we open it with utf-8, and we’re going to read through the data; in this case, we just SELECT star FROM Locations.
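The pacing in the loader described above (a five-second pause after every tenth retrieval, and a stop once the counter hits 200) could be sketched like this. The function name paced_run and its parameters are hypothetical, and the sleep function is passed in only so the example can be tried without actually waiting.

```python
import time


def paced_run(items, work, pause_every=10, pause_seconds=5, limit=200,
              sleep=time.sleep):
    """Call work(item) for each item, pausing periodically and stopping at limit."""
    count = 0
    for item in items:
        if count >= limit:
            print("Retrieved", limit, "items, restart to retrieve more")
            break
        work(item)
        count += 1
        # count mod pause_every is zero on every tenth item by default.
        if count % pause_every == 0:
            print("Pausing for a bit...")
            sleep(pause_seconds)
    return count
```

Because already-cached addresses are skipped on the next run, stopping at the limit does not lose any work; you just start the program again.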
And if you recall, Locations has an address and a geodata, so sub zero will be the address and sub one will be the geodata, and we’re going to convert it to a string and then parse it. If something goes wrong with the JSON, we’ll just skip it, and we’ll check to see if we have the status in our JSON.

Let me run the SQLite browser here. File, Open Database; let’s take a look at what’s in this database. Where are we: code3, geodata, geodata.sqlite. So this is the data we’ve got; let me make this a little bigger if I can, though it’s not going to show as much. So you can see that these are the addresses, and the geodata is just the JSON, so that’s the JSON that we retrieved. And so this is a really simple database; it’s just a sort of spidering process, run, run, run.

But now we’re going to run the geodump code, which is going to read this and dump this stuff out and write where.js. So it’s going to actually parse this stuff, and that’s code we’ve seen before. This line goes into results, and results is an array. We’re going to go grab the zero item in that array, and then we’re going to go find geometry, and then location, and then lat and lng, for the latitude and longitude. And then we’re also going to take the actual address out of the formatted address right here. So in this bit of code, we’re actually parsing the JSON, and we’re going to clean things up, get rid of some single quotes. This kind of data cleaning is just stuff that, after you play with the data for a while, you realize, my data’s ugly, or it does this. I’m going to print it out, and I’m going to write it into a JavaScript file. And so the JavaScript file is where.js, and I’ll show you what it looks like.
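The parsing step described above can be sketched as follows. This is an illustration under assumptions, not the geodump source: the function names extract_point and write_points are hypothetical, and the output layout (a JavaScript array assigned to a variable called myData) is assumed, since the lecture does not spell out the exact contents of where.js. The keys results, geometry, location, lat, lng and formatted_address are the ones named in the lecture.

```python
import json


def extract_point(geodata_text):
    """Return (lat, lng, address) from one cached JSON document, or None."""
    try:
        js = json.loads(geodata_text)
    except json.JSONDecodeError:
        return None  # skip rows whose JSON does not parse
    if not isinstance(js, dict) or js.get("status") != "OK":
        return None
    if not js.get("results"):
        return None
    first = js["results"][0]  # the zero item of the results array
    try:
        lat = first["geometry"]["location"]["lat"]
        lng = first["geometry"]["location"]["lng"]
    except (KeyError, TypeError):
        return None
    # Data cleaning: single quotes would break the quoted JavaScript string.
    where = first.get("formatted_address", "").replace("'", "")
    return lat, lng, where


def write_points(points, path):
    """Write the points as a JavaScript array (assumed layout), encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as fhand:
        fhand.write("myData = [\n")
        rows = ["[%s,%s, '%s']" % (lat, lng, where) for lat, lng, where in points]
        fhand.write(",\n".join(rows))
        fhand.write("\n];\n")
    return len(points)
```

Opening the output file with an explicit UTF-8 encoding matters because place names often contain non-ASCII characters.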
It’s going to be overwritten; this is the one that came out of the ZIP file. It has the latitude and the longitude, and we’re going to use JavaScript to read this in the where.html file. It’s going to actually read this right there, and pull that data in, and that’s how we’re going to visualize it. I’m not going to go into great detail on how the visualization happens, but that’s what’s happening.

So we’re going to actually write this to a file, so let’s go ahead and run this code, and say python3 geodump. Okay, so it wrote 120 records to where.js. If we look at where.js, this is now the new data that I just downloaded moments ago, and it says, Open where.html in a browser. Now for this one you’ll need the Google Maps API, and you might not be able to see this depending on where you’re at, but here you go, with the locations on Google Maps. And if you hover over one of these, you can see, there in that particular one, why we had to use utf-8 when we wrote the file, so that we didn’t end up with trouble writing the file out.

And so there you go; that is a simple visualization, and it wrote this where.js. If you are handy with HTML and JavaScript, you can look at this where.html file; it’s really just reading through a bunch of data and putting the points on the map. That’s all there is, but I’m not going to go through that, at least not here. So I hope that this was useful to you, and thanks for watching.
