16.2 - Geocoding Visualization

Course: Using Databases with Python / Chapter: Databases and Visualization / Lesson 2


Brief Summary

This lesson ties the course together: we retrieve user-entered location data from the network, geocode it with the Google geodata API, cache the results in a database so each location is retrieved only once (avoiding rate limits), and then visualize the results in a browser with the Google Maps API. The sample code also includes simple JavaScript examples (a line chart, a word cloud, a map) you can study, even though JavaScript is not taught in this class. Up next, the same pattern is used to build a very simple search engine and run the PageRank algorithm.

  • Reading time: 7 minutes
  • Level: very hard



English Transcript

So now, in this last chapter, we're going to talk a little bit about visualizing data. But what we're really doing is summing everything up, because we're going to retrieve data from the network, process it, store it in a database, and then write it out and visualize it. So it's all coming together. It turns out that this notion of gathering data using the network is a pretty common thing. We might take a cleaning or processing step, and part of the problem is that when you're pulling data off the net, you want to be able to restart this process, because it will run and run and run, and then your computer will crash or go to sleep or something. You don't want to start from the beginning, because it might be quite a bit of data and it takes a while to retrieve it. Or, as we've seen, you might be talking to an API that has a rate limit that says, oh, you have to stop at 14 of these things, or stop at 200, or whatever. So this is often a restartable process, and it's usually a simple one, a relatively small amount of code: you have a queue of things you want to retrieve, you go to the next one and store it in the database, then the next one, store it in the database. When you start the process up, you start filling the database with stuff, and if it blows up and you restart it, the first thing it does is read the database and say, "Oh, I don't need any of those," and then it starts to get the next one and the next one and the next one. That is how you make this restartable. And databases are really good at making it so that the program writing to the database can blow up without corrupting your data. You don't have partial data; a record is either written or it's not written. And so these things can blow up.
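The restartable retrieve-and-store loop described above can be sketched roughly as follows. This is a simplified illustration, not the course's actual geoload.py: the table name, the in-memory queue, and the `fetch` placeholder are all made up for the sketch.

```python
import sqlite3

def fetch(item):
    # Placeholder for the real network retrieval (e.g. a rate-limited API call).
    return "data for " + item

conn = sqlite3.connect("cache.sqlite")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS Cache (item TEXT UNIQUE, result TEXT)")

queue = ["ann arbor, mi", "paris", "beijing"]  # the things we want to retrieve

for item in queue:
    # Restartability: skip anything already in the database.
    cur.execute("SELECT result FROM Cache WHERE item = ?", (item,))
    if cur.fetchone() is not None:
        continue
    result = fetch(item)  # the slow, possibly rate-limited step
    cur.execute("INSERT INTO Cache (item, result) VALUES (?, ?)", (item, result))
    conn.commit()  # commit per item, so a crash loses at most one retrieval
conn.close()
```

Because each item is committed as soon as it is retrieved, killing and rerunning the script simply skips everything already cached and resumes at the first unretrieved item.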
Sometimes you blow them up on purpose, and when you start them back up they scan down and say, "Where was I? Oh, I'll start here." So this is often a slow, restartable process, and it might be rate limited for some reason, so it runs for a while; it might actually run for days. Then you have your data, and you start doing stuff inside your computer where you don't really care so much about the network. Maybe this is raw data that came in off the APIs, and you want to put it into some new format, so you might go from one database to another database, or from a database to a file, and produce data that's really ready for visualization. The raw data might be a little complex, or there might be flaws in it. You might write scanners that say, "Wait a second, this is inconsistent; sometimes it looks like this and sometimes it looks like that," and clean that stuff up. Then, once the data is cleaned up, you use some visualization tool, or write Python programs that loop through the data and do some kind of summing or adding, whatever the analysis or visualization requires. For visualization we're going to use things like Google Maps, a lot of JavaScript, and a thing called d3.js, which is a JavaScript library. Now, in this class we're not teaching you JavaScript, and we're not going to teach Google Maps. I provided all these things so that when you run these programs, that stuff is all there; but if you want to learn and see some examples of how to make a simple JavaScript visualization with a line chart, a word cloud, or a map, we've got it and you can take a look at those things.
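The cleanup pass the lecture mentions, scanning for inconsistent records and normalizing them before visualization, can be as simple as this toy sketch. The field names and sample records here are invented for illustration.

```python
# Toy cleaning pass: two sources spell the same fields differently,
# so we normalize every record to one shape before visualizing.
raw = [
    {"lat": 42.28, "lng": -83.74},            # one source uses lat/lng
    {"latitude": 48.85, "longitude": 2.35},   # another uses latitude/longitude
    {"lat": None, "lng": -1.0},               # a flawed record to drop
]

clean = []
for rec in raw:
    lat = rec.get("lat", rec.get("latitude"))
    lng = rec.get("lng", rec.get("longitude"))
    if lat is not None and lng is not None:
        clean.append((lat, lng))

print(clean)  # [(42.28, -83.74), (48.85, 2.35)]
```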
Now, this is one form of data mining, and it's really data mining for an individual: you're pulling this data, getting it local, and then working with it. There are other, much more sophisticated data mining technologies that you might find yourself using, but often you'll also find that Python is part of them, or Python helps you manage them, or you write a Python program to scan through things, prepare them, or do something with them. So there are lots of different data mining technologies; this is just one oversimplified, very Python-oriented one. I'd call it sort of personal data mining, and you should take classes if you really want to become a data mining expert. This is just applying some of the skills we've learned in this class to a data mining problem. The first application that we're going to data-mine is an extension of an application we played with back in the JSON chapter, and the idea is that it has a queue of locations. These are not pretty locations; they're user-typed locations. They're actually anonymized data from many years ago, from the students who took one of my very first MOOCs, a MOOC on Internet History, reduced and anonymized just to play with, and not accurate. We don't have GPS coordinates, but if we use the Google geodata API with JSON we can get them. We need to avoid rate limiting, so we're going to cache the results in a database, meaning we're only going to retrieve each location once, and then we're going to use the Google Maps API to visualize the data in a browser. The sample code is right there; geodata.zip has a README that tells you exactly how to run it, and it shouldn't be very hard for you to run it and produce a nice visual result. Here's a basic process diagram of what's going to happen. There is a list of the things to retrieve, called where.data.
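A geocoding response in the Google geodata style is JSON with a status and a list of results, and pulling the coordinates out looks roughly like this. The abbreviated response below is hand-written for illustration rather than captured from the live API.

```python
import json

# A hand-written, abbreviated response in the Google geocoding JSON shape.
data = '''{
  "status": "OK",
  "results": [
    {
      "formatted_address": "Ann Arbor, MI, USA",
      "geometry": {"location": {"lat": 42.2808256, "lng": -83.7430378}}
    }
  ]
}'''

js = json.loads(data)
if js.get("status") == "OK" and js.get("results"):
    loc = js["results"][0]["geometry"]["location"]
    print(loc["lat"], loc["lng"])  # 42.2808256 -83.7430378
```

Caching the raw JSON text in the database (rather than just the parsed coordinates) means you can re-parse it later without re-hitting the rate-limited API.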
It's just a list of locations, but they're not correct; they don't have GPS coordinates, they're just as typed into a text field by a user. geoload.py starts reading this list and checks to see if each location is already in the database; this is a restartable process, as I mentioned. Then it looks for the first unretrieved location, goes out and does a web service call, parses the result, and puts it into the database; then it goes to the next one, parses that, puts it in the database, and this runs for a while, and then maybe it blows up. Then you fix whatever broke, or start your computer back up, and it runs for a while more. So this is a restartable process that, in effect, is adding stuff to this database. It's an SQLite database, and you can use the SQLite browser to look at it if you like, the stuff we did in the database chapter. You can run the program, see what you've got, run it some more, see what you've got, and debug it using the SQLite browser. At some point you've got all of your data, and then you've got a couple of things. There's an application called geodump.py that reads through all of this data and prints out some nice summary information. It's really common to want to do this, to get some summary information just for sanity checking, so you don't have to use the SQLite browser. But this program also writes out a little JavaScript file called where.js, which, combined with where.html and the Google APIs, uses JavaScript to put all these little pins on a map based on whatever data is in the database. And so that's our first kind of end-to-end spider, process, visualize application. So up next we're going to show how we can use this to build a very simple search engine and then run the PageRank algorithm.
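The dump step, reading the database and writing a small JavaScript data file that an HTML page can load, can be sketched like this. This is not the course's geodump.py: the schema, the file names, and the `myData` variable name are assumptions made for the sketch.

```python
import sqlite3

# Build a tiny database standing in for the geocoding cache
# (the real database produced by geoload.py has a different schema).
conn = sqlite3.connect("points.sqlite")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS Points (lat REAL, lng REAL, name TEXT)")
cur.execute("DELETE FROM Points")
cur.executemany("INSERT INTO Points VALUES (?, ?, ?)",
                [(42.2808, -83.7430, "Ann Arbor, MI"),
                 (48.8566, 2.3522, "Paris, France")])
conn.commit()

# Dump the rows as a JavaScript array that a page like where.html could load
# with a <script src="where.js"> tag and hand to the mapping code.
with open("where.js", "w") as fh:
    fh.write("myData = [\n")
    for lat, lng, name in cur.execute("SELECT lat, lng, name FROM Points"):
        fh.write("[%f, %f, '%s'],\n" % (lat, lng, name.replace("'", "")))
    fh.write("];\n")
conn.close()
```

Writing data out as a `.js` file is a simple trick for feeding a browser visualization without running a web server: the page just includes the file and the array is already a JavaScript value.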
