Worked Example- Page Rank - Visualization (Chapter 16)

Course: Capstone- Retrieving, Processing, and Visualizing Data with Python / Chapter: Building a Search Engine / Lesson 4



  • Study time: 6 minutes
  • Level: Intermediate



English transcript of the lesson

Hello, and welcome to Python for Everybody. We’re doing some code walk-throughs. If you want to follow along, we’ve got a sample code zip that you can download to look at all the code. We’re in the middle of the PageRank code, and we just finished running PageRank. So we have spidered the site and run PageRank a bunch of times. spreset.py lets us restart the PageRank algorithm if we want, but we’re not going to play with that. We’re just going to play with spdump.py and spjson.py and do the visualization, which is the fun part.

So I’ll go into spdump.py. This is simple code, because it’s really just running a SQL query and printing stuff out, right? It connects to our database, creates a cursor, and then just does a SELECT COUNT to show the number of inbound links. It orders by the number of inbound links, descending, so we see the most-linked pages, and it shows the top fifty of those. So it’s just a sample. You’ll tend to write little helpers like this that make your life easier, just to show you the kinds of things you want. You run spdump.py and just kind of check: this looks right to me. And so here is the number of inbound links. My blog has the most inbound links, followed by my “uncategorized” page, whatever that is. These are the numbers of inbound links within my own blog, because this is not looking at the whole Internet at all. So there we go. That’s spdump.py, pretty straightforward.

Now we’re going to go through the visualization process. spjson.py looks at all that data and writes a JavaScript file, which is then fed into my visualization using D3. spjson.py does a big, long join: it joins the links with the pages, where the HTML is not null and the error is null, and orders by the number of inbound links.
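
A minimal sketch of the kind of inbound-link query spdump.py runs, assuming the crawler's SQLite database has a Pages table (id, url, html) and a Links table (from_id, to_id). Those table and column names, the helper names top_inbound and make_demo_db, and the tiny demo data are illustrative assumptions, not the course file itself. The query counts the links pointing at each retrieved page, sorts descending, and uses LIMIT to keep the top fifty rather than counting rows in a loop.

```python
import sqlite3

def top_inbound(conn, limit=50):
    """Return (inbound_count, url) rows for the most-linked pages, highest first."""
    cur = conn.cursor()
    # Assumed schema: Pages(id, url, html), Links(from_id, to_id).
    # The inner join drops pages that nothing links to.
    cur.execute(
        """SELECT COUNT(Links.from_id) AS inbound, Pages.url
             FROM Pages JOIN Links ON Pages.id = Links.to_id
            WHERE Pages.html IS NOT NULL
            GROUP BY Pages.id
            ORDER BY inbound DESC
            LIMIT ?""",
        (limit,),
    )
    rows = cur.fetchall()
    cur.close()
    return rows

def make_demo_db():
    """Hypothetical in-memory stand-in for the crawler database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Pages (id INTEGER PRIMARY KEY, url TEXT, html TEXT);
        CREATE TABLE Links (from_id INTEGER, to_id INTEGER);
        INSERT INTO Pages VALUES (1, 'http://example.com/', '<html></html>');
        INSERT INTO Pages VALUES (2, 'http://example.com/a', '<html></html>');
        INSERT INTO Pages VALUES (3, 'http://example.com/b', NULL);
        INSERT INTO Links VALUES (2, 1);
        INSERT INTO Links VALUES (3, 1);
        INSERT INTO Links VALUES (1, 2);
        INSERT INTO Links VALUES (1, 3);
    """)
    return conn

for row in top_inbound(make_demo_db()):
    print(row)
# (2, 'http://example.com/')
# (1, 'http://example.com/a')
```

Page 3 is left out because its html is NULL, which mirrors the idea of only listing pages that were actually retrieved. Against a real crawl you would pass a connection such as sqlite3.connect('spider.sqlite'); that file name is an assumption here.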
So we’re looking at the pages that have the highest number of inbound links. We read through all those rows and pull out the PageRank for each one. We’re looking for the highest and lowest rank, because these numbers can vary quite widely: they go all the way from, you know, 0.000 to 20 or 30. The program also asks how many nodes you want, so it only does the top 20 or so; you’ll see why we need that in the visualization. This part is just checking.

Then we write out a file. We’ll see its format in a moment; it’s just a JavaScript file. We’re basically normalizing the rank: we subtract the minimum rank and divide by the difference between the maximum and minimum, because we’re going to turn the rank into the thickness of the line and the size of the ball. You’ll see all of this in the visualization. So this is really just writing some JavaScript with little strings and stuff like that. Then we finish that part of the JavaScript and write all the links out. The nodes are the balls that you’ll see, the links are what draw all the lines, and this part again normalizes things for thickness. Now, I don’t want to go through this in tremendous detail.

So I’ll run python spjson.py and ask for the top 20 nodes. If I take a look at the file it writes, spider.js, you can see that it’s some objects that basically hold the PageRank and which ID each page is, and the ID is how I build links back and forth. The weight is how big the little circle is. Then I have the links, and I only asked for the top 20, right? For each link there’s the thickness of the line, where the line starts, and where the line ends, okay? So this is read by this HTML file, force.html, which loads the force.js file along with my own spider.js data.
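
The normalization and file-writing steps described above can be sketched as follows. This is an illustration under assumptions, not a copy of spjson.py: the input tuple layout, the arbitrary scale factor of 20, the helper names normalize and build_spider_js, and the choice to take each line's thickness from its source page's normalized rank are all hypothetical. Each rank is min-max scaled, (rank - min) / (max - min), then multiplied by the scale, so the lowest-ranked page maps to 0 and the highest to the scale value.

```python
import json

def normalize(rank, min_rank, max_rank, scale=20.0):
    """Min-max scale a rank into [0, scale]; the default scale is arbitrary."""
    return scale * (rank - min_rank) / (max_rank - min_rank)

def build_spider_js(nodes, links, limit=20):
    """Return JavaScript text that assigns a spiderJson object.

    nodes: (inbound_count, rank, page_id, url) tuples, already sorted.
    links: (from_id, to_id) tuples; links touching pages outside the
    kept nodes are dropped.
    """
    nodes = nodes[:limit]  # keep only the top `limit` pages, like "how many?"
    ranks = [rank for _, rank, _, _ in nodes]
    min_rank, max_rank = min(ranks), max(ranks)
    if min_rank == max_rank:
        raise ValueError("ranks are all equal; run the PageRank step first")

    position = {}  # page id -> index in the nodes array, so links can use indices
    out_nodes = []
    for i, (inbound, rank, page_id, url) in enumerate(nodes):
        position[page_id] = i
        out_nodes.append({
            "weight": inbound,  # drives circle size in the page
            "rank": normalize(rank, min_rank, max_rank),
            "id": page_id,
            "url": url,
        })

    out_links = []
    for from_id, to_id in links:
        if from_id not in position or to_id not in position:
            continue  # one end is not among the kept nodes
        out_links.append({
            "source": position[from_id],
            "target": position[to_id],
            # Assumption: thickness from the source page's normalized rank.
            "value": out_nodes[position[from_id]]["rank"],
        })

    data = {"nodes": out_nodes, "links": out_links}
    return "spiderJson = " + json.dumps(data, indent=1) + ";\n"
```

Writing the returned string to spider.js gives a file with the same general shape as the one described in the walk-through: a nodes array with weight, rank, and id, plus a links array with a start, an end, and a thickness. Using json.dumps instead of concatenating little strings by hand is a deliberate choice in this sketch, since it escapes quotes inside URLs automatically.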
force.js is the visualization code, and it uses D3, the visualization library. So I’m using d3.js, which is a really great visualization library. This code is just drawing the circles, coloring them, and making them bigger and smaller, and then connecting all the lines between them. The data in spider.js feeds that code. So when we’re all done, you simply say open force.html; you don’t have to do anything else. And all of this beautiful JavaScript stuff is like, wow, that’s really cool, because you can move these things around. You can see the circles are bigger, and if you hover over them for a while, it shows you the big ones. You know, you can see these things, and it’s kind of cool. So I gave you force.js and force.html, which visualize the PageRank, and you could use them to visualize quite a bit of other stuff. It’ll take you a while to pull down enough data from a real website, but after you’ve pulled down 400 to 500 pages, if you have some time, the visualization is quite interesting. You can also see why we had to pull down several hundred pages just to get this much PageRank information. Okay, so that gives you a sense of how to run the PageRank code in Python for Everybody. Thanks for listening.
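
The walk-through opens force.html from the command line. A cross-platform way to do the same from Python is the standard-library webbrowser module. This small sketch assumes force.html sits in the given folder next to the generated spider.js; the helper names force_html_uri and open_visualization are hypothetical.

```python
import webbrowser
from pathlib import Path

def force_html_uri(folder="."):
    """Return a file:// URI for force.html inside `folder` (assumed location)."""
    return (Path(folder) / "force.html").resolve().as_uri()

def open_visualization(folder="."):
    """Ask the default browser to open force.html; True if a browser was launched."""
    return webbrowser.open(force_html_uri(folder))
```

Calling open_visualization() from the folder that holds force.html asks the default browser to load the page, which then reads spider.js and draws the graph.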
