Course: Capstone: Retrieving, Processing, and Visualizing Data with Python, Chapter 2: Building a Search Engine
About this chapter:
This week we will download and run a simple version of the Google PageRank Algorithm and practice spidering some content. The assignment is peer-graded and is the first of three required assignments in the course. This is a continuation of the material covered in Course 4 of the specialization, and is based on Chapter 16 of the textbook.
It includes the following 6 items:
The idea is that Google and other search engines, including the one that you're gonna run, don't actually want the web. And it's a simple website that tells search engines: when they see a domain or URL for the first time, they download this, and it informs them where to look and where not to look. And then there is some HTML and D3.js, which is a visualization that produces this pretty picture, and the bigger little dots are the ones with the better page rank, and you can grab this and move all this stuff around.
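The "where to look and where not to look" file described above sounds like the robots.txt convention, although the excerpt does not name it. Assuming that is what is meant, here is a minimal sketch of how a spider might consult such a file using Python's standard `urllib.robotparser`. The file contents, the `my-spider` user agent, and the example.com URLs are all made up for illustration, and the course's own spider may not do this at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real spider would download this from
# the site the first time it sees the domain; here it is inlined so the
# sketch runs without any network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser (no network needed)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether our (hypothetical) spider may fetch each URL.
allowed_page = parser.can_fetch("my-spider", "http://www.example.com/index.html")
blocked_page = parser.can_fetch("my-spider", "http://www.example.com/private/notes.html")
print(allowed_page, blocked_page)
```

A spider would typically call `can_fetch` before each download and simply skip any URL for which it returns `False`.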
So I provide this bs4 zip as a quick and dirty way if you can't install something for all of the Python users on your system. And so this is just kind of nasty choppage and throwing away the URLs as we're going through a page, and we have a bunch that we don't like, or we have to clean them up or whatever. But now, finally, here at line 132, we're ready to put this into Pages, the URL and the HTML, and it's all good, right?
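To make the "choppage and throwing away the URLs" step a bit more concrete, here is a rough sketch of that kind of link cleanup, followed by storing the surviving URL in a Pages table. This is not the course's spider code: the function names `clean_link` and `remember_page`, the exact filtering rules, and the table layout are assumptions chosen for illustration, and it uses only the standard library rather than bs4.

```python
import sqlite3
from urllib.parse import urldefrag, urljoin, urlparse


def clean_link(href, page_url, allowed_sites):
    """Return an absolute, tidied URL worth crawling, or None to skip it."""
    if not href:
        return None
    # Resolve relative links such as "about.html" against the current page.
    url = urljoin(page_url, href)
    # Drop the in-page anchor: "page.html#top" and "page.html" are one page.
    url, _fragment = urldefrag(url)
    # Skip anything that is not a web page link (mailto:, javascript:, ...).
    if urlparse(url).scheme not in ("http", "https"):
        return None
    # Skip common image files.
    if url.lower().endswith((".png", ".jpg", ".gif")):
        return None
    # Treat "http://site/docs/" and "http://site/docs" as the same page.
    if url.endswith("/"):
        url = url[:-1]
    # Stay inside the sites we agreed to crawl.
    if not any(url.startswith(site) for site in allowed_sites):
        return None
    return url


# Assumed, simplified Pages table kept in memory for the demonstration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE, html TEXT)")


def remember_page(url):
    """Queue a URL for later retrieval; duplicates are silently ignored."""
    cur.execute("INSERT OR IGNORE INTO Pages (url, html) VALUES (?, NULL)", (url,))
    conn.commit()


sites = ["http://www.example.com"]
for href in ["about.html#team", "logo.PNG", "mailto:me@example.com", "/docs/"]:
    url = clean_link(href, "http://www.example.com/index.html", sites)
    if url is not None:
        remember_page(url)
```

Keeping the cleanup in one small function makes it easy to add or remove rules as you discover new kinds of links you do not want.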
And it's cruising around, and doing things, and the beauty of any of these spider processes is I can stop anytime, and just hit Ctrl+C. So the first thing I do is I read in all of the from_ids from the links; SELECT DISTINCT throws out any duplicates. And so we're going to have a dictionary that's keyed on the id, the primary key (that's what node is), and maps to the rank.
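As an illustration of the two steps just described (reading the distinct from_ids, then building a dictionary from page id to rank), here is a small sketch against an in-memory SQLite database. The table and column names (`Pages`, `Links`, `new_rank`) and the sample rows are assumptions made for this example. The actual course database may be laid out differently.

```python
import sqlite3

# Tiny stand-in database; schema and rows are assumed for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE, new_rank REAL);
CREATE TABLE Links (from_id INTEGER, to_id INTEGER);
INSERT INTO Pages (id, url, new_rank) VALUES (1, 'a', 1.0);
INSERT INTO Pages (id, url, new_rank) VALUES (2, 'b', 1.0);
INSERT INTO Pages (id, url, new_rank) VALUES (3, 'c', 1.0);
INSERT INTO Links (from_id, to_id) VALUES (1, 2);
INSERT INTO Links (from_id, to_id) VALUES (1, 3);
INSERT INTO Links (from_id, to_id) VALUES (2, 1);
""")

# Step 1: every page id that has at least one outgoing link.
# DISTINCT collapses the two rows for page 1 into a single id.
cur.execute("SELECT DISTINCT from_id FROM Links")
from_ids = [row[0] for row in cur.fetchall()]

# Step 2: dictionary keyed on the page id (the primary key),
# with the page's current rank as the value.
prev_ranks = {}
for node in from_ids:
    cur.execute("SELECT new_rank FROM Pages WHERE id = ?", (node,))
    row = cur.fetchone()
    if row is not None:
        prev_ranks[node] = row[0]

print(sorted(from_ids), prev_ranks)
```

Holding the ranks in a dictionary means the rank calculation can loop in memory and only write back to the database when it is done.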
So it connects to our database, creates a cursor, and then just does a select count, and we're going to just show the number of links. You'll tend to write little helpers like this that make your life easier, just to show you the kinds of things that you want. And if I take a look at this file spider.js, you can see that it's some objects that basically put the page rank in, which ID it is, and that's a way for me to build a link back and forth.
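A "little helper" of the kind mentioned above might look something like the following sketch: connect, create a cursor, run a count query, and show the result. The function name `count_links` and the `Links` table layout are assumptions. The demo builds a throwaway in-memory database so it can run on its own.

```python
import sqlite3


def count_links(conn):
    """Return the number of rows in the (assumed) Links table."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM Links")
    return cur.fetchone()[0]


# Throwaway database so the helper can be tried without a real spider run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Links (from_id INTEGER, to_id INTEGER)")
conn.executemany(
    "INSERT INTO Links (from_id, to_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 1)],
)
conn.commit()

print("Number of links:", count_links(conn))
```

Against a real database file you would pass `sqlite3.connect(...)` with that file's name instead of `":memory:"`.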
We're here in Detroit, Michigan on Woodward and I'd like to introduce you to some of your fellow students, so here we go. Hi, my name is Enola Seegers and I'm a student with the Focus Hope STEM Bridge program over in Detroit, Michigan on Oakman Boulevard. A college professor at Oakland University where I teach programming and stuff like that.
You know, I've been working in the general area of pattern recognition, image processing, computer vision for the last 40 years. And it just sort of happened, the serendipity, that in 1990 somebody called me from Washington, D.C. and said, you know, you do good image processing work, and the NSA has designed or funded the development of an FPGA processor. And this notion is very important in legal proceedings, especially when a person is convicted based on a partial fingerprint found at the crime scene.