Programs that Surf the Web (Chapter 12)

Course: Using Python to Access Web Data, Chapter 4: Programs that Surf the Web (Chapter 12)

About this chapter:

In this section we learn to use Python to retrieve data from web sites and APIs over the Internet.

This chapter includes 8 sections:

You can say, "Hey, what is the actual numeric value for the letter H?" The function is called ord, which stands for ordinal. So UTF-8 rocks, and that's really because, as soon as these ideas came out, it was clear that UTF-8 was the best practice for encoding data moving between systems. And the nice thing is that you can tell decode() which character set the data is in, but by default it assumes UTF-8, which also handles plain ASCII.
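A minimal sketch of the two ideas above, using only built-ins: ord() maps a character to its integer code point, and str.encode() / bytes.decode() convert between Python strings and the bytes that travel over a network, defaulting to UTF-8 when no encoding is named. The sample strings are arbitrary.

```python
# ord() returns the integer code point of a single character
print(ord('H'))    # 72
print(ord('e'))    # 101
print(ord('\n'))   # 10, the newline character

# Strings are Unicode inside Python; the network carries bytes.
text = 'Hello, café'
data = text.encode()     # no argument: encodes as UTF-8
print(data)              # b'Hello, caf\xc3\xa9'

# decode() also defaults to UTF-8; naming the character set is optional
print(data.decode())           # Hello, café
print(data.decode('utf-8'))    # same result with the encoding spelled out

# Plain ASCII bytes are valid UTF-8, so the default handles them too
print(b'ASCII only'.decode())  # ASCII only
```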

So I think that's really and truly amazingly beautiful and simple: to take all this Internet knowledge and architecture, HTTP and all that stuff, and roll it into one import statement and three lines of code. And if we want, we can write another loop that actually reads the stuff, looks for href equals quote, and pulls out the link, perhaps with a regular expression, a split statement, or some kind of find operation, because we're good at strings by now. That's a really tiny, light version of what Google is doing when it tries to make a full copy of the entire Internet on its own servers.
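The href-hunting loop described above can be sketched with Python's re module. This is a minimal sketch that works on an HTML string already read and decoded; the sample page and its example.com links are made up for illustration, and the pattern only catches double-quoted href values, which is exactly the kind of fragility a real HTML parser avoids.

```python
import re

def find_links(html):
    """Return the contents of every href="..." attribute in an HTML string."""
    # .*? is non-greedy, so each match stops at the first closing quote
    return re.findall(r'href="(.*?)"', html)

# Hypothetical page content standing in for text read from a URL
page = '''<h1>Links</h1>
<p><a href="https://example.com/page1.htm">First page</a></p>
<p><a href="https://example.com/page2.htm">Second page</a></p>'''

for link in find_links(page):
    print(link)
```

A crawler in the spirit of the Google comparison would feed each extracted link back into the retrieval step; that loop is left out here to keep the sketch small.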

So we're not using the lower-level read and write code; we're just using a for loop. Each line does come back as an array of bytes, so we have to call decode(). It really works: urllib makes URLs behave inside Python very much like files.
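Here is a minimal sketch of treating a URL like a file with urllib.request.urlopen and a plain for loop. To keep it runnable without a network connection, it opens a data: URL, which the standard library's default opener also understands; replacing it with an http:// or https:// address reads a real page the same way. The sample text and the read_lines helper name are illustrative, not taken from the course.

```python
import urllib.parse
import urllib.request

def read_lines(url):
    """Yield each line at url as a str, decoding the raw bytes (UTF-8 assumed)."""
    with urllib.request.urlopen(url) as fhand:
        for line in fhand:                  # iterate like a file; each line is bytes
            yield line.decode().strip()     # decode() defaults to UTF-8

# A data: URL keeps this sketch runnable offline; an http:// or https://
# address would be read line by line in exactly the same way.
text = 'But soft what light\nthrough yonder window breaks\n'
url = 'data:text/plain;charset=utf-8,' + urllib.parse.quote(text)

lines = list(read_lines(url))
for line in lines:
    print(line)
```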

That's fine if you can figure that out, but sometimes you're running on a campus computer where you can't actually install software, or maybe you're carrying your Python programs around on a USB stick or a shared drive. And what happens is that, in this soup variable, BeautifulSoup has somehow taken all the nasty bits that come from the web page, cleaned them up, and made a pretty little tree of things. So that is a simple use of the BeautifulSoup library to retrieve and parse HTML and pull out anchor tags, which is really sort of the beginning of a browser.
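BeautifulSoup is a third-party package, so it may not be available on a machine where you can't install software. To keep this sketch dependency-free, it swaps in the standard library's html.parser.HTMLParser to do the same anchor-tag extraction. Note that, unlike BeautifulSoup, it does not build a tree or repair broken markup; it just reports tags as it sees them. The AnchorCollector and anchor_hrefs names and the sample page are hypothetical. With BeautifulSoup installed, the rough equivalent is soup = BeautifulSoup(html, 'html.parser') followed by looping over soup('a') and calling tag.get('href', None).

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value)
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.hrefs.append(value)

def anchor_hrefs(html):
    """Return the href of each anchor tag in html, in document order."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs

# Hypothetical, slightly sloppy markup: uppercase tags and an unclosed <p>
page = '<p>See <A HREF="https://example.com/one.htm">one</A><p><a href="/two.htm">two</a>'
print(anchor_hrefs(page))
```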

Basically, someone has gone through and figured out all the bad things that could possibly happen when you're reading and parsing HTML. And it's not that the website had a bad URL; it has a certificate that's not in Python's official list. So that gives you a quick summary of using the BeautifulSoup library in Python along with urllib.
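The certificate problem mentioned above is often worked around in classroom code by passing urlopen an SSL context that skips verification. The sketch below shows that general pattern with the standard ssl module; it is not necessarily the course's exact code, and it should be a last resort, because it also accepts forged certificates. A data: URL stands in for the https:// page so the sketch runs offline, and the helper names are hypothetical.

```python
import ssl
import urllib.parse
import urllib.request

def make_unverified_context():
    """Return an SSL context that skips certificate checks.

    WARNING: this removes the protection HTTPS certificates provide;
    use it only for exercises against sites you already trust.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE     # accept any certificate
    return ctx

def fetch(url, ctx):
    """Read the whole document at url as bytes, using the given SSL context."""
    with urllib.request.urlopen(url, context=ctx) as response:
        return response.read()

ctx = make_unverified_context()
# A data: URL stands in for an https:// page so this runs without a network.
source = '<a href="https://example.com/">x</a>'
html = fetch('data:text/html,' + urllib.parse.quote(source), ctx)
print(html.decode())
```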

In a lot of high school classes, you could incorporate programming skills into physics as well. I don't mean to say there are twelve classes I'm going to require every single high school student to take just because they're in my field. We had so many people at office hours that the management came and kicked us out of the lobby and made us use this room, where we had to buy hors d'oeuvres.

Last month I was off in Boston for the 4th annual World Wide Web conference, and I had the opportunity to talk to the inventor of the World Wide Web, Tim Berners-Lee. It's hard to imagine a greater revolution than what we've seen; I go to the smallest places and everybody's putting up a web page. The Web revolution could only take place because of the Internet, which had been a quieter, smaller revolution. The Internet had been quietly deployed throughout the world, and because the Internet was something people could assume, the Web revolution could take place.

You're all going to be on YouTube soon, thanks to Chuck here. So that's Mr. Blues Man and your Mojo Song. Dr. Chuck, Dr. Chuck on the camera.
