Key Topics
Worked Exercise: Lists
Brief Summary
We're going to read our famous mailbox data, look for lines that begin with "From " (From followed by a space), and extract the third word. If we take a look at our dataset, the program found the line that started with "From ", split it, and printed out the third word before it failed. The first thing I like to do in this kind of situation is find the line that blows up and make sure there's a print statement right before it.
- Level: Intermediate
Lesson Transcript (English)
Hello and welcome to Python for Everybody. I'm Charles Severance, and now we're going to take a look at how we would write some code to do some parsing and read some data. As a matter of fact, we're going to read our famous mailbox data, look for lines that begin with "From " (From followed by a space), and extract the third word. We already have some of this code written, so we're going to look at the code and debug it. So here we go, here we have it. It's a pretty basic program. It opens a file, loops through the file, throws away the trailing whitespace, splits each line into words, and checks whether the zeroth word, the first word, is From. If it's not, we skip the line and read the next one. Otherwise, if we find a line that starts with "From ", we print the third word, which is wds sub 2 (wds[2]), okay? So this is what we've got, and we carefully saved this file into the same folder as our data, ex 08. So let's go ahead: cd Desktop, then Python for Everybody, then ex 08. Here are some files. We've got our day-of-the-week Python program and our mbox-short file, so that's sitting there. Okay? Now let's run this program, python3 dow.py, and it doesn't work. By now you've seen a few tracebacks, and there you go. When you look at a traceback, you think to yourself, well, I made a mistake. And you've gotten pretty good at looking at that line. So there you are, thinking: this is the line, there must be something wrong on this line. You want to change it, but that line's not actually the problem in this particular case, so you've got to be careful sometimes. One of the things you might not notice right away is that it actually worked for a while. It printed its first line of output. If we take a look at our dataset, it found the line that started with "From ", split it, and printed out the third word. And then it blew up later.
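The transcript describes the program without showing it, so here is a minimal reconstruction of what it appears to do; it is not the lecture's actual dow.py. Assumptions: the function name `print_third_words_buggy` and the `sample` lines are hypothetical, and the lines come from a list rather than a file so the sketch runs without mbox-short. With the real data you would pass an open file handle instead of the list.

```python
# Hypothetical reconstruction of the buggy logic described in the lecture.
# It accepts any iterable of lines: a list (as below) or an open file handle.

def print_third_words_buggy(lines):
    for line in lines:
        line = line.rstrip()      # throw away trailing whitespace, including '\n'
        wds = line.split()        # split the line into a list of words
        if wds[0] != 'From':      # dangerous: raises IndexError if wds is empty
            continue
        print(wds[2])             # the third word, e.g. the day of the week


# Hypothetical lines shaped like mailbox data; note the blank line.
sample = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
    'From: stephen.marquard@uct.ac.za\n',
    '\n',
    'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n',
]

if __name__ == '__main__':
    try:
        print_third_words_buggy(sample)   # prints Sat, then fails
    except IndexError as err:
        print('IndexError:', err)         # list index out of range
```

Running it prints Sat and then reports "list index out of range", which mirrors the lecture: the first From line works, and a later line breaks the program.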
Part of the problem is that we don't know what it was doing when it blew up. So the first thing I like to do in this kind of situation is find the line and make sure there's a print statement right before it. I'm going to print 'Words:' and then wds. I want to print right before the line that blows up, so that when it finally does blow up, I know what was going on at that line. So I'm going to run it again. And oops, did I forget to save it? Yes, I forgot to save it. Look at that: see the little blue dot? Forgot to save it. Now we see a whole bunch of output, and we see that it's actually doing a whole lot of work before it blows up. It prints the words out from that first line and prints out Saturday, which is exactly what we expect: it's the third word in the line. Then it reads a whole bunch of other lines, and what it's doing now is ignoring them. Let me put something here: print('Ignore'), so I can keep track of when lines are being ignored. Let's run it again and have the word Ignore pop up, right? So it's doing a lot of ignoring. It finds these words, prints out Saturday, reads this line and ignores it, reads this line and ignores it, reads this line and ignores it. So a lot of stuff is going on here that you might not realize. Now we have to look at what the problem is. It is now blowing up on wds sub 0. We can scroll down and look at exactly what happened right before the traceback, so now we really know what was going on. And the interesting thing is that there is an empty string, I mean, an empty list: a list with zero items. So I'm going to print the line out too: print('Line:', line). Now, I haven't changed my program's logic at all; I'm just trying to figure out what's going on here. So I'll save that and run it, and we've got a lot of output, and it's still working.
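The debugging step can be sketched as the same hypothetical function with the print statements the lecture adds (`Line:`, `Words:`, `Ignore`). The name `print_third_words_debug` is an assumption. The last `Words:` output before the traceback is an empty list, because splitting a blank line produces no words.

```python
# Hypothetical debug version: same logic, plus print statements placed
# right before the line that raises, so the traceback has context.

def print_third_words_debug(lines):
    for line in lines:
        line = line.rstrip()
        print('Line:', line)        # debug: what did we actually read?
        wds = line.split()
        print('Words:', wds)        # debug: printed just before wds[0]
        if wds[0] != 'From':
            print('Ignore')         # debug: mark lines that are skipped
            continue
        print(wds[2])


# The root cause in isolation: a blank line splits into an empty list,
# and indexing an empty list raises IndexError.
blank = '\n'.rstrip()               # '' after stripping the newline
print(blank.split())                # []
```

Note that the debug prints only add output; the control flow is unchanged, which is the point of this step.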
It reads a line, splits it into words, and then prints out Saturday, which is the third word on the line. Now here it reads a line, and this line is a blank line. Because it's a blank line, split returns no words, and that's what blows up. The error is "list index out of range": wds sub 0, the first word, is not valid when there are no words. So this is a statement that works most of the time. Now, you might think, oh, I'll just put a try and except in there. Well, the right thing to do is to say to yourself, oh, wait a second: if I don't have enough words, if the length of the word list is less than 1, continue. So it's going to come through here and split the line, and if we don't have any words, meaning it's a blank line, we skip it. So let's run that. Now it ran all the way to the end, did a lot of work, and did not blow up; specifically, there was no traceback. This is called a guardian pattern. Right? A guardian pattern, because this line is dangerous. It could blow up, but it won't, because the guardian stops the lines that would make it blow up from ever reaching it. Now let me take this part out and show you another way to protect it: say, oh, wait a sec, if the line is a blank line, continue. So now we're going to skip blank lines. I'll even add print('Skip Blank'), so if a line is blank, we print Skip Blank and keep going. This skips blank lines, the next check skips lines that don't start with From, and because we never process blank lines, wds sub 0 always works. So I can run this code and it works again. Here we have a blank line, and we skipped it. Here we have another blank line, and we skipped it. Now here we had a non-blank line, so we parsed it, but then we ignored it.
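Both guardians from this part can be sketched as follows. The function names and sample lines are hypothetical, and unlike the lecture's program these versions collect the third words in a list instead of printing them, which makes the result easy to check.

```python
def third_words_len_guard(lines):
    """Guardian version 1: skip lines that split into no words."""
    found = []
    for line in lines:
        wds = line.rstrip().split()
        if len(wds) < 1:           # guardian: blank line, nothing to index
            continue
        if wds[0] != 'From':       # now safe: wds has at least one word
            continue
        found.append(wds[2])
    return found


def third_words_skip_blank(lines):
    """Guardian version 2: skip blank lines before splitting."""
    found = []
    for line in lines:
        line = line.rstrip()
        if line == '':             # guardian: line is empty after stripping
            continue
        wds = line.split()
        if wds[0] != 'From':
            continue
        found.append(wds[2])
    return found


# Hypothetical lines shaped like mailbox data; note the blank line.
sample = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
    'From: stephen.marquard@uct.ac.za\n',
    '\n',
    'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n',
]

print(third_words_len_guard(sample))    # ['Sat', 'Fri']
print(third_words_skip_blank(sample))   # ['Sat', 'Fri']
```

The first version checks the split result; the second checks the stripped line before splitting. Because rstrip() also reduces a whitespace-only line to an empty string, the two agree on such lines too.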
And then up here we'll find a From somewhere. Let's find a From; here it comes. Nope: ignore, ignore. I've got so much debug output that I can't find it, so I'll just search for From with find. Okay, there we go. It's a From line, and we print the third word out. We're getting a lot of extra output, so I'm going to comment out some of these debug prints. And I'm actually going to get rid of the whole skipping of blank lines and do it with wds instead, going back to the guardian we had before: if the number of words we got, len(wds), is less than 1, continue. Okay? So now this is going to be a working program. Oops, I've got to take another print statement out, and another one. We sort of know what we're doing here now. Okay, so this looks like a pretty safe program. This guardian is protecting this dangerous line; I'll get rid of that print too. This line is the one that gave us our traceback, and nothing else in the program has changed from when we started, except that we've added this little guardian. Now, the interesting thing is: what happens if execution gets down to printing wds sub 2, and somehow we've found a line that has From as its first word but only one word on it? Then this line is going to blow up. So we can make our guardian a little stronger and say that we're going to skip a line unless it has at least three words in it. If we see fewer than three words, we skip the line; that makes the guardian a bit stronger, and the program works safely. You'll see this sort of thing, where you want to check that your assumptions about the data are reasonable and skip things where the data is not reasonable. So that's one guardian pattern. Let me show you a slightly different way to do this, with an or statement. I'm going to take this code, copy it, put it here with or, and get rid of all this other stuff.
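A sketch of the stronger guardian, again with a hypothetical function name and sample. The extra line 'From nobody' has From as its first word but only two words, so wds[2] would fail on it if the guardian only checked for at least one word.

```python
def third_words_strong_guard(lines):
    """Skip any line with fewer than three words before indexing wds[2]."""
    found = []
    for line in lines:
        wds = line.rstrip().split()
        if len(wds) < 3:           # guardian: wds[2] needs at least three words
            continue
        if wds[0] != 'From':       # safe: at least three words exist here
            continue
        found.append(wds[2])
    return found


# Hypothetical lines; 'From nobody' is a From line with too few words.
sample = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
    'From: stephen.marquard@uct.ac.za\n',
    '\n',
    'From nobody\n',
    'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n',
]

print(third_words_strong_guard(sample))   # ['Sat', 'Fri']
```

One length check of 3 covers both problems: it protects wds[0] on blank lines and wds[2] on short From lines.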
Well, this is the guardian in a compound statement. What we're saying is: if there are fewer than three words on the line, or if the first word is not From, continue. The order matters because of how or works: or is true if either the left side or the right side is true. But if Python knows the left side is true, it doesn't bother checking the right side, and checking the right side is what blows up and causes the traceback. So if we flip this order, it fails; in this order, it works. So let's do this one right, and it works. But if I get it backwards, it's going to check wds sub 0 before it checks the length, and we're back to failing again. So you've got to get the order of these things right: the guardian comes first in the or. If the guardian is true, Python doesn't check the second part. This is called short-circuit evaluation: as long as the first part is true, Python doesn't evaluate the second part. So now we have a guardian in a compound statement, and you'll see this a lot. Sometimes, if it's more complex, you do it in multiple statements and let the code fall through: check for sanity, check for sanity, and only then run the code. So I hope that was useful to you, looking a little bit at how to debug without just starting to chop on the line that had the problem. It's not always that line; we never had to change that line. Although we did change it a little at the end by folding the guardian into it, we also fixed the problem without touching it. Sometimes you add some print statements to figure out what's going on before you start chopping on that line. So again, I hope this helps. Thanks.
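The compound guardian and its wrong-order counterpart can be compared directly. This is a sketch with hypothetical names and sample lines; the only difference between the two functions is the order of the operands of or.

```python
def third_words_compound(lines):
    found = []
    for line in lines:
        wds = line.rstrip().split()
        # Guardian first: when len(wds) < 3 is True, `or` short-circuits
        # and wds[0] is never evaluated.
        if len(wds) < 3 or wds[0] != 'From':
            continue
        found.append(wds[2])
    return found


def third_words_wrong_order(lines):
    found = []
    for line in lines:
        wds = line.rstrip().split()
        # Wrong order: wds[0] is evaluated first and fails on a blank line.
        if wds[0] != 'From' or len(wds) < 3:
            continue
        found.append(wds[2])
    return found


# Hypothetical lines shaped like mailbox data.
sample = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
    'From: stephen.marquard@uct.ac.za\n',
    '\n',
    'From nobody\n',
    'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n',
]

print(third_words_compound(sample))   # ['Sat', 'Fri']
try:
    third_words_wrong_order(sample)
except IndexError as err:
    print('wrong order:', err)        # wrong order: list index out of range
```

Both conditions accept exactly the same lines when neither side raises; the reordered version fails only because Python evaluates the left operand of or first.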