13.3 - XML Schema

Course: Using Python to Access Web Data / Chapter: Web Services and XML (Chapter 13) / Lesson 3

Brief description

We're not going to spend a lot of time with this, but it's an important concept to sort of imagine and understand how contracts between cooperating applications have to be developed. I'm mostly talking about the one that kind of won or the one you're most likely these days to encounter called the XML Schema from the World Wide Web Consortium. And so I'm not trying to teach you how to be an XSD wizard, just the notion that there is this syntax that's used to establish a contract so that you can resolve disagreements between cooperating applications.


English text of the lesson

So we have two cooperating applications, and they've got to send data to one another, and they have a disagreement as to whether or not the data is right. One side might blow up or the other side might blow up, and it's like, whose fault is it? And so it's important to be able to define a contract as to what is acceptable XML, one that's really kind of outside of either application. You can't say the correct XML is the XML that causes my program not to blow up. That's not really the best way to define it, so we have to define it at the moment of data exchange, and that's what XML Schema does. It's a way to establish the contract outside of any program, and then separately check it. So this side can say, I sent good stuff, and this side can say, I received good stuff or I received bad stuff.

So you can just look at a document and say yes, this validates, or no, it doesn't validate. And validation is not the act of transferring the data or even deserializing the data. Validation is the act of verifying that the data is in the right format. It's a contract. So if you're working with something like your airline reservation system that's working with a hotel system, you might say okay, here's the schema. And they just publish that separately and agree that's the schema. So if later the XML starts to change in bad ways that break one side or the other, they can know which side it was that started changing. And so XML validation is the act of taking a document and a schema contract, which itself is also an XML document, and sending both to the validator.

We're not going to spend a lot of time with this, but it's an important concept to sort of imagine and understand how contracts between cooperating applications have to be developed. And so here's a bit of XML. There's a person tag (remember, the outer element is only a single tag), and then I've got three child tags.
You know, last name, age, date born. And then we have a contract. Like I said, the contract itself is XML with a couple of weird-looking tags. What the contract is trying to say is what kind of tag this is, how much data you're supposed to put in it, and whether it can have any child tags underneath it. All those kinds of things are the questions that are answered by the contract.

And so this is xs:complexType. As we've said before, elements that can have children are called complex and elements that can't are simple. So this basically says the outer bit of this particular XML is expected to be a tag named person. Then what it says is that within that there's going to be a sequence of tags, xs:sequence. And then we basically say there's going to be a tag called lastname and it's going to be a string. And then there's going to be a tag that's age and it's going to be an integer. And there's going to be a tag called dateborn and that's going to be a date.

And so we can look at this and say: the outer one is person. Next is lastname, that's a string, that's good. Age, that's a number, that's good. Dateborn, that looks like a date, that's good. Check, check, check, check, check. This XML matches that contract, and that's the idea. Now, by the time you actually have to make one of these, you'll have to read all kinds of stuff and figure all this out. So I just want you to get the sense that these things exist and they're not impossible to read.

And so it turns out that in the early days of XML there were a number of these schema languages out there, and you'll run across them. I'm mostly talking about the one that kind of won, or the one you're most likely to encounter these days, called XML Schema, from the World Wide Web Consortium.
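The person walkthrough above can be sketched in Python. The slide itself is not part of this transcript, so the XML and XSD text here are reconstructions from the spoken description, and the sample values (`Severance`, `17`, `2001-04-17`) are made up for illustration. Python's standard library has no XSD validator, so this sketch hand-checks only the three types mentioned; it is an illustration of the idea, not a real validator.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

XS = '{http://www.w3.org/2001/XMLSchema}'  # namespace behind the xs: prefix

# Reconstructed contract: a person with three simple children, in order.
SCHEMA = '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
        <xs:element name="dateborn" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>'''

# Hypothetical document to check against the contract.
DOCUMENT = '''<person>
  <lastname>Severance</lastname>
  <age>17</age>
  <dateborn>2001-04-17</dateborn>
</person>'''

def is_integer(text):
    return re.fullmatch(r'[+-]?[0-9]+', text) is not None

def is_date(text):
    # Require the fixed YYYY-MM-DD shape, then make sure it is a real date.
    if re.fullmatch(r'[0-9]{4}-[0-9]{2}-[0-9]{2}', text) is None:
        return False
    try:
        datetime.strptime(text, '%Y-%m-%d')
    except ValueError:
        return False
    return True

# Only the types from the lecture example; assumes the schema uses the xs: prefix.
TYPE_CHECKS = {
    'xs:string': lambda text: True,
    'xs:integer': is_integer,
    'xs:date': is_date,
}

def matches_contract(document_xml, schema_xml):
    """Return True if the document has the declared tags, in order, with valid text."""
    doc = ET.fromstring(document_xml)
    root_decl = ET.fromstring(schema_xml).find(XS + 'element')
    if root_decl is None or doc.tag != root_decl.get('name'):
        return False
    declared = root_decl.findall(f'{XS}complexType/{XS}sequence/{XS}element')
    children = list(doc)
    if len(children) != len(declared):
        return False
    for child, decl in zip(children, declared):
        if child.tag != decl.get('name'):
            return False
        if not TYPE_CHECKS[decl.get('type')]((child.text or '').strip()):
            return False
    return True

print(matches_contract(DOCUMENT, SCHEMA))  # True
```

Note that the check lives outside both the sending and the receiving program, which is the point of the contract: either side can run it on the same document and get the same answer.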
It's called XSD. Usually, if I send you the XML, the file has a suffix of .xml, and if I send you a schema, I tend to send you a file that ends in .xsd, and so we kind of just call it XSD. That's the one we're going to talk about. I just want you to know that there are other ones out there.

So like I said, you have complex elements, you have simple elements, you have a sequence. Those are sort of the basics of the tags that you put into XSD. You can do further things; I just gave you the simplest kind of stuff. So, for example, I can say that within person there is a sequence, and in it I want there to be a thing called full name, a string, with minOccurs equals 1 and maxOccurs equals 1. That basically means there's going to be one and exactly one. If you have less than one, it's an error; if you have more than one, it's an error. This other tag here can happen 0 times up to 10. So if there are somewhere between zero and ten of those tags between full name and the end of person, we are happy with this. And so minOccurs and maxOccurs basically say how many times these things must appear. So again, this is just a little more sophisticated XSD.

We've already played with a couple of different kinds of data types. We've talked about string, we've talked about date, and we'll look at the date in a little more detail. There's date, time, decimal, and integer, so it sort of understands the difference between a floating point and an integer, and you can render an opinion as to what you want in there. String, of course, is just about anything, and dates are kind of special. Of course, if you go to different countries, dates are written in all kinds of different ways.
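Going back to minOccurs and maxOccurs for a moment, the counting rule can be sketched as follows. This is a minimal illustration under stated assumptions: the tag names `full_name` and `child_name` are hypothetical (the transcript only says "full name" and "this tag"), the schema text is a reconstruction, and the function only counts occurrences; it does not check order or types the way a real validator would.

```python
import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'  # namespace behind the xs: prefix

# Reconstructed schema: exactly one full_name, then zero to ten child_name tags.
SCHEMA = '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="full_name" type="xs:string" minOccurs="1" maxOccurs="1"/>
        <xs:element name="child_name" type="xs:string" minOccurs="0" maxOccurs="10"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>'''

def occurrence_errors(document_xml, schema_xml=SCHEMA):
    """List every declared child whose count falls outside minOccurs..maxOccurs."""
    doc = ET.fromstring(document_xml)
    sequence = ET.fromstring(schema_xml).find(f'.//{XS}sequence')
    errors = []
    for decl in sequence:
        name = decl.get('name')
        low = int(decl.get('minOccurs', '1'))   # both bounds default to 1 in XSD
        high = decl.get('maxOccurs', '1')       # may be the word "unbounded"
        count = len(doc.findall(name))
        if count < low:
            errors.append(f'{name}: found {count}, expected at least {low}')
        elif high != 'unbounded' and count > int(high):
            errors.append(f'{name}: found {count}, expected at most {high}')
    return errors

good = '<person><full_name>Tove</full_name><child_name>Hege</child_name></person>'
bad = '<person><child_name>Hege</child_name></person>'  # full_name is missing

print(occurrence_errors(good))  # []
print(occurrence_errors(bad))   # ['full_name: found 0, expected at least 1']
```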
The date format that they've chosen, which is heavily used on the Internet, is four-digit year, dash, two-digit month, dash, two-digit day. And that's not how it is in America. We write 9/24/2002 in America. But on the web, they basically said, let's not pick the one that is the most popular, let's pick something that computers would like. It turns out that by forcing it to be fixed like this, it sorts. The year is the most significant, the month is second most significant, and the day is the least significant. And you zero-fill up to two digits for the month and two digits for the day. And so January 1, 2001 is 2001-01-01, and so it's the same length as 2002-12-31. And so you can just sort these, and they sort as strings. They sort quite naturally, whereas lots of the formats that we use for dates in common use, in our own writing or on checks or whatever, don't sort so well. So we did this for computer folks, got a picture of this coming up in a second.

And the other thing is, if it's a date-time, it's exactly the same date format, and then it's the letter T, and then hours, minutes, and seconds, and then a time zone. Dates and times on the web and on the Internet are very problematic. Because in the real world, where the sun comes up and the sun goes down, it matters what time of day it is; you kind of want it to be noon when it's light out. So we have time zones. And then we have daylight saving time. And then certain places have their own rules, and they are at a different time than somebody that's right next door. And so a thing that computers tend to do is ignore the local time zone. I remember in the old days, computers would mess up when daylight saving time happened. And they don't anymore, because they tend to think of all times inside the computer as Universal Time or Greenwich Mean Time. And that means it could even be a different day, but then they offset it.
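The sorting claim above is easy to see in a few lines of Python. This is a small sketch with made-up sample dates; the five-hours-behind-UTC offset is also just an assumption chosen so that the UTC version lands on the next day.

```python
from datetime import datetime, timedelta, timezone

# The same three days written two ways (sample values for illustration).
iso_dates = ['2002-12-31', '2001-01-01', '2002-09-24']
us_dates = ['12/31/2002', '1/1/2001', '9/24/2002']

# A plain string sort: no date parsing involved.
sorted_iso = sorted(iso_dates)
sorted_us = sorted(us_dates)

print(sorted_iso)  # ['2001-01-01', '2002-09-24', '2002-12-31']  chronological
print(sorted_us)   # ['1/1/2001', '12/31/2002', '9/24/2002']     not chronological

# A local evening time, five hours behind UTC (hypothetical offset).
local = datetime(2002, 9, 24, 21, 30, 0, tzinfo=timezone(timedelta(hours=-5)))

# Stored as absolute time: same instant, written with the Z (UTC) suffix.
as_utc = local.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
print(as_utc)      # 2002-09-25T02:30:00Z  (already the next day in UTC)
```

Because every field is zero-filled and the most significant field comes first, the fixed-width strings compare the same way the dates do.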
And if you've ever traveled and your calendar sort of switches, that's because the time of your items is the same inside the calendar. When your local time switches, it just moves them forward or back, six hours forward or three hours back or whatever. Now, this time format can have time zones in it, but that's not highly recommended, and so we tend to see these Z times, which are UTC. And like I said, it is filled out with zeros so that all the columns are filled: four-two-two digits with the dashes, which means they'll all have the same length, then the letter T, then two digits, two digits, two digits with the colons, and then at the end the time zone. And like I said, we tend to do everything in absolute time.

Here are some more XSDs. Let's take a look at this one. Some of the documents start with this little xml line that's really an indication that it's an XML document, and then we have the schema, which is the outer one. There's an address as the outer tag, and that tag goes all the way to here, because these are just key-value pairs. We have a recipient, which is a string. House, that's a string, I guess. Street, which is a string. And then we have town and county. County is optional: it's minOccurs 0, so you'll notice that there is no county in the document. And then if we take a look at post code, post code is a string. And then we have country, and country's an interesting one. Country is a string, but we're going to restrict it with this enumeration, which says it has to be one of these five strings. So it can't be just anything. In this particular document, we can validate that UK is indeed one of those valid strings. So this validates.
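The enumeration restriction on country can be sketched like this. The transcript only confirms that there are five allowed strings and that UK is one of them; the other four codes, the schema fragment, and the address values are assumptions made for illustration. The check reads the allowed values out of the schema fragment and compares the document's Country text against them.

```python
import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'  # namespace behind the xs: prefix

# Reconstructed fragment: Country is a string limited to five values.
# Only "UK" is confirmed by the lecture; the other four are placeholders.
COUNTRY_SCHEMA = '''<xs:element name="Country"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="FR"/>
      <xs:enumeration value="DE"/>
      <xs:enumeration value="ES"/>
      <xs:enumeration value="UK"/>
      <xs:enumeration value="US"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>'''

# Hypothetical address; note there is no County tag, which minOccurs="0" allows.
ADDRESS = '''<Address>
  <Recipient>A. Reader</Recipient>
  <House>49</House>
  <Street>Featherstone Street</Street>
  <Town>London</Town>
  <PostCode>EC1Y 8SY</PostCode>
  <Country>UK</Country>
</Address>'''

def allowed_values(schema_xml):
    """Collect the value of every xs:enumeration in the schema fragment."""
    root = ET.fromstring(schema_xml)
    return [item.get('value') for item in root.iter(XS + 'enumeration')]

def country_is_valid(address_xml, schema_xml=COUNTRY_SCHEMA):
    """True only if the Country tag exists and holds one of the allowed strings."""
    country = ET.fromstring(address_xml).findtext('Country')
    return country in allowed_values(schema_xml)

print(allowed_values(COUNTRY_SCHEMA))  # ['FR', 'DE', 'ES', 'UK', 'US']
print(country_is_valid(ADDRESS))       # True
```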
So this address document validates: everywhere we can check, every one of the tags meets the needs of this XML schema.

Here's another schema that has a couple of other things. xs:string, we've got that. maxOccurs="unbounded" says as many as you want. minOccurs="0", we know how to deal with that. xs:positiveInteger just means that -14 is not allowed. And you can talk about attributes as well: use="required" says this has to be there, so you basically say you must have an orderid attribute on this particular tag. Okay?

And so I'm not trying to teach you how to be an XSD wizard, just the notion that there is this syntax that's used to establish a contract so that you can resolve disagreements between cooperating applications. So up next, we'll switch from schema back to XML and we'll look at how to parse XML in Python.
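The last two rules from that schema, a required attribute and xs:positiveInteger, can be sketched in the same hand-checked style. The `orderid` attribute and the rejected value -14 come from the lecture; the tag names `shiporder`, `item`, and `quantity` are assumptions for illustration, and this is not a substitute for running a real XSD validator.

```python
import re
import xml.etree.ElementTree as ET

def is_positive_integer(text):
    """Whole numbers greater than zero, so -14 and 0 are both rejected."""
    return re.fullmatch(r'\+?[0-9]+', text) is not None and int(text) > 0

def order_problems(order_xml):
    """Check a required orderid attribute and positive quantity values."""
    order = ET.fromstring(order_xml)
    problems = []
    # use="required": the attribute must be present on the outer tag.
    if order.get('orderid') is None:
        problems.append('missing required attribute: orderid')
    # xs:positiveInteger on every quantity tag (hypothetical tag name).
    for quantity in order.iter('quantity'):
        text = (quantity.text or '').strip()
        if not is_positive_integer(text):
            problems.append(f'quantity is not a positive integer: {text}')
    return problems

# Hypothetical documents: one that honors the contract and one that breaks it twice.
good_order = '<shiporder orderid="889923"><item><quantity>1</quantity></item></shiporder>'
bad_order = '<shiporder><item><quantity>-14</quantity></item></shiporder>'

print(order_problems(good_order))  # []
print(order_problems(bad_order))
```

When the two sides disagree, a list of problems like this is what lets them point at the document rather than at each other's programs.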
