13.2 eXtensible Markup Language (XML)

Course: Using Python to Access Web Data / Chapter: Web Services and XML (Chapter 13) / Lesson 2

  • Reading time: 8 minutes
  • Level: intermediate

English text of the lesson

So XML is what we call the eXtensible Markup Language. Any serialization format has some special characters and some rules about how to form the serialized document from the internal structures. Part of the reason XML became popular is that it became popular at about the time that HTML became popular. You could almost say that XML influenced HTML. Both use less-than and greater-than signs as the active characters, as the way to tag or otherwise mark up the information.

So it works just like HTML: there are start tags and end tags, and together they bracket a chunk of stuff. So people is a tag. Then person is another tag, and /person is an ending tag. Then name is a tag, and /name is an ending tag.

These things are called elements or nodes, and we'll have a couple of ways of visualizing them coming up. One kind of element just has some text in between its tags. That is the simplest bit, and it's called a simple element. Another kind of element, like person, has child nodes inside it. So name is a simple element, and the complex elements are person and people. They're nested together. The indenting is just something I'm showing you to make it look pretty. XML doesn't really care what extra spaces you put in, but it certainly helps us as human beings to understand what's going on.

The primary purpose of XML is to share structured data. It started as a simplified subset of SGML, which was the precursor to both XML and HTML. SGML was a little hard, so you could almost think of XML as a simplified, easy version of SGML.

So here are the basics of it. An element has a start tag and an end tag. Start tag, end tag, start tag, end tag, start tag, end tag.
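The distinction between simple and complex elements, and the point about indentation, can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustration: the tag names follow the lecture, but the name value `Chuck` and the helper `describe` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: tag names follow the lecture, the value is made up.
pretty = """<people>
  <person>
    <name>Chuck</name>
  </person>
</people>"""

# The same document with all indentation removed.
compact = "<people><person><name>Chuck</name></person></people>"


def describe(element):
    """Return 'complex' if the element has child elements, else 'simple'."""
    return "complex" if len(element) > 0 else "simple"


pretty_root = ET.fromstring(pretty)
compact_root = ET.fromstring(compact)

# Walk every element in document order and classify it.
for element in pretty_root.iter():
    print(element.tag, describe(element))

# Indentation between tags does not change the data we pull out.
pretty_name = pretty_root.find("person/name").text
compact_name = compact_root.find("person/name").text
print(pretty_name == compact_name)
```

Here `len(element)` counts child elements, so `people` and `person` come out as complex and `name` as simple, whichever layout is parsed.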
The end tags are the ones that have the slash. Then there is textual content, which is the stuff between the tags. That's just the text, what you call the text nodes. And then there are attributes, and attributes are always on the start tags. So the phone and the email elements have them, and they are key-value pairs with the value in double quotes, as in type="intl".

The key thing about XML versus HTML is that we get to make up the tags. In HTML we say things like h1 for header level one, or a for the anchor tag. Here, based on what the two applications exchanging data agree on, you can say the tag is person and /person. You still have to follow the rules, though: there is a slash tag at the end.

You can also have a self-closing tag. For that, you just include /> at the end of the open tag. It's as if you had a closing tag of the same name with an empty text area.

Like I said, the whitespace doesn't really matter so much. We can line these things up, and we tend to indent them, as in any kind of programming environment, to help our own reading. In general, whitespace does not matter except in the middle of things like these text areas. Between one tag and the next tag, the whitespace doesn't matter. So the extra spaces that I used to indent it don't really matter. The only time whitespace matters is when you're inside a text area.

Here's some sample XML, just to give you some ideas. There's always one big outer tag, with a start tag and an end tag. There's only one of those, because it can't be multiply defined. We also see a series of attributes. Each attribute is a key, an equals sign, and then a double-quoted string.
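The points above about attributes, self-closing tags, and whitespace can be tried out with `xml.etree.ElementTree`. This is a sketch under assumptions: `type="intl"` comes from the lecture, while the `hide` attribute, the phone number, and the name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: only type="intl" is from the lecture, the rest is invented.
data = """<person>
  <name>Chuck</name>
  <phone type="intl">+1 555 0100</phone>
  <email hide="yes"/>
</person>"""

person = ET.fromstring(data)
phone = person.find("phone")
email = person.find("email")

print(phone.get("type"))  # attribute value taken from the start tag
print(phone.text)         # text between the start tag and the end tag
print(email.attrib)       # all attributes of the element as a dict
print(email.text)         # None: a self-closing tag carries no text

# A self-closing tag behaves like an explicit start/end pair with nothing inside.
explicit = ET.fromstring('<email hide="yes"></email>')
print(explicit.text is None and explicit.attrib == email.attrib)

# Whitespace inside a text area is kept as part of the data.
padded = ET.fromstring("<name>  Chuck  </name>")
print(repr(padded.text))
```

Note that the padded name comes back with its surrounding spaces intact, which is the one place the lecture says whitespace matters.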
If you look at HTML, you'll see that this attribute syntax is exactly the same. The difference is that in HTML you're supposed to use specific names, like href="blah, blah, blah", right? So HTML is kind of like XML that's more highly specified, whereas here two programs just agree on a format and use it.

So we have an outer tag that's a complex element. Then we have things like the title, and you can have things that appear in order, like this ingredient. You see some attributes on there, and then you have a text block in the middle. Then there is a sub tag, and a series of steps. It's like a tree, as we'll see in a second. These can be in order, and there can be more than one of them. We can create all kinds of structures, designed around the needs of the two applications that are trying to cooperate.

So tags mark the beginning and the end. Attributes are key-value pairs on the start tag. Serialization is the act of taking an internal structure in one programming environment and turning it into a form that can be sent across the network. Deserialization is receiving that data across the network and translating it back into an internal structure on the destination computer.

There are a couple of different ways that we can look at this. The most common one is as a tree, and the word nodes comes from trees. Each of these is like a node, because it's a place of connection where things come together. You can think of the outer document as the top node of the tree. It's kind of an upside-down tree, actually. If this were a real tree, it'd have a trunk, and then stuff like this, and a squirrel sitting up here, right? But ours is upside down. So we have the top of the tree here, and it has two child nodes: the b node and the c node are directly beneath the a node.
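The serialization and deserialization round trip described above can be sketched in Python. This is a minimal illustration, not a general-purpose converter: the functions `serialize` and `deserialize`, the flat dict layout, and the sample values are all assumptions made for the example, and the "network" is just a string passed between the two functions.

```python
import xml.etree.ElementTree as ET


def serialize(person):
    """Turn a flat dict (the internal structure) into an XML string (the wire format)."""
    root = ET.Element("person")
    for key, value in person.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def deserialize(xml_text):
    """Turn the XML string back into a dict on the receiving side."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Hypothetical data held by the sending application.
original = {"name": "Chuck", "city": "Ann Arbor"}

wire = serialize(original)    # what would travel across the network
print(wire)

restored = deserialize(wire)  # what the receiving application rebuilds
print(restored == original)
```

The sender and receiver only have to agree on the tag names; neither needs to know how the other stores the data internally.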
We model the text in between the tags as a child, and you'll see why in a bit. So the text X is a text node, and it is a child of the b node. The b node is all of this together, and the text is a child of it. The c node has two children, d and e. The d node has a child that is the text Y, and the e node has a child that is the text Z. So these are the simple and complex elements, and then there's the text within the elements.

Like I said, we model the text as a child of the node, and that's because we model attributes as different children too. Say we change this a little bit and give b an attribute, w="5". All of this is part of the b node. The b node has the text area, and there's only one of those, but there could be many attributes. I just have w="5", so the children of the b node are the attribute child and the text child. There can be many of these attribute children, like a="4" and b="19", and the values always have to be in quotes.

You'll see why this matters when we start doing this in programming languages. It's important to understand what it means when you grab the node versus when you grab its text. Those aren't the same thing. X is the text at the node b, and the node b is that text, the attributes, and everything all rolled up together.

Another way to think about these, and a way to actually parse XML, is through what we call paths. In a sense, you just draw the tree and then walk down it. So this X can be thought of as a piece of data that's at /a/b. You start at the top, go down to a, go down to b, and what do you find there? X. If we go from a to c to d, we find Y, and that's at /a/c/d. So this is like a path.
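The difference between grabbing a node and grabbing its text or attributes can be sketched with `xml.etree.ElementTree`. One caveat, stated as an assumption about how the lecture's picture maps onto this library: ElementTree does not expose text and attributes as separate child nodes. It hangs them off the element as `.text` and `.attrib`, so the tree model above is conceptual here.

```python
import xml.etree.ElementTree as ET

# The small tree from the lecture, with the w="5" attribute added to b.
doc = '<a><b w="5">X</b><c><d>Y</d><e>Z</e></c></a>'
a = ET.fromstring(doc)

# Grabbing the node gives an Element object: tag, text, attributes, children.
b = a.find("b")
print(b.tag)

# Grabbing the text or an attribute gives plain strings taken from that node.
print(b.text)
print(b.attrib)

# Iterating over an element yields its child elements in document order.
a_children = [child.tag for child in a]
print(a_children)

c = a.find("c")
c_children = [(child.tag, child.text) for child in c]
print(c_children)
```

So `b` is the whole node, while `b.text` is only the `X` and `b.attrib` is only the `w="5"` pair.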
You can think of this like folders on your computer. There's the a folder, and within a there are two folders, b and c, the children folders. Then within c there are the children folders d and e. So this one here is /a/c/e, and you ask, what's living there at /a/c/e? The text that's living at /a/c/e is Z. So that's another way to think about XML.

The thing we're going to talk about next is an important aspect of two applications trying to agree with each other. If I'm producing data and you're consuming data, and you blow up, was it the data's fault, or was it your fault? That's what XML Schema decides for us.
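The folder-style paths described above (/a/b, /a/c/d, /a/c/e) can be sketched in Python. ElementTree's own `find` and `findtext` take paths relative to the element you call them on, so the helper `text_at` below is a hypothetical wrapper that accepts the absolute form used in the lecture and strips off the root tag first.

```python
import xml.etree.ElementTree as ET

doc = "<a><b>X</b><c><d>Y</d><e>Z</e></c></a>"
root = ET.fromstring(doc)  # root is the outer <a> element


def text_at(root, path):
    """Return the text at an absolute path such as '/a/c/e', or None if absent."""
    parts = path.strip("/").split("/")
    if parts[0] != root.tag:
        return None              # the path does not start at this document's root
    relative = "/".join(parts[1:])
    if not relative:
        return root.text         # the path names the root element itself
    return root.findtext(relative)


# Walk down the tree the same way the lecture does.
for path in ("/a/b", "/a/c/d", "/a/c/e"):
    print(path, text_at(root, path))
```

Each path is read top-down, one tag per step, just like opening nested folders until you reach the text stored at the end.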
