Table of contents Intro to Image Recognition
This is why we must expose a model to as many different kinds of inputs as possible so that it learns to recognize general patterns rather than specific ones
Download 39.44 Kb.
|
article image recog
This is why we must expose a model to as many different kinds of inputs as possible so that it learns to recognize general patterns rather than specific ones. There are tools that can help us with this and we will introduce them in the next topic.
Hopefully by now you understand how image recognition models identify images and some of the challenges we face when trying to teach these models. Models can only look for features that we teach them to and choose between categories that we program into them. To machines, images are just arrays of pixel values and the job of a model is to recognize patterns that it sees across many instances of similar images and associate them with specific outputs. We need to teach machines to look at images more abstractly rather than looking at the specifics to produce good results across a wide domain. Next up we will learn some ways that machines help to overcome this challenge to better recognize images. In the meantime, though, consider browsing our article on just what sort of job opportunities await you should you pursue these exciting Python topics!
Transcript What is up, guys? Welcome to the first tutorial in our image recognition course. This is also the very first topic, and is just going to provide a general intro into image recognition. Now we’re going to cover two topics specifically here. One will be, “What is image recognition?” and the other will be, “What tools can help us to solve image recognition?” The first part, which will be this video, will be all about introducing the problem of image recognition, talk about how we solve the problem of image recognition in our day-to-day lives, and then we’ll go onto explore this from a machine’s point of view. After that, we’ll talk about the tools specifically that machines use to help with image recognition. Specifically, we’ll be looking at convolutional neural networks, but a bit more on that later. Let’s get started with, “What is image recognition?” Image recognition is seeing an object or an image of that object and knowing exactly what it is. At the very least, even if we don’t know exactly what it is, we should have a general sense for what it is based on similar items that we’ve seen. Essentially, we class everything that we see into certain categories based on a set of attributes. That’s why image recognition is often called image classification, because it’s essentially grouping everything that we see into some sort of a category. Now the attributes that we use to classify images is entirely up to us. For example, if we’re looking at different animals, we might use a different set of attributes versus if we’re looking at buildings or let’s say cars, for example. If we’re looking at vehicles, we might be taking a look at the shape of the vehicle, the number of windows, the number of wheels, et cetera. If we’re looking at animals, we might take into consideration the fur or the skin type, the number of legs, the general head structure, and stuff like that. It’s entirely up to us which attributes we choose to classify items. And this could be real-world items as well, not necessarily just images. Now, this allows us to categorize something that we haven’t even seen before. In fact, this is very powerful. We can take a look at something that we’ve literally never seen in our lives, and accurately place it in some sort of a category. We can often see this with animals. I highly doubt that everyone has seen every single type of animal there is to see out there. No doubt there are some animals that you’ve never seen before in your lives. But, you should, by looking at it, be able to place it into some sort of category. You should know that it’s an animal. You should have a general sense for whether it’s a carnivore, omnivore, herbivore, and so on and so forth. Now, another example of this is models of cars. Now, every single year, there are brand-new models of cars coming out, some which we’ve never seen before. Some look so different from what we’ve seen before, but we recognize that they are all cars. We can take a look again at the wheels of the car, the hood, the windshield, the number of seats, et cetera, and just get a general sense that we are looking at some sort of a vehicle, even if it’s not like a sedan, or a truck, or something like that. Now, how does this work for us? Well, a lot of the time, image recognition actually happens subconsciously. In fact, we rarely think about how we know what something is just by looking at it. We just kinda take a look at it, and we know instantly kind of what it is. And a big part of this is the fact that we don’t necessarily acknowledge everything that is around us. If we do need to notice something, then we can usually pick it out and define and describe it. Take, for example, if you’re walking down the street, especially if you’re walking a route that you’ve walked many times. It’s highly likely that you don’t pay attention to everything around you. Maybe there’s stores on either side of you, and you might not even really think about what the stores look like, or what’s in those stores. However, when you go to cross the street, you become acutely aware of the other people around you, of the cars around you, because those are things that you need to notice. In fact, even if it’s a street that we’ve never seen before, with cars and people that we’ve never seen before, we should have a general sense for what to do. The light turns green, we go, if there’s a car driving in front of us, probably shouldn’t walk into it, and so on and so forth. Now, this kind of process of knowing what something is is typically based on previous experiences. If we’d never come into contact with cars, or people, or streets, we probably wouldn’t know what to do. However, we’ve definitely interacted with streets and cars and people, so we know the general procedure. So, go on a green light, stop on a red light, so on and so forth, and that’s because that’s stuff that we’ve seen in the past. Even if we haven’t seen that exact version of it, we kind of know what it is because we’ve seen something similar before. Now, sometimes this is done through pure memorization. Maybe we look at a specific object, or a specific image, over and over again, and we know to associate that with an answer. This is just kind of rote memorization. However, the more powerful ability is being able to deduce what an item is based on some similar characteristics when we’ve never seen that item before. And that’s really the challenge. It’s easy enough to program in exactly what the answer is given some kind of input into a machine. You could just use like a map or a dictionary for something like that. However, the challenge is in feeding it similar images, and then having it look at other images that it’s never seen before, and be able to accurately predict what that image is. Now, this kind of a problem is actually two-fold. The problem is first deducing that there are multiple objects in your field of vision, and the second is then recognizing each individual object. So, step number one, how are we going to actually recognize that there are different objects around us? Typically, we do this based on borders that are defined primarily by differences in color. This makes sense. If we’ve seen something that camouflages into something else, probably the colors are very similar, so it’s just hard to tell them apart, it’s hard to place a border on one specific item. However, if you see, say, a skyscraper outlined against the sky, there’s usually a difference in color. It’s very easy to see the skyscraper, maybe, let’s say, brown, or black, or gray, and then the sky is blue. So there’s that sharp contrast in color, therefore we can say, ‘Okay, there’s obviously something in front of the sky.’ Now, again, another example is it’s easy to see a green leaf on a brown tree, but let’s say we see a black cat against a black wall. We might not even be able to tell it’s there at all, unless it opens its eyes, or maybe even moves. Now, we don’t necessarily need to look at every single part of an image to know what some part of it is. Take, for example, if you have an image of a landscape, okay, so there’s maybe some trees in the background, there’s a house, there’s a farm, or something like that, and someone asks you to point out the house. Well, you don’t even need to look at the entire image, it’s just as soon as you see the bit with the house, you know that there’s a house there, and then you can point it out. This is even more powerful when we don’t even get to see the entire image of an object, but we still know what it is. Take, for example, an image of a face. Let’s say we’re only seeing a part of a face. Specifically, we only see, let’s say, one eye and one ear. But we still know that we’re looking at a person’s face based on the color, the shape, the spacing of the eye and the ear, and just the general knowledge that a face, or at least a part of a face, looks kind of like that. Our brain fills in the rest of the gap, and says, ‘Well, we’ve seen faces, a part of a face is contained within this image, therefore we know that we’re looking at a face.’ That’s, again, a lot more difficult to program into a machine because it may have only seen images of full faces before, and so it gets a part of a face, and it doesn’t know what to do. No longer are we looking at two eyes, two ears, the mouth, et cetera. We’re only looking at a little bit of that. Now, before we talk about how machines process this, I’m just going to kind of summarize this section, we’ll end it, and then we’ll cover the machine part in a separate video, because I do wanna keep things a bit shorter, there’s a lot to process here. So some of the key takeaways are the fact that a lot of this kinda image recognition classification happens subconsciously. We just look at an image of something, and we know immediately what it is, or kind of what to look out for in that image. Obviously this gets a bit more complicated when there’s a lot going on in an image. Also, image recognition, the problem of it is kinda two-fold. The first is recognizing where one object ends and another begins, so kinda separating out the object in an image, and then the second part is actually recognizing the individual pieces of an image, putting them together, and recognizing the whole thing. Also, know that it’s very difficult for us to program in the ability to recognize a whole part of something based on just seeing a single part of it, but it’s something that we are naturally very good at. Okay, so, think about that stuff, stay tuned for the next section, which will kind of talk about how machines process images, and that’ll give us insight into how we’ll go about implementing the model. Okay, so thanks for watching, we’ll see you guys in the next one. What’s up guys? Welcome to the second tutorial in our image recognition course. Here we’re going to continue on with how image recognition works, but we’re going to explore it from a machine standpoint now. We just finished talking about how humans perform image recognition or classification, so we’ll compare and contrast this process in machines. For starters, contrary to popular belief, machines do not have infinite knowledge of what everything they see is. So, let’s say we’re building some kind of program that takes images or scans its surroundings. Well, it’s going to take in all that information, and it may store it and analyze it, but it doesn’t necessarily know what everything it sees it. It might not necessarily be able to pick out every object. Machines only have knowledge of the categories that we have programmed into them and taught them to recognize. And, actually, this goes beyond just image recognition, machines, as of right now at least, can only do what they’re programmed to do. So this means, if we’re teaching a machine learning image recognition model, to recognize one of 10 categories, it’s never going to recognize anything else, outside of those 10 categories. Now, a simple example of this, is creating some kind of a facial recognition model, and its only job is to recognize images of faces and say, “Yes, this image contains a face,” or, “no, it doesn’t.” So basically, it classifies everything it sees into a face or not a face. Now, this means that even the most sophisticated image recognition models, the best face recognition models will not recognize everything in that image. It’s never going to take a look at an image of a face, or it may be not a face, and say, “Oh, that’s actually an airplane,” or, “that’s a car,” or, “that’s a boat or a tree.” It’s just going to say, “No, that’s not a face,” okay? Because that’s all it’s been taught to do. It’s classifying everything into one of those two possible categories, okay? So even if something doesn’t belong to one of those categories, it will try its best to fit it into one of the categories that it’s been trained to do. So, essentially, it’s really being trained to only look for certain objects and anything else, just, it tries to shoehorn into one of those categories, okay? So that’s a very important takeaway, is that if we want a model to recognize something, we have to program it to recognize that, okay? Otherwise, it may classify something into some other category or just ignore it completely. Now, to a machine, we have to remember that an image, just like any other data, is simply an array of bytes. So it’s really just an array of data. It doesn’t look at an incoming image and say, “Oh, that’s a two,” or “that’s an airplane,” or, “that’s a face.” It’s just an array of values. Even images – which are technically matrices, there they have columns and rows, they are essentially rows of pixels, these are actually flattened out when a model processes these images. Generally speaking, we flatten it all into one long array of bytes. So, I say bytes because typically the values are between zero and 255, okay? So that’s a byte range, but, technically, if we’re looking at color images, each of the pixels actually contains additional information about red, green, and blue color values. Lucky for us, we’re only really going to be working with black and white images, so this problem isn’t quite as much of a problem. But realistically, if we’re building an image recognition model that’s to be used out in the world, it does need to recognize color, so the problem becomes four times as difficult. Now, if an image is just black or white, typically, the value is simply a darkness value. I guess this actually should be a whiteness value because 255, which is the highest value as a white, and zero is black. And, that means anything in between is some shade of gray, so the closer to zero, the lower the value, the closer it is to black. And, the higher the value, closer to 255, the more white the pixel is. Now, this is the same for red, green, and blue color values, as well. If we get a 255 in a red value, that means it’s going to be as red as it can be. If we get 255 in a blue value, that means it’s gonna be as blue as it can be. But, of course, there are combinations. So, for example, if we get 255 red, 255 blue, and zero green, we’re probably gonna have purple because it’s a lot of red, a lot of blue, and that makes purple, okay? So this is kind of how we’re going to get these various color values encoded into our images. Now, machines don’t really care about seeing an image as a whole, it’s a lot of data to process as a whole anyway, so actually, what ends up happening is these image recognition models often make these images more abstract and smaller, but we’ll get more into that later. To process an image, they simply look at the values of each of the bytes and then look for patterns in them, okay? So if we feed an image of a two into a model, it’s not going to say, “Oh, well, okay, I can see a two.” It’s just gonna see all of the pixel value patterns and say, “Oh, I’ve seen those before “and I’ve associated with it, associated those with a two. “So we’ll probably do the same this time,” okay? So they’re essentially just looking for patterns of similar pixel values and associating them with similar patterns they’ve seen before. In this way, image recognition models look for groups of similar byte values across images so that they can place an image in a specific category. Again, coming back to the concept of recognizing a two, because we’ll actually be dealing with digit recognition, so zero through nine, we essentially will teach the model to say, “‘Kay, we’ve seen this similar pattern in twos. “We’ve seen this pattern in ones,” et cetera. So when it sees a similar patterns, it says, “Okay, well, we’ve seen those patterns “and associated it with a specific category before, “so we’ll do the same.” Now, I should say actually, on this topic of categorization, it’s very, very rarely going to be the case that the model is 100% certain an image belongs to any category, okay? That’s why these outputs are very often expressed as percentages. So it might be, let’s say, 98% certain an image is a one, but it also might be, you know, 1% certain it’s a seven, maybe .5% certain it’s something else, and so on, and so forth. So it’s very, very rarely 100% it will, you know, we can get very close to 100% certainty, but we usually just pick the higher percent and go with that. Now, an example of a color image would be, let’s say, a high green and high brown values in adjacent bytes, may suggest an image contains a tree, okay? Now, if many images all have similar groupings of green and brown values, the model may think they all contain trees. So it will learn to associate a bunch of green and a bunch of brown together with a tree, okay? So this is maybe an image recognition model that recognizes trees or some kind of, just everyday objects. Now, the unfortunate thing is that can be potentially misleading. There are plenty of green and brown things that are not necessarily trees, for example, what if someone is wearing a camouflage tee shirt, or camouflage pants? Well, that’s definitely not a tree, that’s a person, but that’s kind of the point of wearing camouflage is to fool things or people into thinking that they are something else, in this case, a tree, okay? So really, the key takeaway here is that machines will learn to associate patterns of pixels, rather than an individual pixel value, with certain categories that we have taught it to recognize, okay? Now, we can see a nice example of that in this picture here. So, there’s a lot going on in this image, even though it may look fairly boring to us. There’s the decoration on the wall. There’s the lamp, the chair, the TV, the couple of different tables. There’s a vase full of flowers. There’s a picture on the wall and there’s obviously the girl in front. And, the girl seems to be the focus of this particular image. Now, we are kind of focusing around the girl’s head, but there’s also, a bit of the background in there, there’s also, you got to think about her hair, contrasted with her skin. There’s also a bit of the image, that kind of picture on the wall, and so on, and so forth. So there may be a little bit of confusion. It’s not 100% girl and it’s not 100% anything else. And, that’s why, if you look at the end result, the machine learning model, this is 94% certain that it contains a girl, okay? It’s, for a reason, 2% certain it’s the bouquet or the clock, even though those aren’t directly in the little square that we’re looking at, and there’s a 1% chance it’s a sofa. Now, I know these don’t add up to 100%, it’s actually 101%. But, you’ve got to take into account some kind of rounding up. Also, this definitely demonstrates how a bigger image is broken down into many, many smaller images and ultimately is categorized into one of these categories. So, in this case, we’re maybe trying to categorize everything in this image into one of four possible categories, either it’s a sofa, clock, bouquet, or a girl. And, in this case, what we’re looking at, it’s quite certain it’s a girl, and only a lesser bit certain it belongs to the other categories, okay? So again, remember that image classification is really image categorization. Machines can only categorize things into a certain subset of categories that we have programmed it to recognize, and it recognizes images based on patterns in pixel values, rather than focusing on any individual pixel, ‘kay? So when we come back, we’ll talk about some of the tools that will help us with image recognition, so stay tuned for that. Download 39.44 Kb. Do'stlaringiz bilan baham: |
ma'muriyatiga murojaat qiling