Chapter 8. Learning
8.2 Changing Behaviour through Reinforcement and Punishment: Operant Conditioning
Learning Objectives
- Outline the principles of operant conditioning.
- Explain how learning can be shaped through the use of reinforcement schedules and secondary reinforcers.
In classical conditioning the organism learns to associate new stimuli with natural biological responses such as salivation or fear. The organism does not learn something new but rather begins to perform an existing behaviour in the presence of a new signal. Operant conditioning, on the other hand, is learning that occurs based on the consequences of behaviour and can involve the learning of new actions. Operant conditioning occurs when a dog rolls over on command because it has been praised for doing so in the past, when a schoolroom bully threatens his classmates because doing so allows him to get his way, and when a child gets good grades because her parents threaten to punish her if she doesn't. In operant conditioning the organism learns from the consequences of its own actions.
How Reinforcement and Punishment Influence Behaviour: The Research of Thorndike and Skinner
Psychologist Edward L. Thorndike (1874-1949) was the first scientist to systematically study operant conditioning. In his research Thorndike (1898) observed cats who had been placed in a "puzzle box" from which they tried to escape ("Video Clip: Thorndike's Puzzle Box"). At first the cats scratched, bit, and swatted haphazardly, without any idea of how to get out. But eventually, and accidentally, they pressed the lever that opened the door and exited to their prize, a scrap of fish. The next time the cat was confined inside the box, it attempted fewer of the ineffective responses before carrying out the successful escape, and after several trials the cat learned to almost immediately make the correct response.
Observing these changes in the cats' behaviour led Thorndike to develop his law of effect, the principle that responses that create a typically pleasant outcome in a particular situation are more likely to occur again in a similar situation, whereas responses that produce a typically unpleasant outcome are less likely to occur again in the situation (Thorndike, 1911). The essence of the law of effect is that successful responses, because they are pleasurable, are "stamped in" by experience and thus occur more frequently. Unsuccessful responses, which produce unpleasant experiences, are "stamped out" and subsequently occur less frequently.
When Thorndike placed his cats in a puzzle box, he found that they learned to engage in the important escape behaviour faster after each trial. Thorndike described the learning that follows reinforcement in terms of the law of effect.
Watch: "Thorndike's Puzzle Box" [YouTube]: http://www.youtube.com/watch?v=BDujDOLre-8
The influential behavioural psychologist B. F. Skinner (1904-1990) expanded on Thorndike's ideas to develop a more complete set of principles to explain operant conditioning. Skinner created specially designed environments known as operant chambers (usually called Skinner boxes) to systematically study learning. A Skinner box (operant chamber) is a structure that is big enough to fit a rodent or bird and that contains a bar or key that the organism can press or peck to release food or water. It also contains a device to record the animal's responses (Figure 8.5).
The most basic of Skinner's experiments was quite similar to Thorndike's research with cats. A rat placed in the chamber reacted as one might expect, scurrying about the box and sniffing and clawing at the floor and walls. Eventually the rat chanced upon a lever, which it pressed to release pellets of food. The next time around, the rat took a little less time to press the lever, and on successive trials, the time it took to press the lever became shorter and shorter. Soon the rat was pressing the lever as fast as it could eat the food that appeared. As predicted by the law of effect, the rat had learned to repeat the action that brought about the food and cease the actions that did not.
Skinner studied, in detail, how animals changed their behaviour through reinforcement and punishment, and he developed terms that explained the processes of operant learning (Table 8.1, "How Positive and Negative Reinforcement and Punishment Influence Behaviour"). Skinner used the term reinforcer to refer to any event that strengthens or increases the likelihood of a behaviour, and the term punisher to refer to any event that weakens or decreases the likelihood of a behaviour. And he used the terms positive and negative to refer to whether a reinforcement was presented or removed, respectively. Thus, positive reinforcement strengthens a response by presenting something pleasant after the response, and negative reinforcement strengthens a response by reducing or removing something unpleasant. For example, giving a child praise for completing his homework represents positive reinforcement, whereas taking Aspirin to reduce the pain of a headache represents negative reinforcement. In both cases, the reinforcement makes it more likely that the behaviour will occur again in the future.
Table 8.1. How positive and negative reinforcement and punishment influence behaviour

| Operant conditioning term | Description | Outcome | Example |
|---|---|---|---|
| Positive reinforcement | Add or increase a pleasant stimulus | Behaviour is strengthened | Giving a student a prize after he or she gets an A on a test |
| Negative reinforcement | Reduce or remove an unpleasant stimulus | Behaviour is strengthened | Taking painkillers that eliminate pain increases the likelihood that you will take painkillers again |
| Positive punishment | Present or add an unpleasant stimulus | Behaviour is weakened | Giving a student extra homework after he or she misbehaves in class |
| Negative punishment | Reduce or remove a pleasant stimulus | Behaviour is weakened | Taking away a teen's computer after he or she misses curfew |
Reinforcement, either positive or negative, works by increasing the likelihood of a behaviour. Punishment, on the other hand, refers to any event that weakens or reduces the likelihood of a behaviour. Positive punishment weakens a response by presenting something unpleasant after the response, whereas negative punishment weakens a response by reducing or removing something pleasant. A child who is grounded after fighting with a sibling (positive punishment) or who loses out on the opportunity to go to recess after getting a poor grade (negative punishment) is less likely to repeat these behaviours.
Although the distinction between reinforcement (which increases behaviour) and punishment (which decreases it) is usually clear, in some cases it is difficult to determine whether a reinforcer is positive or negative. On a hot day a cool breeze could be seen as a positive reinforcer (because it brings in cool air) or a negative reinforcer (because it removes hot air). In other cases, reinforcement can be both positive and negative. One may smoke a cigarette both because it brings pleasure (positive reinforcement) and because it eliminates the craving for nicotine (negative reinforcement).
It is also important to note that reinforcement and punishment are not simply opposites. The use of positive reinforcement in changing behaviour is almost always more effective than using punishment. This is because positive reinforcement makes the person or animal feel better, helping create a positive relationship with the person providing the reinforcement. Types of positive reinforcement that are effective in everyday life include verbal praise or approval, the awarding of status or prestige, and direct financial payment. Punishment, on the other hand, is more likely to create only temporary changes in behaviour because it is based on coercion and typically creates a negative and adversarial relationship with the person providing the reinforcement. When the person who provides the punishment leaves the situation, the unwanted behaviour is likely to return.
Creating Complex Behaviours through Operant Conditioning
Perhaps you remember watching a movie or being at a show in which an animal — maybe a dog, a horse, or a dolphin — did some pretty amazing things. The trainer gave a command and the dolphin swam to the bottom of the pool, picked up a ring on its nose, jumped out of the water through a hoop in the air, dived again to the bottom of the pool, picked up another ring, and then took both of the rings to the trainer at the edge of the pool. The animal was trained to do the trick, and the principles of operant conditioning were used to train it. But these complex behaviours are a far cry from the simple stimulus-response relationships that we have considered thus far. How can reinforcement be used to create complex behaviours such as these?
One way to expand the use of operant learning is to modify the schedule on which the reinforcement is applied. To this point we have only discussed a continuous reinforcement schedule, in which the desired response is reinforced every time it occurs; whenever the dog rolls over, for instance, it gets a biscuit. Continuous reinforcement results in relatively fast learning but also rapid extinction of the desired behaviour once the reinforcer disappears. The problem is that because the organism is used to receiving the reinforcement after every behaviour, the responder may give up quickly when it doesn't appear.
Most real-world reinforcers are not continuous; they occur on a partial (or intermittent) reinforcement schedule — a schedule in which the responses are sometimes reinforced and sometimes not. In comparison to continuous reinforcement, partial reinforcement schedules lead to slower initial learning, but they also lead to greater resistance to extinction. Because the reinforcement does not appear after every behaviour, it takes longer for the learner to determine that the reward is no longer coming, and thus extinction is slower. The four types of partial reinforcement schedules are summarized in Table 8.2, "Reinforcement Schedules."
Table 8.2. Reinforcement schedules

| Reinforcement schedule | Explanation | Real-world example |
|---|---|---|
| Fixed-ratio | Behaviour is reinforced after a specific number of responses. | Factory workers who are paid according to the number of products they produce |
| Variable-ratio | Behaviour is reinforced after an average, but unpredictable, number of responses. | Payoffs from slot machines and other games of chance |
| Fixed-interval | Behaviour is reinforced for the first response after a specific amount of time has passed. | People who earn a monthly salary |
| Variable-interval | Behaviour is reinforced for the first response after an average, but unpredictable, amount of time has passed. | Person who checks email for messages |
Partial reinforcement schedules are determined by whether the reinforcement is presented on the basis of the time that elapses between reinforcements (interval) or on the basis of the number of responses that the organism engages in (ratio), and by whether the reinforcement occurs on a regular (fixed) or unpredictable (variable) schedule. In a fixed-interval schedule, reinforcement occurs for the first response made after a specific amount of time has passed. For instance, on a one-minute fixed-interval schedule the animal receives a reinforcement every minute, assuming it engages in the behaviour at least once during the minute. As you can see in Figure 8.6, "Examples of Response Patterns by Animals Trained under Different Partial Reinforcement Schedules," animals under fixed-interval schedules tend to slow down their responding immediately after the reinforcement but then increase the behaviour again as the time of the next reinforcement gets closer. (Most students study for exams the same way.) In a variable-interval schedule, the reinforcers appear on an interval schedule, but the timing is varied around the average interval, making the actual appearance of the reinforcer unpredictable. An example might be checking your email: you are reinforced by receiving messages that come, on average, say, every 30 minutes, but the reinforcement occurs only at random times. Interval reinforcement schedules tend to produce slow and steady rates of responding.
In a fixed-ratio schedule, a behaviour is reinforced after a specific number of responses. For instance, a rat's behaviour may be reinforced after it has pressed a key 20 times, or a salesperson may receive a bonus after he or she has sold 10 products. As you can see in Figure 8.6, "Examples of Response Patterns by Animals Trained under Different Partial Reinforcement Schedules," once the organism has learned to act in accordance with the fixed-ratio schedule, it will pause only briefly when reinforcement occurs before returning to a high level of responsiveness. A variable-ratio schedule provides reinforcers after a specific but average number of responses. Winning money from slot machines or on a lottery ticket is an example of reinforcement that occurs on a variable-ratio schedule. For instance, a slot machine (see Figure 8.7, "Slot Machine") may be programmed to provide a win every 20 times the user pulls the handle, on average. Ratio schedules tend to produce high rates of responding because reinforcement increases as the number of responses increases.
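For readers who think in code, the decision rule behind each of the four schedules can be written out explicitly. The short Python sketch below is illustrative only and is not part of the original chapter; the function names and parameter values (20 responses, 60 seconds) are invented for the example.

```python
import random

# Each predicate answers one question: does the current response earn a
# reinforcer? `responses` counts responses since the last reinforcer, and
# `elapsed` is the time in seconds since the last reinforcer.

def fixed_ratio(responses, ratio=20):
    # Every `ratio`-th response pays off (e.g., every 20th key press).
    return responses >= ratio

def variable_ratio(mean_ratio=20):
    # Each response pays off with probability 1/mean_ratio, so wins arrive
    # after an unpredictable but on-average `mean_ratio` responses,
    # like a slot machine.
    return random.random() < 1 / mean_ratio

def fixed_interval(elapsed, interval=60.0):
    # Only the first response after `interval` seconds pays off.
    return elapsed >= interval

def draw_wait(mean_interval=60.0):
    # Drawn once after each reinforcer: an unpredictable wait that
    # averages `mean_interval` seconds.
    return random.expovariate(1 / mean_interval)

def variable_interval(elapsed, required_wait):
    # Only the first response after the drawn `required_wait` pays off.
    return elapsed >= required_wait
```

Notice that the two ratio rules depend only on the response count, while the two interval rules depend only on elapsed time. That is the structural reason ratio schedules produce high response rates: under a ratio rule, responding faster brings the next reinforcer sooner, whereas under an interval rule it does not.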
Complex behaviours are also created through shaping, the process of guiding an organism's behaviour to the desired outcome through the use of successive approximation to a final desired behaviour. Skinner made extensive use of this procedure in his boxes. For instance, he could train a rat to press a bar two times to receive food, by first providing food when the animal moved near the bar. When that behaviour had been learned, Skinner would begin to provide food only when the rat touched the bar. Further shaping limited the reinforcement to only when the rat pressed the bar, to when it pressed the bar and touched it a second time, and finally to only when it pressed the bar twice. Although it can take a long time, in this way operant conditioning can create chains of behaviours that are reinforced only when they are completed.
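Viewed procedurally, shaping is a loop over a ladder of progressively stricter reward criteria. The Python sketch below is a hypothetical illustration of the bar-pressing example above, not Skinner's actual procedure; the simulated rat, its learning rule, and the mastery threshold are all invented for this example.

```python
import random

class SimRat:
    """A toy stand-in for the animal (invented for this sketch)."""

    def __init__(self):
        self.bias = 0.0    # learned tendency toward bar-related behaviour
        self.action = 0    # 0 elsewhere, 1 near bar, 2 touch, 3 one press, 4 two presses

    def behave(self):
        # Emit a random action; reinforcement history (bias) shifts it upward.
        self.action = min(4, int(random.random() * 5 + self.bias))

    def deliver_pellet(self):
        self.bias += 0.05  # each reinforcer strengthens the tendency slightly

# Successive approximations toward the final behaviour: two bar presses.
criteria = [
    lambda r: r.action >= 1,  # step 1: moving near the bar
    lambda r: r.action >= 2,  # step 2: touching the bar
    lambda r: r.action >= 3,  # step 3: pressing the bar once
    lambda r: r.action >= 4,  # step 4: pressing the bar twice
]

def shape(rat, criteria, mastery=10):
    # Reinforce an approximation until the rat produces it `mastery` times
    # in a row, then tighten the criterion to the next rung of the ladder.
    for criterion in criteria:
        streak = 0
        while streak < mastery:
            rat.behave()
            if criterion(rat):
                rat.deliver_pellet()
                streak += 1
            else:
                streak = 0

shape(SimRat(), criteria)
```

The key design point mirrors the text: reinforcement is never withheld until the final behaviour appears; instead, each cruder approximation is rewarded until it is reliable, and only then is the criterion tightened.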
Reinforcing animals if they correctly discriminate between similar stimuli allows scientists to test the animals' ability to learn, and the discriminations that they can make are sometimes remarkable. Pigeons have been trained to distinguish between images of Charlie Brown and the other Peanuts characters (Cerella, 1980), and between different styles of music and art (Porter & Neuringer, 1984; Watanabe, Sakamoto & Wakita, 1995).
Behaviours can also be trained through the use of secondary reinforcers. Whereas a primary reinforcer includes stimuli that are naturally preferred or enjoyed by the organism, such as food, water, and relief from pain, a secondary reinforcer (sometimes called a conditioned reinforcer) is a neutral event that has become associated with a primary reinforcer through classical conditioning. An example of a secondary reinforcer would be the whistle given by an animal trainer, which has been associated over time with the primary reinforcer, food. An example of an everyday secondary reinforcer is money. We enjoy having money, not so much for the stimulus itself, but rather for the primary reinforcers (the things that money can buy) with which it is associated.
Key Takeaways
- Edward Thorndike developed the law of effect: the principle that responses that create a typically pleasant outcome in a particular situation are more likely to occur again in a similar situation, whereas responses that produce a typically unpleasant outcome are less likely to occur again in the situation.
- B. F. Skinner expanded on Thorndike's ideas to develop a set of principles to explain operant conditioning.
- Positive reinforcement strengthens a response by presenting something that is typically pleasant after the response, whereas negative reinforcement strengthens a response by reducing or removing something that is typically unpleasant.
- Positive punishment weakens a response by presenting something typically unpleasant after the response, whereas negative punishment weakens a response by reducing or removing something that is typically pleasant.
- Reinforcement may be either partial or continuous. Partial reinforcement schedules are determined by whether the reinforcement is presented on the basis of the time that elapses between reinforcements (interval) or on the basis of the number of responses that the organism engages in (ratio), and by whether the reinforcement occurs on a regular (fixed) or unpredictable (variable) schedule.
- Complex behaviours may be created through shaping, the process of guiding an organism's behaviour to the desired outcome through the use of successive approximation to a final desired behaviour.
Exercises and Critical Thinking
- Give an example from daily life of each of the following: positive reinforcement, negative reinforcement, positive punishment, negative punishment.
- Consider the reinforcement techniques that you might use to train a dog to catch and retrieve a Frisbee that you throw to it.
- Watch the following two videos from current television shows. Can you determine which learning procedures are being demonstrated?
- The Office: http://www.break.com/usercontent/2009/11/the-office-altoid-experiment-1499823
- The Big Bang Theory [YouTube]: http://www.youtube.com/watch?v=JA96Fba-WHk
References
Cerella, J. (1980). The pigeon's analysis of pictures. Pattern Recognition, 12, 1–6.
Kassin, S. (2003). Essentials of psychology. Upper Saddle River, NJ: Prentice Hall. Retrieved from Essentials of Psychology Prentice Hall Companion Website: http://wps.prenhall.com/hss_kassin_essentials_1/15/3933/1006917.cw/index.html
Porter, D., & Neuringer, A. (1984). Music discriminations by pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 10(2), 138–148.
Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Washington, DC: American Psychological Association.
Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York, NY: Macmillan. Retrieved from http://www.archive.org/details/animalintelligen00thor
Watanabe, S., Sakamoto, J., & Wakita, M. (1995). Pigeons' discrimination of paintings by Monet and Picasso. Journal of the Experimental Analysis of Behavior, 63(2), 165–174.
Image Attributions
Figure 8.5: "Skinner box" (http://en.wikipedia.org/wiki/File:Skinner_box_photo_02.jpg) is licensed under the CC BY-SA 3.0 license (http://creativecommons.org/licenses/by-sa/3.0/deed.en). "Skinner box scheme" by Andreas1 (http://en.wikipedia.org/wiki/File:Skinner_box_scheme_01.png) is licensed under the CC BY-SA 3.0 license (http://creativecommons.org/licenses/by-sa/3.0/deed.en)
Figure 8.6: Adapted from Kassin (2003).
Figure 8.7: "Slot Machines in the Hard Rock Casino" by Ted Murpy (http://commons.wikimedia.org/wiki/File:HardRockCasinoSlotMachines.jpg) is licensed under the CC BY 2.0 license (http://creativecommons.org/licenses/by/2.0/deed.en).
Source: https://opentextbc.ca/introductiontopsychology/chapter/7-2-changing-behavior-through-reinforcement-and-punishment-operant-conditioning/