Hi, Kathleen; here is an article we wrote for Corally Burmaster. It
summarizes our stand on using ratios. You might post it for people. We
have had a zillion requests for it.
Bob
RATIOS, SCHEDULES – WHY AND WHEN
Robert E. and Marian Bailey, Ph.D.
One of the most common questions and discussion subjects on our e-mail in the past 60 days or so has been the problem of ratios – schedules of reinforcement, variable ratios (VR), variable schedules of reinforcement (VSR) – vs. continuous reinforcement (CRF). This subject has also been a recurring problem, going back as long as we have been receiving and sending e-mail, and farther back than that – even back to the “good old” days when we were training animals and educating our own trainers at Animal Behavior Enterprises. So let’s see if we can make clear, first, what we are all talking about, and second, what our own (the Baileys’) philosophy, practices, and advice are about schedules of reinforcement. By the way, it seems to us that most of the correspondents are using VSR and VR in an identical manner, both meaning VARIABLE RATIO. This little article is a distillation of some recent e-mail discussions.
The correspondence we are referring to has to do with SCHEDULES OF REINFORCEMENT. Simply put, a schedule of reinforcement is ANY plan or system for presenting a reinforcer for a given response, according to ANY time interval – such as reinforcing a response every two minutes (creating an INTERVAL schedule) – or ANY position of a response in a series: reinforcing every second response, that is, a “two-fer,” expressed as FR 2:1 (FIXED RATIO of two responses for each reinforcer); every third response, a “three-fer,” or an FR 3:1 ratio; every tenth (FR 10:1), hundredth (FR 100:1), and so on. If you VARY the ratio – sometimes reinforcing the second response, sometimes the fifth, sometimes the third – then you have created a VARIABLE RATIO (VR), the most commonly used schedule in training almost any behavior, and one of the most useful.
The simplest schedule, and one that trainers should all begin with in training ANY response, is a ratio of reinforcing EVERY desired response. This is a ratio of 1 response for 1 reinforcer, 1:1, or CONTINUOUS REINFORCEMENT, abbreviated CRF (to prevent confusion with CR, which is an abbreviation for CONDITIONED RESPONSE, or CONDITIONED REFLEX).
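For the computationally inclined, the bookkeeping behind CRF, FR, and VR can be sketched in a few lines of Python. This is a minimal illustration only – the function and its names are made up for this sketch, not anything from actual training equipment:

```python
import random

def make_schedule(kind, n=1):
    """Return a function to call once per correct response; it
    returns True when that response should be reinforced.
    kind: "CRF" (every response), "FR" (every nth response), or
    "VR" (every nth response on average, varying around n)."""
    count = 0
    # For VR, each requirement is drawn at random; randint(1, 2n-1)
    # averages out to n responses per reinforcer.
    target = random.randint(1, 2 * n - 1) if kind == "VR" else n

    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            if kind == "VR":
                # Draw a fresh requirement for the next reinforcer.
                target = random.randint(1, 2 * n - 1)
            return True
        return False

    return respond
```

On CRF (n = 1) every call pays off; on FR 3 only every third call does; on VR 3 the payoffs come unpredictably but still average one in three.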
Any schedule other than CRF calls for what we call DIFFERENTIAL REINFORCEMENT, reinforcing some responses and not others. DIFFERENTIAL REINFORCEMENT also is used in forming DISCRIMINATIONS, as in scent discriminations, where the trainer reinforces the response to the scented article and not to the others; it is also a part of SHAPING, where a trainer reinforces the responses that meet his or her criteria – that is,
the response is straight enough, fast enough, properly executed in every way – and extinguishes the responses that do not meet the criteria.
Third, there are other schedules, those that specifically involve time.
They are used less frequently, but useful in their place. One is what we
call a FIXED or VARIABLE DURATION schedule, where the trainer asks for a
response to hold or continue for a certain period of time – for example,
asking a dog to hold a point, or a prone position (“stay”) for 30 seconds.
There are also FIXED OR VARIABLE INTERVAL schedules. We will not say much about time schedules. They can be tough to implement. The introduction of
time as a variable can give the animal an opportunity to do things OTHER
than what you want it to do, yet still respond according to
specifications in time. Suppose you have asked the dog, on a FI 5min.
schedule, to jump up to a spot on the wall every 5 minutes. This FI 5min.
schedule means that you reinforce the first correct response, after the
interval is up. Now, just think of all of the mayhem the dog can create in
the five minutes! After the five minutes is up, the dog must still jump up
correctly to get its reinforcement, but it might have made many other
responses, all of which will gain SOME strength from the last reinforcement
for the jump. Well, enough of interval schedules. They have little place
in most training programs.
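For the electronically (or computationally) literate, the FI logic in that jump example boils down to a timer gate: the first correct response AFTER the interval elapses pays off, and everything before it earns nothing. A minimal sketch in Python – our illustration, with a fraction of a second standing in for the five minutes:

```python
import time

class FixedInterval:
    """FI gate: the FIRST correct response after the interval has
    elapsed is reinforced; responses before that earn nothing."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.last_paid = time.monotonic()

    def respond(self):
        """Call once per correct response; True means reinforce."""
        now = time.monotonic()
        if now - self.last_paid >= self.interval_s:
            self.last_paid = now  # reinforce and restart the clock
            return True
        return False
```

An FI 5min. schedule is just `FixedInterval(300)` – and note that nothing in the gate constrains what the animal does while the clock runs.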
The list of schedules can go on and on. For instance, you can also use a
schedule of DIFFERENTIAL REINFORCEMENT OF FAST behavior, abbreviated DRF, where the trainer reinforces only the responses that are rapidly executed.
You would be right here in thinking these last schedules we described are
the same as SHAPING SCHEDULES. Scientists have invested entire careers
playing with schedules, and their effects on learning and behavior.
Let us begin by clearly stating our own philosophy about continuous vs.
ratio schedules: This philosophy can be quite simply put: IF YOU DO NOT
NEED A RATIO, DO NOT USE A RATIO. Or, in other words, stick to continuous reinforcement unless there is a good reason to go to a ratio. We think
most of you will accept that we have been involved in shaping a LOT of
behavior. Much of that behavior, we got, and used, without resorting to
ratios. Many animals we used for a decade or more ON CONTINUOUS
REINFORCEMENT. We benefited from the time not lost establishing a ratio
when it was not necessary. Thus we recommend that you consider giving it a
try.
Well, what are the relative advantages of continuous reinforcement (CRF),
ratio (FR or VR), or interval (FI or VI) schedules? Why and when would we
use them?
What are the advantages of CRF? When and why should we reinforce every
response of a certain type, say, a proper SIT? First of all, the only way
you can be sure that each response will be “proper,” that is, that it will
meet your criteria, is to reinforce EACH AND EVERY RESPONSE that is proper,
correct according to your own criteria. If each correct response is NOT
reinforced, and you start with a ratio, even a “two-fer,” you are apt to
allow less than perfect responses to acquire strength from that final
reinforcer after the second response. Let’s say you decide to try for
two-fers. You tell the dog SIT – the first response is a bit sloppy, the
second one is OK. You click and treat. What have you reinforced? — a
sloppy response, chained to a good response. The sloppy one automatically
acquires some strength from the final reinforcer.
Hence, our rule No. 1 is IF YOU DON’T NEED A RATIO, DON’T USE A RATIO. If you decide you NEED a ratio, then our rule No. 2 is to keep a response on CRF until it is just what you want, on cue, with good fast reaction (low
latency), and you have given MANY (not just five or ten) reinforcers for
the perfect sit – dozens and dozens of times. DO NOT BE IN A HURRY TO GO TO
A RATIO. You should also “proof” the behavior in many different
circumstances, different locations, different audiences, under many
distracting conditions, ALL on CRF. Then you can say your SIT is as good
as you want it to be, the dog knows when to do it and how fast to do it.
The behavior is now strong and reliable.
In what follows about “two-fers” and other ratios, we do NOT wish to appear
to be downgrading the advice of experienced trainers, and we
certainly do not want our comments to be taken personally. There are, as
we will note below, reasons that this practice has come so easily into the
advice and the handbooks for training. We simply wish to give the readers
here the benefit of our own experience, which runs as follows: In most
situations where dogs are being trained as pets, almost never would there
be a strong need for ratios. However, as far as we can tell from advice
given to newbies, ratios have become de rigueur in training. As nearly as
we can tell, the “ritual” of the two-fer is widespread. It has gone
inevitably into the practices and the literature of many good trainers,
because it was believed to be a necessary step for building up any
resistance to extinction and rapid performance. For example, in a recent
Clicker Journal, a very respected trainer recommends getting the behavior,
and then, before moving to new locations and other fluency building
exercises – starting with TWO-FERS! In videotapes, in recent manuals,
almost everywhere, TWO-FERS! The early use of ratios verges on dogma.
There may be an occasional need to give such unquestioned advice to clicker
NEOPHYTES who might be prone not to reinforce behavior enough times to get
it strong enough. However, as experienced trainers, LOOK CAREFULLY AT WHAT
YOU ARE DOING, and weigh the disadvantages of losing precision, and the
loss of time. As always, the choice is yours. Just be sure that you know
you have a choice.
Now, when SHOULD we use a ratio schedule? Remember our RULE NUMBER ONE – RATIOS ONLY WHEN NECESSARY. Once you have decided that you need a ratio, then the answer is:
a) after the behavior is as PERFECT as you want it to be, or as perfect as you are able to get it, within reason;
b) if you want to establish with that behavior a high resistance to extinction – for example, if you expect to be using it in some context where you cannot reinforce;
c) if you want the behavior to occur at a rapid rate (responses per time interval) without reinforcement; or, possibly,
d) if you are working the dog with food reinforcers and you do not want the dog to fill up too rapidly.
There is no question that a variable ratio is the best one to use if you
need, or want, a VERY persistent behavior without reinforcement. Just look
at the number of times a really “hooked” fisherman will cast out a bait
without being reinforced. And, as one of our e-mail correspondents has
noted, “according to Skinner, compulsive gambling occurs partly because
people become hooked by the variable ratio. The very next response may pay
off regardless of how long it has been since the last response paid off, so
the gambler keeps responding.” Quite true. But how many times in your
life with your dog do you run into the conditions of a), b), c) or d)?
Certainly you probably want a reasonable resistance to extinction, and
certainly a reasonable rate of response. And, indeed, “this schedule (a
variable ratio) provides greater incentive to resume responding right after
receiving a reinforcer than does the fixed-ratio schedule.”
Probably one of the best examples of when to use a VSR is the case of Ham,
the chimpanzee astronaut trained by Joe Brady’s group for NASA. Ham was
sent into space in the early 1960s, before the Mercury astronauts. Ham was
taught to make discriminations and complex responses to certain stimuli,
such as flashing lights and special sounds. There was concern that the
food dispensing equipment might not work too well in weightless space. For
that reason, and for other good reasons, it was decided to build up Ham’s
responses such that he could work the entire mission ON EXTINCTION. Ham’s
responses were built up to THOUSANDS of responses per reinforcement. One
time Ham might be reinforced after a hundred responses, the next time it
might be a thousand. Now that, my friends, is a RATIO! If you are
preparing to blast your doggie into space, and you want to make sure that
it keeps on working, VSR is definitely the way to go.
We used a VDS (Variable Duration Schedule) with our automated dancing
chicken unit. When a person dropped a quarter (a nickel in the early
’50’s) in a coin box, a door opened and released the chicken into the
performing area. The chicken walked over to a simulated juke box, pulled a
loop, which started music playing, and the chicken stepped onto a platform.
In the center of the platform was a photocell. When the chicken broke the
light stream hitting the photocell, that started a timing mechanism (we
used a dipper circuit that charged a capacitor, for the electronically
literate). Now, because the chicken was what it was, the chicken had to do
something other than stand still, so most chickens scratched, which looked
like a dance. While it scratched about, it moved into and out of the light
beam in a rather unpredictable fashion. This varied the amount of time
before the equipment said “enough” and fired the electric feeder. In
addition, just in case, we also placed a device in the circuitry (a
variable tap on the capacitor for you electronic types) that more or less
randomly changed the criteria for firing the feeder. So, we had two
methods of determining the VDS, one method depended on the behavior of the
chicken and one was independent of the chicken. The upshot of this system
was a chicken that danced from 8 to 22 seconds. As you can see, when we
say VARIABLE, we mean just that.
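Those two sources of variation can be mimicked in software. The sketch below is our own tick-based caricature of the idea, not a model of the original dipper circuit:

```python
import random

def dancing_chicken_trial(in_beam, rng=random):
    """One simulated trial of the dual-source VDS.  Each element of
    in_beam is one clock tick: True means the chicken is breaking
    the photocell beam on that tick.  'Charge' builds only while the
    beam is broken (the capacitor analogy: behavior-dependent);
    the firing threshold is drawn at random each trial (the variable
    capacitor tap: behavior-independent).  Returns the tick on which
    the feeder fires, or None if it never does."""
    threshold = rng.randint(8, 22)  # roughly the 8-to-22-second range
    charge = 0
    for tick, breaking in enumerate(in_beam, start=1):
        if breaking:
            charge += 1  # charge accumulates only in the beam
        if charge >= threshold:
            return tick  # "enough" - fire the feeder
    return None
```

A chicken that scratches in and out of the beam stretches the trial out; the random threshold varies it further, even for identical dancing.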
Our piano-playing duck (and the variant, the PICKIN PEKIN guitar-playing
duck) was based on VSR. As the duck played the keys up and down, there
were microswitches being triggered by “hot” keys. In the old days we used
stepping switches and later we used solid state decade counters to keep
track of how many keys had been struck. By various means we then more or
less randomly selected the number of keys that had to be struck to fire the
feeder (we usually used a ring counter, or a variant thereof). The duck ended
up striking somewhere between 13 and 25 “hot” keys. What the patron heard
was TWINKLE, TWINKLE, LITTLE STAR, because we also programmed the output
into a recognizable tune. Some people actually thought the duck was
playing a tune. No wonder they can sell so much ocean beach front property
in Arizona.
How important are schedules of reinforcement? Most of the time, especially
in pet training, not terribly. That does not mean that they are
insignificant to animal training. We had a coin operated unit (probably
our most famous) called BIRD BRAIN. BIRD BRAIN played tic tac toe. The
person had the opportunity to test his or her skill against the chicken
(the chicken does get a little help). When we designed the control
circuitry for BIRD BRAIN, we allowed for reinforcement at the end of the
game, meaning that the chicken would usually play three, four or five times
before the feeder fired. We knew that there would ordinarily never be a
chance for the chicken’s first move to be reinforced. We also knew from
experience that a certain percentage of the birds (we guessed about 25
percent or one out of four birds) would have problems starting the game
because THE FIRST PECK, OR MOVE, NEVER PAID OFF. In anticipation of this
problem, we incorporated what we called a FEED FIRST CYCLE switch that
reinforced the birds after the first, or starting, peck. Well, we were
almost right: it was one out of three birds, or 33 percent. Those
afflicted birds would simply pace back and forth in front of the cage,
approach the switch panel and lights, and then back away and pace some
more. A bird might repeat this behavior several times before finally giving
the proper response. By turning on the FEED FIRST CYCLE switch, that delay
behavior (delaying reinforcement, of course) would suddenly disappear after
a few pecks had been reinforced at the beginning of the performance.
Sounds strange, doesn’t it? When Skinner played our little game the first
time (at a scientific conference in the late 70’s) he was intrigued by the
game, and very much impressed that the technology had come so far that we
could PREDICT from the reinforcement schedule how certain birds would
respond. We told him it was because we had to make a buck at it that we
knew it so well. He enjoyed the joke, but he understood that it was only
partially in jest.
I have not discussed any of our free environment stuff – seagulls,
dolphins, dogs, cats, ravens, vultures, etc. Most of that work combines
desensitization (the really tough part) with some rather exotic VDS and VSR
schedules. Some of the seagulls and dolphins were on excursions lasting
hours. That meant that some trips might last for only a half hour and
others might go on for much longer. Some of the dolphin excursions lasted
the entire day, meaning only ONE trial. As shown in PATIENT LIKE THE
CHIPMUNKS, the animals did things once they got to the target area, but,
in my opinion, the getting there was always the hard part. The animals
rather quickly mastered most of the terminal maneuvers, even the tough
ones. By the way, as difficult as the terminal behaviors were, they were
almost always on a CRF schedule – continuous reinforcement – even though
that reinforcement might be many minutes or even hours away. I hope I have
made our position on schedules of reinforcement clear. We use the simplest
schedule that works.
There are those who say a CRF schedule cannot yield ANY strength. Well,
our answer would be that WE found it good enough for some excellent behavior
over the years. Besides, in our contacts with both experienced and
neophyte dog trainers, we found most were in such a hurry that they seldom
used enough reinforcements on a CRF schedule to both sharpen and then
strengthen behavior. Some say they NEVER reinforce the same behavior more
than a few times, and that’s it – a direct quote from a forum post: “I
have never asked for a behavior with no changes 20 times in a row, is
there a point to doing that?” (Others in this journal have talked of
fluency, so we won’t go into that here.)
Then, with a partially trained behavior, they go to a ratio of whatever (and, for the sake of this discussion, it is irrelevant whether it is VR, FR, or ?). There is usually some mention of boredom, or of the dog quitting the behavior, or ? First, in our collective experience (and this is essentially 100 years), neither of us has encountered in our training programs a bored dog, dolphin, gull, raven, elephant, aardvark, pangolin, lion, bear, squid, fish, or ???????? Next, we have had dogs performing the exact same behavior over 800 times in one day, and repeated that for more than a week. We did similar tests with dozens of other kinds of animals. NONE OF THESE ANIMALS WERE ON A RATIO! Well, they were on a continuous ratio, if you want to split hairs. We did NOT find the behavior
in these animals to be frail. The behaviors did not evaporate when the animal was asked to perform several trials with no reinforcement. Were the
behaviors as persistent as they would have been under a VR schedule
training program? No, of course not. But, if the animal would have
performed the behavior very well for 10 times without reinforcement, would
that not be sufficient for most tasks? How often do you need an animal to
perform a behavior 100 or 1,000 times without reinforcement of any kind, food, social contact, or the opportunity to perform another behavior, or whatever?
How did the myth of frail CRF behavior find its way into the fabric of dog training? There are many possibilities. Perhaps in the last 10 to 15 years, prominent teachers of clicker training found that many trainers were working with such weak behavior that it fell apart when there was the least amount of stress, or if trainers failed to maintain some reinforcement. The teachers may have, quite logically, solved that problem by concentrating on STRENGTH OF BEHAVIOR EARLY. They accepted some diminution in the power to shape that is a consequence of training on a ratio. WE ARE NOT QUARRELING WITH SUCH A COMPROMISE. The teachers deserve the credit for introducing the man/woman on the street to the technology. We are just pointing out that many dog trainers are accepting on blind faith that TWO-FERS ARE THE AUTOMATIC AND ONLY WAY TO GO. THIS IS NOT TRUE. We do not want to have this myth woven into the fabric of most dog trainers’ understanding of operant technology.
Look at it this way, perhaps, life is complicated enough without our making it more complicated. CRF is simpler than VSR. CRF works. We like simple.
Bob & Marian Bailey
Hot Springs, Arkansas