The subject of timecodes and their use in video has occasioned much confusion. This document is a step toward developing at least a common vocabulary for discussing them.
In an ideal world, all video would play back at a uniform round-number frame rate, for example 30 frames per second. Associated with each frame of video would be a unique frame count, starting with zero. Each frame count could be translated into a unique timecode format of hours, minutes, seconds, and frames. Each frame count could likewise be translated into a (possibly fractional) time in seconds. All of these descriptions of a frame's temporal location would be mutually interconvertible and their meanings unambiguous and intuitively simple.
For example, at 30 frames per second:
108,000 frames = 1 hour + 0 minutes + 0 seconds + 0 frames = 3600 seconds.
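The ideal-world conversions above can be sketched in a few lines of Python. This is only an illustration under the round-number 30 frames per second assumption; the function names are made up for this example.

```python
FPS = 30  # the ideal round-number frame rate assumed in this example

def frames_to_timecode(frame_count, fps=FPS):
    """Split a frame count into (hours, minutes, seconds, frames)."""
    total_seconds, frames = divmod(frame_count, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return hours, minutes, seconds, frames

def frames_to_seconds(frame_count, fps=FPS):
    """Convert a frame count into a (possibly fractional) time in seconds."""
    return frame_count / fps

print(frames_to_timecode(108000))  # (1, 0, 0, 0)
print(frames_to_seconds(108000))   # 3600.0
```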
When black & white TV was first introduced in the US, it played at 30 frames per second. The AC power available at every wall outlet alternates at a rate of 60 cycles per second, providing an easily available sync signal. In Europe, the AC power oscillates at 50 cycles per second, hence their adoption of 25 frame per second video rates.
The conversion between times based on 25 or 30 frames per second is relatively simple. If that were the only timecode problem facing video producers, this document would be unnecessary. With the development of color TV, however, the situation became more complicated. The added color component to the broadcast TV signal could sometimes interfere with the preexisting audio. Changing the audio's format would have made all existing black & white TVs incompatible. The solution was to nudge the picture rate down from 30 to 29.97 frames per second. Today, all TVs in the US, Canada, Mexico, and Japan play at this rate. This color TV format is named after the committee that defined it: the National Television System Committee, or NTSC.
As if to add to the confusion, many people refer to 29.97 frame per second video as being "30" frames per second. Many professionals do this as a verbal shorthand; many non-professionals do not understand the difference. It is my impression that a non-zero number of professionals may fall into this latter category as well. It is important to make clear: all NTSC video is intended for playback at 29.97 frames per second. All NTSC televisions are designed to play at 29.97 frames per second. If they are receiving broadcast video, they will synchronize to the broadcast signal, which will be 29.97 exactly. In playing consumer videotapes they will use an internal clock which should be set to 29.97. If they are cheap or out of adjustment this may vary, but not because of any deliberate intent to play at any rate other than 29.97 frames per second. There are no 30 frame per second TVs.
A frame-counting scheme called drop-frame has been developed that will allow users to ignore the distinction between 29.97 and 30 when interpreting timecodes. The details will be discussed below, but the central idea is important:
Drop-frame timecodes are defined so as to look like 30 frame per second times, and to reflect accurately the actual time elapsed. In particular, time durations found by subtracting drop-frame timecodes will be accurate to within one or two frames over arbitrary length intervals.
We will often write frame rates as frames/s, the slash indicating the connection between the word "per" and the operation of division. Similarly we talked above about AC power varying at 60 cycles/s. The generic unit for "things [of any sort] per second" is Hertz, abbreviated Hz, and named after the German physicist Heinrich Hertz. Hertz was the first to generate and detect radio waves, and demonstrate that they were the same sort of thing as light, but oscillating at lower frequencies. Many technical discussions will use Hz to describe frame rates.
First, let us make clear how we are to interpret the idea of video being played at a fractional frame rate. We do not mean that frames are somehow sliced up like so many onions, with an extra 0.03 of a frame being discarded every second. Rather, it means that the time between frames is 1/29.97 seconds, or about 0.0333667 seconds, which is a bit longer than 1/30 = 0.03333... seconds. (Notice this reversal of relationship: larger frame rates mean smaller time intervals. In a conversation where one person is talking about frequency and the other about time, it is easy to get out of sync.) Another way to look at it without having to look at reciprocals is to say that 29.97 frames/s means that 2,997 full frames are presented in 100 seconds. Both these viewpoints are very useful.
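Both viewpoints can be checked with exact arithmetic. The sketch below uses Python's fractions module and treats the rate as exactly 29.97 frames/s, as this document does; the variable names are chosen only for this example.

```python
from fractions import Fraction

interval_30 = Fraction(1, 30)             # seconds between frames at 30 frames/s
interval_2997 = 1 / Fraction(2997, 100)   # equals 100/2997 seconds

print(round(float(interval_30), 7))    # 0.0333333
print(round(float(interval_2997), 7))  # 0.0333667

# Second viewpoint: how many frame intervals fit into 100 seconds?
print(100 / interval_2997)             # 2997
```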
Let us imagine that we have a program of video in which every frame is just the image of a number, which is the frame number. So the first frame is an image of the number zero, the next is of the number one, etc. This program is played back at 30 frames/s and at 29.97 frames/s (see Fig. 1).
A given frame at 29.97 comes later than the same one at 30. Note that the two playbacks immediately get out of step. Recall our example above of the conversion of the duration of one hour into other formats, and compare it with the results shown in Fig. 2.
After one hour, the 29.97 frame/s video is 108 frames behind the 30 frame/s video.
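The 108-frame figure follows directly from the two rates. A minimal check, again assuming the rate is exactly 29.97 frames/s (the helper name frames_shown is hypothetical):

```python
from fractions import Fraction

RATE_IDEAL = 30
RATE_NTSC = Fraction(2997, 100)  # 29.97 frames/s, held exactly

def frames_shown(rate, seconds):
    """Number of whole frames presented after `seconds` of playback."""
    return int(rate * seconds)

lag = frames_shown(RATE_IDEAL, 3600) - frames_shown(RATE_NTSC, 3600)
print(frames_shown(RATE_NTSC, 3600))  # 107892
print(lag)                            # 108
```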
The discrepancy between the 29.97 and 30 frame/s video is not surprising or mysterious. But problems arise when we try to calculate timecodes from frame counts with 29.97 video. Timecodes tell you the number of hours, minutes, seconds and frames that have elapsed since the start of the video playback. These numbers are separated by colons (':'), except for seconds and frames, which are separated by one of four characters; we will use a period ('.') for now.
The most obvious way to count timecodes is to increment the frame count as each frame of video goes by. When the frame count gets to 29, at the next frame we set it to zero again and add one to the second count, like so:
00:00:00.00 00:00:00.01 00:00:00.02 ... 00:00:00.28 00:00:00.29 00:00:01.00 00:00:01.01 (etc.)
This approach works just fine for 30 frames/s material (of which there is none in the video world!) but causes problems for 29.97 frames/s video, as illustrated in Fig. 3.
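To see the size of the problem, here is a small sketch of the naive counting scheme applied to the frame that is on screen exactly one hour into 29.97 frames/s playback. The function name is made up for this example, and the rate is assumed to be exactly 29.97.

```python
def non_drop_timecode(frame_number):
    """Label a frame by naive counting: 30 frames per second of timecode."""
    total_seconds, frames = divmod(frame_number, 30)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{frames:02d}"

# Frame reached after exactly 3600 seconds at 29.97 frames/s.
frame_at_one_hour = 3600 * 2997 // 100       # 107892
print(non_drop_timecode(frame_at_one_hour))  # 00:59:56.12
```

The label falls 3 seconds and 18 frames short of one hour, which is the same 108 frames found earlier.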
The problem, of course, is that adding one second to the timecode every 30 frames is not correct for 29.97 frame/s playback. But what can we do? We can't add one second every 29.97 frames, or 1/29.97 second every frame. The solution is essentially to let the timecode drift out until it is incorrect by a full frame. At this instant the timecode will be too small. At the next frame we add two frames to the timecode instead of one, so that the timecode correctly describes the actual elapsed time. This is like the addition of February 29 every four years, to keep the calendar from slipping with respect to the seasons.
As you may know, leap year does not always come every fourth year. In years that are divisible by 100 but not by 400, leap day is omitted. Similarly, the actual solution for 29.97 timecode, called drop frame counting, is a bit more complicated than the approach described in the previous paragraph, but it achieves the same result: drop frame timecodes accurately describe the true time elapsed. (Fig. 4.)
Keep in mind that frames are dropped from the counting, not from the video itself. Drop frame counting is purely a "bookkeeping" measure that keeps the timecodes closely in sync with real time elapsed. To make clear that drop frame counting is being used, a comma (',') or semicolon (';') is used to separate seconds and frames. A period ('.') or colon (':') is used for non-drop counting. The distinction between the two options for either counting method parallels the distinction between fields and frames. Perhaps another section of this document will someday discuss that difference.
The actual way that drop frame counting is implemented is illustrated by the following sequence of timecodes:
00:00:00,00 00:00:00,01 ... 00:00:59,28 00:00:59,29 00:01:00,02 00:01:00,03 (etc.)
Every minute, two frames are dropped from the count. At this rate, every hour we would be dropping 120 frames. But as we saw above, the discrepancy after an hour is actually 108 frames. So, when the minute count is divisible by 10 (0, 10, 20, ... , 50) the two frame counts are not dropped. This scheme keeps the timecode from drifting away from the true time.
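The counting rule can be written as a function that advances a timecode by one frame. The sketch below is one possible way to express the rule, not a reference implementation; next_timecode is a hypothetical name. It counts how many frames pass before the timecode reads one hour.

```python
def next_timecode(h, m, s, f):
    """Advance a drop-frame timecode (h, m, s, f) by one video frame."""
    f += 1
    if f == 30:
        f = 0
        s += 1
    if s == 60:
        s = 0
        m += 1
    if m == 60:
        m = 0
        h += 1
    # At the start of each minute not divisible by 10, skip counts 00 and 01.
    if s == 0 and f == 0 and m % 10 != 0:
        f = 2
    return h, m, s, f

print(next_timecode(0, 0, 59, 29))  # (0, 1, 0, 2)
print(next_timecode(0, 9, 59, 29))  # (0, 10, 0, 0)

# Count the frames that elapse before the timecode reads 01:00:00,00.
tc, elapsed = (0, 0, 0, 0), 0
while tc != (1, 0, 0, 0):
    tc = next_timecode(*tc)
    elapsed += 1
print(elapsed)  # 107892
```

The count of 107,892 frames is 108 fewer than 108,000, matching the discrepancy after one hour described above.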
What does this mean for the person trying to do math with timecodes in drop frame format? For most calculations, you can treat the timecodes as if they were exactly accurate. You can add and subtract them in the ordinary way, using 30 frames per second, 60 seconds per minute and 60 minutes per hour. The answers you get will be accurate to within a frame or two over any length time interval. For very precise calculations it is best to convert the timecode into a correct frame count, and then convert the frame count into seconds. For this purpose the following equations give the exact values.
To convert a drop-frame timecode into a frame number:
totalMinutes = 60 * hours + minutes
frameNumber = 108000 * hours + 1800 * minutes + 30 * seconds + frames - 2 * (totalMinutes - totalMinutes div 10)
where div means integer division with no remainder.
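The timecode-to-frame-number equation translates directly into Python, where // plays the role of div. The function name is an assumption for this example, and the sketch does not check that the input is a timecode that actually occurs (for instance, 00:01:00,00 is never displayed).

```python
def dropframe_to_frame_number(hours, minutes, seconds, frames):
    """Convert a drop-frame timecode into a zero-based frame number."""
    total_minutes = 60 * hours + minutes
    # Two counts are skipped every minute, except minutes divisible by 10.
    dropped = 2 * (total_minutes - total_minutes // 10)
    return 108000 * hours + 1800 * minutes + 30 * seconds + frames - dropped

print(dropframe_to_frame_number(0, 1, 0, 2))   # 1800
print(dropframe_to_frame_number(0, 10, 0, 0))  # 17982
print(dropframe_to_frame_number(1, 0, 0, 0))   # 107892
```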
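To go the other way, from a frame number to a drop-frame timecode, first add the dropped counts back in. There are 17982 frames in each ten-minute block, and 1798 in each minute that drops two counts:
D = frameNumber div 17982
M = frameNumber mod 17982
frameNumber += 18*D + 2*((M - 2) div 1798)
(This assumes that integer division truncates toward zero. If -2 div 1798 doesn't return 0, you'll have to special-case M = 0 or 1.)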
The timecode fields are then read off from the resulting frameNumber:
frames = frameNumber mod 30
seconds = (frameNumber div 30) mod 60
minutes = ((frameNumber div 30) div 60) mod 60
hours = (((frameNumber div 30) div 60) div 60) mod 24
where mod means the remainder after integer division.
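The frame-number-to-timecode steps can be sketched in Python as follows. Python's // rounds toward negative infinity, so (M - 2) // 1798 would give -1 for M = 0 or 1; clamping with max() is one way to handle that special case and is a choice made for this example. The function name is also hypothetical.

```python
def frame_number_to_dropframe(frame_number):
    """Convert a zero-based frame number into a drop-frame timecode string."""
    d, m = divmod(frame_number, 17982)   # ten-minute blocks and remainder
    # Clamp so that m = 0 or 1 contributes no dropped counts.
    nominal = frame_number + 18 * d + 2 * (max(m - 2, 0) // 1798)
    frames = nominal % 30
    seconds = (nominal // 30) % 60
    minutes = (nominal // 1800) % 60
    hours = (nominal // 108000) % 24
    # A comma before the frame field marks drop-frame counting.
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{frames:02d}"

print(frame_number_to_dropframe(1799))    # 00:00:59,29
print(frame_number_to_dropframe(1800))    # 00:01:00,02
print(frame_number_to_dropframe(107892))  # 01:00:00,00
```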
To convert between frame numbers and time in seconds:
timeInSec = frameNumber / 29.97
frameNumber = seconds * 29.97
Note that the first frame is number zero in this scheme. Similarly, the time of frame zero is zero. If you want to know how long an interval is, you have to add 1 to the frame number (to get the frame count) before dividing by 29.97. Don't let off-by-one errors drive you crazy!
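A short sketch of the off-by-one point, with function names invented for this example. It writes the 29.97 rate as the integer ratio 2997/100 so that the one-hour case comes out exactly in floating point.

```python
def frame_time_seconds(frame_number):
    """Time at which a frame starts, with frame 0 at time 0."""
    return frame_number * 100 / 2997

def interval_seconds(first_frame, last_frame):
    """Duration covering first_frame through last_frame, inclusive."""
    frame_count = last_frame - first_frame + 1   # the "add 1" step
    return frame_count * 100 / 2997

print(frame_time_seconds(0))        # 0.0
print(interval_seconds(0, 107891))  # 3600.0
```

A program whose last frame is number 107891 contains 107,892 frames and therefore lasts exactly one hour.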