Windows EDW (WEDW) is a fundamentally new program which attempts to provide similar functionality to the Unix/DOS version (EDW), but with a very different user interface. WEDW functions in a way that is most similar to EDW's "mark" mode, i.e., a cursor is always present and single-key segment marking and window positioning functions are always active. WEDW replaces EDW's command line interface with menu items for the selection of optional settings, extended mouse functions for segment label manipulation, and dialog boxes where text input is necessary. WEDW retains some of the appearance of EDW in that a waveform display region is always present while spectrogram and pitch marking windows can be toggled on and off as desired. Both EDW and WEDW read and write waveforms in an extended RIFF (Microsoft .WAV) format that includes waveform segment definitions and both are also able to read an older .WAV format that was the original format used by EDW. WEDW is still under development with several planned features presently inactive.
This picture shows the screen appearance of WEDW when a waveform and spectrogram window are open. A waveform display window is always present; a spectrogram window (as shown here) and a pitch marker window (not shown) may be toggled on and off as desired. The overall WEDW window can be resized by clicking and dragging a window corner with the mouse, as with most windows. The horizontal scale of all windows and the vertical scale of the waveform display window are adjusted to the new overall window size. However, the vertical size of the spectrogram window is a constant number of pixels determined by a settable parameter.
In addition to the standard window controls, features to note are:
All these features are described below in more detail.
Any time the mouse is within one of the graphic display windows, the following keys are bound to events (case is ignored):
Note: when either the B or E marker of a new segment is first entered, both markers appear at the same location. WEDW allows one of the marks to be moved to a new location to generate a segment rather than a location marker. Thereafter, if either the begin or end marker is to be moved, WEDW will request confirmation before allowing the change.
Note: V and U function only when the pitch marker to modify has been selected by clicking the left mouse button on the marker.
In addition to the standard use of the mouse with pull-down and pop-up menus, WEDW uses mouse events as follows:
The pitch marker window as illustrated in this picture is a small window above the waveform display window in which markers are placed to indicate pitch and other events associated with the waveform. Markers that are associated with pitch periods in voiced regions of the signal have a small arrowhead at their base and are called voiced markers. Other markers may be placed to divide voiceless regions of the signal into smaller epochs of a size similar to a pitch period. These are termed voiceless markers.
When the pitch display is first enabled, WEDW searches the current working directory for a file that has the same base name as the waveform file being viewed but with the extension PPS. If a PPS file is found, it is read and information from the file is used to build the pitch marker display. However, WEDW is also able to estimate pitch marker information directly from the waveform (a process called pitch tracking) and use its estimates for the pitch marker display. Section 8 below provides more details on using the pitch tracking in WEDW.
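The file lookup described above can be sketched as follows. This is a minimal illustration, not WEDW's own code: the function name `pps_candidate` and its parameters are hypothetical, and only the rule stated in the text (same base name, extension PPS, current working directory) is taken from the manual.

```python
from pathlib import Path

def pps_candidate(waveform_file, working_dir="."):
    """Return the path where a matching PPS file would be expected.

    Hypothetical helper: keeps the base name of the waveform file,
    swaps the extension for PPS, and looks in the working directory.
    """
    return Path(working_dir) / (Path(waveform_file).stem + ".PPS")

def load_or_track(waveform_file, working_dir="."):
    """Report which source of pitch markers would be used."""
    candidate = pps_candidate(waveform_file, working_dir)
    # If no PPS file exists, pitch markers must be estimated by tracking.
    return "read PPS file" if candidate.is_file() else "run pitch tracking"
```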
When the pitch marker window is visible, the following mouse actions are used to edit the information:
Note: At all times, status windows in the toolbar report the time-coordinate of the mouse pointer. When the pointer is in the Pitch or Waveform window, the waveform amplitude is also displayed as an unscaled digital sample value (designated dv units). When the mouse pointer is in the spectrogram window, the status window displays the frequency coordinate corresponding to the vertical pointer location.
Other items may appear in the menus; the following mentions only those which are active.
The spectrogram displays speech information in a time-by-frequency-by-amplitude
format. The time (X) axis is aligned with the waveform window time axis.
The frequency (Y) axis indicates ascending frequency from a low frequency
cutoff value (at the bottom of the display) to a high frequency cutoff
value at the top of the display. Both these cutoff values can be set in
the spectrogram settings dialog box (see below). Finally, sound amplitude
at every time-by-frequency coordinate is displayed in shades of gray with
white being the lowest amplitude and black the highest amplitude. The lowest
and highest amplitude values, and the choice of a linear or logarithmic
amplitude scale, can also be set in the spectrogram settings dialog box. A
logarithmic scale tends to bring out lower amplitude features of the spectrogram
better than a linear scale, but a linear scale sometimes brings out formant
frequency patterns better, especially in recordings that have substantial
background noise.
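The gray-scale mapping described above can be illustrated with a short sketch. It assumes 8-bit gray levels (255 for white, 0 for black) and a decibel-style logarithm; neither detail is stated in this manual, and the function name `gray_level` and its parameters are hypothetical.

```python
import math

def gray_level(amplitude, lowest, highest, log_scale=True):
    """Map an amplitude to a gray level: 255 = white (lowest), 0 = black (highest).

    Assumption: 8-bit gray levels and a 20*log10 scale for the logarithmic case.
    """
    if log_scale:
        floor = 1e-12  # guard against taking the log of zero
        amplitude, lowest, highest = (
            20.0 * math.log10(max(v, floor)) for v in (amplitude, lowest, highest)
        )
    # Normalize to 0..1 and clamp values outside the lowest/highest range.
    frac = (amplitude - lowest) / (highest - lowest)
    frac = min(1.0, max(0.0, frac))
    return round(255 * (1.0 - frac))
```

With `lowest=1` and `highest=1000`, an amplitude of 10 comes out much darker on the logarithmic scale than on the linear one, which is the sense in which a logarithmic scale brings out lower amplitude features.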
Select "settings..." under the Options Spectrogram menu to bring up the dialog box for adjusting spectrogram parameters. Figure 2 shows this dialog box with its default settings as follows:
To perform pitch tracking in WEDW, the pitch window must be toggled on. This enables the Track item in the Pitch submenu of the Options menu. There are two modes of operation for the pitch tracker. For both modes, pitch tracking is only applied to the portion of the waveform that is visible in the waveform window, and pitch marks, if found, will replace all previously defined pitch marks for the visible portion of the waveform. This makes it possible to use different pitch tracking settings for different parts of the waveform if necessary. In the supervised pitch tracking mode, an example pitch period must be located manually by using the mouse to select a region of the waveform corresponding to one pitch period. This example is then used to seed the pitch tracker which will search the portion of the waveform visible in the waveform window in both directions starting at the location of the example pitch period. For the other, unsupervised, mode of pitch tracking, it is not necessary to manually select a seed; WEDW automatically selects a seed period using waveform amplitude and the F0 Mean value given in the Pitch Settings dialog box. Once a seed is selected either manually or automatically, the pitch tracking algorithm (described below) is the same for both pitch tracking modes. A checkbox labeled Require Seed in the Pitch Settings dialog box determines which method WEDW uses.
By default, the Require Seed box is checked, indicating that the region markers delimit a single pitch period for use as the seed. When the Track item is selected in the Pitch menu, WEDW checks that the selected region corresponds to a period associated with an F0 between the Min and Max F0 values given in the Pitch Settings dialog box. If this condition is not met, WEDW displays an error box and you must either adjust the markers by reselecting the region, or adjust the values for F0 Min or Max in the dialog box and then select pitch tracking from the Options menu again. Note that the F0 Min and Max values are used both to screen the seed selection and to report statistics after pitch tracking has completed. Therefore, their values will generally change after each call to the pitch tracker and in some cases, especially after a failure in tracking, they may inherit unrealistic values that would need to be adjusted by hand before a new seed will be accepted.
When the Require Seed box is unchecked, the selected region, F0 Min, and F0 Max are all irrelevant; however, the F0 Mean value is used in estimating the duration of the internally generated seed period. Because of this, it is important to set a realistic value in the F0 Mean field before starting unsupervised pitch tracking. As with F0 Min and Max, the Mean value is updated after each call to the pitch tracker to report the mean value obtained for the voiced pitch periods in the waveform. Thus, this field may also need to be reset to a realistic value if the tracker failed to run correctly.
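The seed check described above amounts to converting the selected region to an F0 value and comparing it with the Min and Max settings. The following sketch assumes the region is given as begin and end sample indices; the function names are hypothetical and the numbers in the usage note are illustrative only.

```python
def seed_f0(begin_sample, end_sample, sample_rate):
    """F0 (Hz) implied by treating the selected region as one pitch period."""
    length = end_sample - begin_sample
    if length <= 0:
        raise ValueError("the selected region must contain at least one sample")
    return sample_rate / length

def seed_is_acceptable(begin_sample, end_sample, sample_rate, f0_min, f0_max):
    """True when the seed's F0 lies between the F0 Min and F0 Max settings."""
    return f0_min <= seed_f0(begin_sample, end_sample, sample_rate) <= f0_max
```

For example, a 100-sample region at a 10000 Hz sampling rate implies an F0 of 100 Hz, which would pass a 60 to 300 Hz range but fail if a previous tracking run had left F0 Min at an unrealistic value such as 150.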
The pitch tracking algorithm in WEDW uses a raised cosine windowed portion of the waveform centered around the onset of the seed period as the initial search template for an adjacent pitch period. The onset is assumed to be a positive-going zero crossing preceding the first (usually strongest) F1 fluctuation in the seed period. The search template is compared to the structure of the waveform within a reasonable range of distances away from the seed period using a correlation statistic. This range is determined by the amount of allowable jitter (period-to-period fluctuations in F0) in successive periods. The location of the subsequent period is taken as the location at which the search template correlates most strongly with the waveform in the region being searched. If this correlation is above a voiced/unvoiced threshold value, it is assumed that another pitch period has been detected and the onset of the new pitch period is windowed and averaged with the previous search template to form a new template that is used in searching for the next pitch period. If the maximum correlation value in the search region is below the voiced/unvoiced threshold, the present location is assumed to be unvoiced and an unvoiced marker is placed at a location corresponding to the average pitch period following the last pitch marker. When an unvoiced region of speech is encountered, the search template is replaced with a windowed inverted sine wave having a period corresponding to the duration of the window. The same algorithm is applied in both directions starting with the seed period to identify all predecessor as well as all successor periods to the seed period.
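A simplified sketch of the forward half of this search is given below. It is not WEDW's code: the jitter and threshold defaults, the use of a normalized correlation, and all function names are assumptions made for illustration. The seed onset must lie at least half a window from the start of the signal.

```python
import math

def raised_cosine(n):
    """Raised cosine window of n points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def windowed(signal, center, win):
    """Window the signal around `center` (caller keeps the span in range)."""
    start = center - len(win) // 2
    return [signal[start + i] * w for i, w in enumerate(win)]

def correlation(a, b):
    """Normalized correlation of two equal-length sequences (0 if either is silent)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def track_forward(signal, seed_onset, seed_period, jitter=0.2, threshold=0.5):
    """Return (onset, voiced) markers found after the seed period.

    jitter and threshold are illustrative values, not WEDW defaults.
    """
    win = raised_cosine(seed_period)
    half = len(win) // 2
    template = windowed(signal, seed_onset, win)
    periods = [seed_period]
    onset, marks = seed_onset, []
    while True:
        # Search range allowed by period-to-period jitter.
        lo = onset + round(periods[-1] * (1 - jitter))
        hi = onset + round(periods[-1] * (1 + jitter))
        if hi - half + len(win) > len(signal):
            break  # not enough signal left to place a full window
        scores = [(correlation(template, windowed(signal, p, win)), p)
                  for p in range(lo, hi + 1)]
        best_r, best_pos = max(scores, key=lambda s: s[0])
        if best_r >= threshold:
            # Voiced: average the new onset into the search template.
            new = windowed(signal, best_pos, win)
            template = [(t + n) / 2 for t, n in zip(template, new)]
            periods.append(best_pos - onset)
            onset = best_pos
            marks.append((onset, True))
        else:
            # Unvoiced: step by the average period, switch to an inverted sine.
            onset += round(sum(periods) / len(periods))
            marks.append((onset, False))
            n = len(win)
            template = [-math.sin(2 * math.pi * i / n) * w for i, w in enumerate(win)]
    return marks
```

Running the same loop with negative steps from the seed would give the predecessor periods. On a pure sine with a 50-sample period, the sketch places a voiced marker every 50 samples.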
The pitch settings dialog box allows the parameters of this tracking algorithm to be adjusted to improve the performance of the algorithm with various talkers. The tracking parameters that can be adjusted are displayed in the pitch settings dialog box:
The Label Settings dialog provides control of the grouping of segment boundaries, the font used to display segment labels, and whether, by default, segment boundaries that share a location move together when a segment is moved by dragging the boundary label with the mouse. Generally, when a segment boundary is moved by dragging its associated label with the mouse, only the selected segment boundary changes. However, as a special case, if more than one segment has a boundary at exactly the same location as the selected boundary, all segment boundaries which share the location are moved together. This is especially useful when, for example, the boundary between adjacent phonemes in a labeled waveform is represented simultaneously by the end marker for the earlier segment and the begin marker for the subsequent segment. Logically, it is the boundary comprising both segment markers that one is probably trying to move.
The grouping feature is provided because it can sometimes be difficult to place two boundary markers at exactly the same sample locations. By selecting the Group item in the Labels menu, segment markers which are quite close but not exactly overlapped can be automatically adjusted to overlap. This in turn ensures that they will normally move together when any boundary is dragged.
WEDW provides a way to display special symbols such as IPA phonetic symbols when a font for the symbols is available. This is done by mapping between standard ASCII letters in segment labels and special symbol codes. The mapping interprets strings of alphabetic characters as tokens which can be replaced by characters or character strings from a specific font. For example, one might map the letter 'x' to the character code for schwa in an IPA font, or the sequence of letters 'ae' to the character code for the joined 'ae' character in an IPA font. The mapping itself is read by WEDW from a user-constructed file which specifies the name of the special symbol font and the mappings from input (ASCII label characters) to output (character codes in the symbol set) using the format:
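One way to picture the grouping operation is as a snapping pass over the boundary positions. The sketch below is an assumption about how such a pass could work: the manual does not state the tolerance used or which location the grouped markers end up at, so the `tolerance` parameter and the choice to snap to the earliest marker in each group are hypothetical.

```python
def group_boundaries(positions, tolerance):
    """Snap boundaries lying within `tolerance` samples of a neighbor to one location.

    Assumption: each group is moved to its earliest member's position.
    """
    order = sorted(range(len(positions)), key=positions.__getitem__)
    clusters = []
    for i in order:
        # Extend the current cluster while neighbors stay within tolerance.
        if clusters and positions[i] - positions[clusters[-1][-1]] <= tolerance:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    result = list(positions)
    for cluster in clusters:
        target = positions[cluster[0]]
        for i in cluster:
            result[i] = target
    return result
```

After such a pass, an end marker at sample 1000 and a begin marker at sample 1003 would share location 1000 and would therefore move together when dragged.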
FontName
<input letter>[<input letter>...] <output code>[,<output code>...]
<input letter>[<input letter>...] <output code>[,<output code>...]
Table I
IPAPhon
p      112
t      116
k      107
dx     228
b      98
d      100
g      103
q      214
m      109
n      110
ng     247
em     109,164
en     110,164
eng    247,164
For example, Table I shows a portion of a symbol mapping table for a font called IPAPhon. In some cases, single characters map to single codes. This is the case for p, t, k among others in Table I. Sometimes, multiple characters map to a single code as for dx and ng in this example. Sometimes, two or more codes are needed to represent a given input sequence as for em, en, and eng in the example.
When symbol mapping is enabled by specifying the name of a mapping file in the Label Settings dialog box, WEDW will break every segment label into one or more string tokens and search the mapping table for a matching token. If a match is found, the token is replaced by the output codes for the token; otherwise, the input token is assumed to correspond exactly to the output sequence. That is, tokens which are not found in the mapping table are displayed (in the symbol font) using the ASCII code of the input character(s). As a result of this strategy, Table I contains a number of single character mappings which are actually unnecessary. In particular, the mappings for p, t, k, b, d, g, m, and n are redundant since the symbol code for these letters is the same as their ASCII code. However, the ASCII letter 'q' maps to the phonetic symbol for a glottal stop and its code does not correspond to the character code for 'q'; consequently, that entry is not redundant.
WEDW tokenizes segment labels using a very simple set of rules. All adjacent alphabetic characters (i.e., the letters a-z and A-Z) are assumed to be part of a single token and all other characters are treated as token delimiters. With one exception, delimiting characters are displayed in the output without mapping. The exceptional case is the '-' character, which may be used to introduce a diacritic. WEDW allows for the possibility that '-' introduces diacritics, but it makes no assumptions about the diacritics themselves, which must be specified (including the '-') in the mapping table if they are to be mapped. For instance, we use '-n' to mean nasalized and therefore would specify a nasalized schwa with the sequence ax-n. Our mapping table contains the lines:
allowing the sequence ax-n to appear as schwa with a tilde above it in the WEDW mapped display.
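The parsing, tokenizing, and lookup rules above can be sketched as follows. This is an illustration under stated assumptions, not WEDW's implementation: the function names are hypothetical, and the treatment of an unmapped '-' sequence (a literal '-' followed by the letters handled as an ordinary token) is a guess at behavior the manual does not spell out. The code used for '-n' in the usage note is a placeholder, not a real IPAPhon code.

```python
import re

def parse_mapping(text):
    """Parse mapping-file text: font name first, then '<letters> <code>[,<code>...]'."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    font = rows[0][0]
    table = {token: [int(c) for c in codes.split(",")] for token, codes in rows[1:]}
    return font, table

# A token is a run of letters, optionally introduced by '-'; anything else
# is a single delimiter character.
TOKEN = re.compile(r"-?[A-Za-z]+|.", re.DOTALL)

def map_label(label, table):
    """Return the list of character codes to display for a segment label."""
    codes = []
    for token in TOKEN.findall(label):
        if token in table:
            codes.extend(table[token])
        elif token.startswith("-") and len(token) > 1:
            # Assumption: an unmapped diacritic falls back to a literal '-'
            # followed by the letters treated as an ordinary token.
            codes.append(ord("-"))
            letters = token[1:]
            codes.extend(table.get(letters, [ord(c) for c in letters]))
        else:
            # Unmapped tokens and delimiters keep their ASCII codes.
            codes.extend(ord(c) for c in token)
    return codes
```

Using a few rows from Table I plus a placeholder row `-n 201` (hypothetical code), `map_label("eng dx", table)` yields the codes for eng, a space, and dx, while a label such as `z` that has no table entry is passed through as its ASCII code.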
As of September 3, 1997, WEDW provides the capability to modify the prosodic structure of speech. This feature is based on and requires pitch marker information and, if the prosodic modifications are to be successful, the pitch marker information must be accurate. When pitch marker information is available, WEDW is able to display either F0 or RMS amplitude data in place of the spectrogram display. All three of these display types (F0, RMS, and Spectrograms) inherit the same screen height parameters, and selecting any one of these will cause the lower display window to appear (if it was not present) or to have its contents replaced by the selected display type. The pitch marker window may be present simultaneously with any of these, but it need not be as long as pitch marker data is available (e.g., in a .PPS file) for the waveform being edited. The following figure shows the F0 contour display enabled for the word "abnormal" produced by a male talker. The displayed contour consists of a series of red and blue line segments. Each line segment is equal in duration to the pitch epoch to which it corresponds. Red segments correspond to voiceless epochs while blue segments correspond to voiced epochs. In the figure, a green X (called a sketch marker) indicates the start of a possible edit of the contour as described later.
All data in the F0 and RMS contour displays are linked to the pitch marker data such that changing the location of a pitch marker will immediately change the value of F0 and potentially the RMS value associated with the pitch epoch. Changing the location of a pitch marker does not change the speech waveform. Editing the F0 or RMS contour using the line drawing or other features described below does not immediately change either the waveform or the pitch marker data; however, once the F0/RMS changes have been applied, the speech waveform will be modified and the resulting modifications will then be reflected in the pitch marker data.
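The link between pitch markers and the displayed contours can be made concrete with a small sketch. It assumes markers are stored as (sample index, voiced flag) pairs and that each epoch runs from one marker to the next; this storage layout and the function name are hypothetical.

```python
import math

def epoch_contours(samples, markers, sample_rate):
    """Compute one F0 and RMS value per pitch epoch.

    markers: sorted list of (sample_index, voiced) pairs (assumed layout).
    Each epoch spans from one marker to the next, so each contour segment
    has the same duration as its epoch.
    """
    contour = []
    for (start, voiced), (end, _) in zip(markers, markers[1:]):
        length = end - start
        rms = math.sqrt(sum(s * s for s in samples[start:end]) / length)
        contour.append({
            "start": start / sample_rate,   # seconds
            "end": end / sample_rate,       # seconds
            "f0": sample_rate / length,     # Hz, from the epoch duration
            "rms": rms,
            "voiced": voiced,
        })
    return contour
```

Because F0 is derived from the spacing of adjacent markers, moving a single marker changes the F0 of the epochs on either side of it without touching the waveform samples.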
Prosodic features of duration, F0, and amplitude can be changed in three distinct ways. Two of these depend on selecting a region of the waveform over which modifications are applied. Either F0 or RMS contours can be smoothed (i.e., low pass filtered) by selecting a region of the displayed contour and then selecting smooth in the edit menu. The second method which applies to a selected region allows additive changes to duration, F0, or amplitude. When a region is selected, the global selection under the edit menu brings up a dialog box which reports the duration, average F0, and average RMS amplitude for the selected region. Within the dialog box, any of these values can be changed and the changes applied to the selected region. Duration is changed by replicating or deleting pitch periods from the selected region to approximately achieve the desired duration (note that the actual duration after applying the changes will be within one pitch period length of the requested duration). F0 is changed by altering the duration of each pitch period. Generally, when F0 is changed it is also necessary to add or delete pitch periods to maintain approximately the specified duration, and as a result, duration will also change slightly when F0 is changed. RMS amplitude is changed by increasing or decreasing the amplitude of each pitch period (taken separately) to achieve the requested amplitude. Because there is some interaction between adjacent pitch periods, the resulting amplitude changes are also likely to be only approximately those requested. Moreover, when RMS and duration or F0 are changed simultaneously, these changes interact. F0 changes are applied first, and then duration/RMS changes are applied in an attempt to minimize the consequences of the interaction. Still, this can result in deviations from the specified RMS value, especially when the number of pitch periods in the selected region has changed.
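The bookkeeping behind duration and F0 changes can be sketched in terms of pitch period lengths alone. This is only an illustration of why the achieved duration lands within about one period of the request: the way WEDW chooses which periods to replicate or delete is not described here, so the even-spacing rule below and both function names are assumptions.

```python
def scale_f0(period_lengths, factor):
    """Raise F0 by `factor` by shortening every pitch period (lengths in samples)."""
    return [round(p / factor) for p in period_lengths]

def retime_periods(period_lengths, target_duration):
    """Choose which periods to keep or repeat to approximate a target duration.

    Returns indices into period_lengths; repeated indices mean replication,
    skipped indices mean deletion. Assumption: selections are spread evenly.
    """
    n_old = len(period_lengths)
    mean_length = sum(period_lengths) / n_old
    n_new = max(1, round(target_duration / mean_length))
    return [j * n_old // n_new for j in range(n_new)]
```

For ten periods of 100 samples each, requesting 1500 samples gives fifteen periods with every other one repeated. Raising F0 by a factor of 1.25 shortens each period to 80 samples, so restoring the original 1000-sample duration requires adding periods, and the result can only approximate the request to within one period.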
The third method for prosodic modification applies only to F0 and RMS contours since it involves changing the shape of the contour by sketching a new contour. For this method, it is first necessary to enable Line Draw mode (under the Edit menu) and thereafter, displayed values of F0 or amplitude are altered using the mouse. Once the desired contour has been drawn, the specified changes can be applied to the speech waveform by selecting the modify option in the Edit menu. In Line Drawing mode, clicking the left mouse button when the mouse pointer is within the lower window will place a sketch mark (a green X) at the pointer location. Moving the pointer to a new location and pressing the left button again will draw a new F0 or RMS contour by linear interpolation between the present pointer location and the sketch mark, and the sketch mark will then be moved to the present pointer location. Using this method, the desired contour is drawn by piece-wise linear approximation.
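The piece-wise linear drawing step can be sketched as a single interpolation between the sketch mark and the new click. The representation of the contour as parallel lists of epoch times and values, and the function name, are assumptions for illustration.

```python
def draw_segment(times, values, sketch_mark, click):
    """Redraw contour values between the sketch mark and a click by linear interpolation.

    times, values: one entry per pitch epoch (assumed representation).
    sketch_mark, click: (time, value) pairs.
    Returns the new values and the new sketch mark, which moves to the click.
    """
    (t0, v0), (t1, v1) = sorted([sketch_mark, click])
    new_values = list(values)
    if t1 > t0:
        for i, t in enumerate(times):
            if t0 <= t <= t1:
                new_values[i] = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return new_values, click
```

Repeating the call with each successive click, always passing the returned sketch mark, builds up the drawn contour one straight segment at a time.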
As of 9/5/97 -