Published articles on other web sites*

Font Metrics in Silverlight


Charles Petzold

Most graphical programming environments have classes or functions to obtain font metrics. These font metrics provide information about the size of text characters when rendered with a particular font. At the very least, the font metrics information includes the widths of all the individual characters and a height that’s common to all characters. Internally, these widths are probably stored in an array, so the access is very fast. Font metrics are invaluable for laying out text in paragraphs and pages.
Unfortunately, Silverlight is one graphical environment that does not provide font metrics to application program developers. If you wish to obtain the size of text prior to rendering it, you must use TextBlock, which is, of course, the same element you use for rendering text. Internally, TextBlock obviously has access to font metrics; otherwise it would have no idea how large the text is supposed to be.
It’s easy to persuade TextBlock to provide text dimensions without actually rendering the text. Simply instantiate a TextBlock element, initialize the FontFamily, FontSize, FontStyle, FontWeight and Text properties, and then query the ActualWidth and ActualHeight properties. Unlike some Silverlight elements, you don’t need to make this TextBlock a child of a Panel or Border. Nor do you need to call the Measure method on the parent.
To speed up this process, you can use TextBlock to build an array of character widths, and then you can use this array to mimic traditional font metrics. This is what I’ll show you how to do in this article.

Is All This Really Necessary?

Most Silverlight programmers don’t mourn the absence of font metrics because, for many applications, TextBlock makes them unnecessary. TextBlock is very versatile. If you use the Inlines property, a single TextBlock element can render a mix of italic and bold text, and even text with different font families or font sizes. TextBlock can also wrap long text into multiple lines to create paragraphs. It’s this text-wrapping feature that I’ve been using in the past two installments of this column to create simple e-book readers for Windows Phone 7.
In the previous issue, I presented a program called MiddlemarchReader that lets you read George Eliot’s novel “Middlemarch” on your phone. I want you to perform an experiment with that program: Deploy a fresh copy on an actual Windows Phone 7 device. (If necessary, first uninstall any version that might be on the phone already.) Now press the application bar button to get the list of chapters. Choose chapter IV. From the first page of chapter IV, flick your finger on the screen from left to right to go to the last page of the third chapter and start counting seconds: “One Mississippi, two Mississippi …”
If your phone is anything like my phone, you’ll discover that paging back from the beginning of Chapter IV to the end of Chapter III takes about 10 seconds. I think we can all agree that 10 seconds is much too long for a simple page turn!
This wait is characteristic of on-the-fly pagination. Displaying the last page of a chapter requires that all the previous pages in that chapter be paginated first. I’ve mentioned in previous installments of this column that the pagination technique I’ve been using is grossly inefficient, and this particular example proves it.
My slow pagination technique uses the text-wrapping feature of TextBlock to render entire paragraphs, or partial paragraphs if the paragraph straddles multiple pages. If a paragraph is too large to fit on a page, then my code starts lopping off words at the end of the paragraph until it fits. After each word is removed, Silverlight must re-measure the TextBlock, and this requires lots of time.
Certainly I need to revise my pagination logic. A better pagination algorithm breaks each paragraph into individual words, obtains the size of each word, and performs its own word-positioning and line-wrapping logic.
In the previous e-book readers I’ve shown in this column, each paragraph (or partial paragraph) on a page is just one TextBlock, and these TextBlock elements are children of a StackPanel. In the e-book reader I’ll describe in this column, every word on the page is its own TextBlock, and each TextBlock is positioned at a specific location on a Canvas. These multiple TextBlock elements require a little more time for Silverlight to render the page, but the page layout is speeded up enormously. My experiments show that the troublesome 10-second page transition in MiddlemarchReader is reduced to two seconds when each word is measured with a TextBlock element, and to 0.5 seconds when character widths are cached in an array like traditional font metrics.
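The word-by-word approach can be sketched roughly as follows. This is a minimal illustration, not the PageProvider code from the downloadable solution: the PlacedWord type, the Layout method and the measureWidth delegate are hypothetical names, and the sketch wraps a single paragraph without handling page breaks or first-line indents.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type for this sketch; not part of PhineasReader.
public struct PlacedWord
{
    public string Text;
    public double Left;
    public double Top;
}

public static class WordWrapSketch
{
    // measureWidth returns the rendered width of a string at the target
    // font size. How it does so is left to the caller (an assumption).
    public static List<PlacedWord> Layout(string paragraph, double pageWidth,
                                          double lineHeight,
                                          Func<string, double> measureWidth)
    {
        List<PlacedWord> placed = new List<PlacedWord>();
        double spaceWidth = measureWidth(" ");
        double x = 0, y = 0;

        string[] words = paragraph.Split(new[] { ' ' },
                                         StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            double width = measureWidth(word);

            // Wrap if the word overflows the line and the line isn't empty
            if (x > 0 && x + width > pageWidth)
            {
                x = 0;
                y += lineHeight;
            }

            placed.Add(new PlacedWord { Text = word, Left = x, Top = y });
            x += width + spaceWidth;
        }
        return placed;
    }
}
```

The measureWidth delegate is where the two timings quoted above would differ: it could set the Text property of a TextBlock and read ActualWidth, or it could sum cached character widths.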
But it’s time for a new book. The downloadable Visual Studio solution for this article is called PhineasReader and it lets you read the story of one of Anthony Trollope’s most beloved fictional characters, the Irish Member of Parliament, “Phineas Finn” (1869). Once again, I’ve used a plain-text file downloaded from Project Gutenberg (gutenberg.org).

The FontMetrics Class

When a computer font is first designed, the font designer chooses a number that’s called the “em-size.” The term comes from olden days when the capital letter M was a square block of type, and the size of that M determined the heights and relative widths of all the other characters.
Many TrueType fonts are designed with an em-size of 2,048 “design units.” That size is large enough so that the character height is an integer—usually greater than 2,048 to accommodate diacritic marks—and all the widths of all the characters are integers as well.
If you create a TextBlock using any of the fonts supported on Windows Phone 7, and set the FontSize property to 2,048, you’ll discover that ActualWidth returns an integer regardless of which character you assign to the Text property. (ActualHeight is also an integer except for the Segoe WP Bold font and the default Portable User Interface font. These two names refer to the same font, and the height is 2,457.6. I don’t know the reason for this inconsistency.)
Once you obtain the character height and widths based on a FontSize property set to 2,048, you can simply scale that height and the widths for any other font size.
Figure 1 shows the FontMetrics class I created. If you need to deal with multiple fonts, you’d maintain a separate FontMetrics instance for each font family, font style (regular or italic) and font weight (regular or bold). It’s quite likely these FontMetrics instances would be referenced from a dictionary, so I created a Font class that implements the IEquatable interface, hence it’s suitable as a dictionary key. My e-book reader only needs one FontMetrics instance based on the default Windows Phone 7 font.
Figure 1 The FontMetrics Class
  using System.Windows;            // Size
  using System.Windows.Controls;   // TextBlock

  public class FontMetrics
  {
    const int EmSize = 2048;
    TextBlock txtblk;
    double height;

    // Jagged array: 256 blocks of 256 widths, allocated on demand
    double[][] charWidths = new double[256][];

    public FontMetrics(Font font)
    {
      this.Font = font;

      // Create the TextBlock for all measurements
      txtblk = new TextBlock
      {
        FontFamily = this.Font.FontFamily,
        FontStyle = this.Font.FontStyle,
        FontWeight = this.Font.FontWeight,
        FontSize = EmSize
      };

      // Store the character height, scaled to a FontSize of 1
      txtblk.Text = " ";
      height = txtblk.ActualHeight / EmSize;
    }

    public Font Font { protected set; get; }

    // Width of a single character, scaled to a FontSize of 1
    public double this[char ch]
    {
      get
      {
        // Break apart the character code
        int upper = (ushort)ch >> 8;
        int lower = (ushort)ch & 0xFF;

        // If there's no array, create one
        if (charWidths[upper] == null)
        {
          charWidths[upper] = new double[256];

          // -1 marks a width that hasn't been measured yet
          for (int i = 0; i < 256; i++)
            charWidths[upper][i] = -1;
        }

        // If there's no character width, obtain it
        if (charWidths[upper][lower] == -1)
        {
          txtblk.Text = ch.ToString();
          charWidths[upper][lower] = txtblk.ActualWidth / EmSize;
        }
        return charWidths[upper][lower];
      }
    }

    // Size of a whole string, scaled to a FontSize of 1
    public Size MeasureText(string text)
    {
      double accumWidth = 0;

      foreach (char ch in text)
        accumWidth += this[ch];

      return new Size(accumWidth, height);
    }

    // Size of a substring, scaled to a FontSize of 1
    public Size MeasureText(string text, int startIndex, int length)
    {
      double accumWidth = 0;

      for (int index = startIndex; index < startIndex + length; index++)
        accumWidth += this[text[index]];

      return new Size(accumWidth, height);
    }
  }
Originally I thought I would take advantage of my knowledge about the common em-size of 2,048 and store all character widths as integers, perhaps 16-bit integers. However, I decided to play it safe and store them as double-precision floating-point values instead. I then decided that FontMetrics would divide the ActualWidth and ActualHeight values by 2,048, so it really stores values appropriate for a FontSize of 1. This makes it easy for any program using the class to multiply the values by the desired FontSize.
The Project Gutenberg plain-text files only contain characters with Unicode values less than 256. Therefore, the FontMetrics class could store all the character widths it needs in a simple array of 256 values. Because this class might be used for text with character codes greater than 255, I wanted something more flexible than that, but I knew that the last thing I wanted was to allocate an array sufficient to store 65,536 double-precision floating-point values. At 8 bytes per value, that’s 0.5MB of memory just for the font metrics!
Instead, I used a jagged array. The array named charWidths has 256 elements, each of which is an array of 256 double values. A 16-bit character code is divided into two 8-bit indices. The upper byte indexes the charWidths array to obtain an array of 256 double values, and then the lower byte of the character code indexes that array. But these arrays of double values are only created as they’re needed, and individual character widths are obtained only as they’re needed. This logic takes place in the indexer of the FontMetrics class, and both reduces the amount of storage required by the class and cuts down unnecessary processing for characters that are never used.
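To make the storage savings concrete, here is a stand-alone sketch of the same two-level lookup with the TextBlock measurement replaced by a delegate, so it can run outside Silverlight. The SparseWidthTable name, the measure delegate, and the AllocatedBlocks and ApproximateBytes members are illustrative assumptions, not part of the FontMetrics class.

```csharp
using System;

// Illustrative stand-in for the jagged-array cache; names are hypothetical.
public class SparseWidthTable
{
    readonly double[][] blocks = new double[256][];
    readonly Func<char, double> measure;

    public SparseWidthTable(Func<char, double> measure)
    {
        this.measure = measure;
    }

    // Number of 256-entry blocks allocated so far
    public int AllocatedBlocks { get; private set; }

    // Bytes used by the allocated blocks (8 bytes per double),
    // ignoring the outer array and per-array object overhead
    public int ApproximateBytes
    {
        get { return AllocatedBlocks * 256 * sizeof(double); }
    }

    public double this[char ch]
    {
        get
        {
            int upper = ch >> 8;      // high byte selects the block
            int lower = ch & 0xFF;    // low byte selects the entry

            if (blocks[upper] == null)
            {
                double[] block = new double[256];
                for (int i = 0; i < 256; i++)
                    block[i] = -1;    // -1 means "not measured yet"
                blocks[upper] = block;
                AllocatedBlocks++;
            }

            if (blocks[upper][lower] < 0)
                blocks[upper][lower] = measure(ch);

            return blocks[upper][lower];
        }
    }
}
```

For text that uses only codes below 256 plus a few typographic characters in the 0x20xx range, two blocks are allocated, or about 4,096 bytes, compared with 524,288 bytes for a flat array of 65,536 doubles.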
The two MeasureText methods obtain the size of a string, or a substring of a larger string. These two methods return values appropriate for a FontSize of 1, which can then be scaled simply by multiplying by the desired font size.
TextBlock elements are usually aligned on pixel boundaries because the UseLayoutRounding property defined by the UIElement class has a default value of true. For text, pixel alignment helps readability because it avoids inconsistent anti-aliasing. After multiplying the values obtained from MeasureText by the font size, you’ll want to pass those values through the Math.Ceiling method. This will give you values rounded up to the next integral pixel.
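The scaling and rounding step amounts to one multiplication and one call to Math.Ceiling. The following is a small sketch of that arithmetic only; the MetricsScaling class and ToPixels method are hypothetical names that don't appear in the downloadable code.

```csharp
using System;

public static class MetricsScaling
{
    // unitValue is a width or height for a FontSize of 1, such as the
    // values MeasureText returns. The result is rounded up to whole pixels.
    public static int ToPixels(double unitValue, double fontSize)
    {
        return (int)Math.Ceiling(unitValue * fontSize);
    }
}
```

For example, a character that is 1,024 design units wide in a 2,048-unit em has a unit width of 0.5. At a FontSize of 25 that scales to 12.5, which rounds up to 13 pixels.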

Fancier Formatting

As in my previous e-book readers, most of the real grunt work of the program occurs in the PageProvider class. This class has two main jobs: pre-processing the Project Gutenberg file to concatenate individual lines of the file into single-line paragraphs, and pagination.
To test FontMetrics for character codes greater than 255, I decided to perform a little bit more pre-processing than in the past. First, I replaced standard double quotes (ASCII code 0x22) with “fancy quotes” (Unicode 0x201C and 0x201D) by simply alternating the two codes within each paragraph. Also, Victorian authors tend to use a lot of em-dashes—often to delimit phrases like this one—and these turn up in the Project Gutenberg files as pairs of dashes. In most cases, I replaced these pairs of dashes with Unicode 0x2014 surrounded by spaces to facilitate line breaks.
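The two substitutions could look something like this. It is a simplified sketch under stated assumptions rather than the PageProvider pre-processing code: the TypographySketch class and Prettify method are made-up names, every pair of dashes is replaced (the text above says only "in most cases"), and a run of three dashes would leave one dash behind.

```csharp
using System;
using System.Text;

public static class TypographySketch
{
    // Processes one paragraph; the open/close state starts fresh each call.
    public static string Prettify(string paragraph)
    {
        StringBuilder sb = new StringBuilder(paragraph.Length + 16);
        bool open = true;   // the next straight quote opens a quotation

        for (int i = 0; i < paragraph.Length; i++)
        {
            char ch = paragraph[i];

            if (ch == '"')
            {
                // Alternate between U+201C and U+201D
                sb.Append(open ? '\u201C' : '\u201D');
                open = !open;
            }
            else if (ch == '-' && i + 1 < paragraph.Length &&
                     paragraph[i + 1] == '-')
            {
                // U+2014 surrounded by spaces to allow a line break
                sb.Append(" \u2014 ");
                i++;    // skip the second dash
            }
            else
            {
                sb.Append(ch);
            }
        }
        return sb.ToString();
    }
}
```

Because the quote state resets with each call, a quotation that continues into the next paragraph still begins that paragraph with an opening quote.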
My new pre-processing logic also handles consecutive lines with the same indenting. Often these indented lines comprise a letter or other indented material in the text, and I tried to handle those in a more graceful way. While paginating, I began all non-indented paragraphs with a first-line indent, except for the first paragraph of a chapter, which I presume is usually a chapter title.
The overall effect of this indentation logic is illustrated in Figure 2.
Figure 2 A Page from PhineasReader Showing Paragraph Indenting

Pagination and Composition

Because PageProvider has taken over much of the layout previously performed by TextBlock itself, the pagination logic has become a little too lengthy for the pages of this magazine. But it’s fairly straightforward. All the paragraphs that comprise the Project Gutenberg text are stored as a List of ParagraphInfo objects. The formatted book is a BookInfo object that’s mostly a List of ChapterInfo objects. The ChapterInfo object indicates the index of the paragraph that begins the chapter and also maintains a List of PageInfo objects that are created as the book is progressively paginated.
The PageInfo class is shown in Figure 3. It indicates where the page begins with a paragraph index and a character index within that paragraph, and also maintains a List of WordInfo objects. The WordInfo class is shown in Figure 4. Each WordInfo object corresponds to a single word, so this class indicates the word’s coordinate location on the page and the text of the word as a substring of a paragraph.
Figure 3 The PageInfo Class Represents Each Paginated Page
  using System.Collections.Generic;   // List<T>
  using System.Xml.Serialization;     // XmlIgnore

  public class PageInfo
  {
    public PageInfo()
    {
      this.Words = new List<WordInfo>();
    }

    // Where the page begins: a paragraph and a character within it
    public int ParagraphIndex { set; get; }

    public int CharacterIndex { set; get; }

    public bool IsLastPageInChapter { set; get; }

    public bool IsPaginated { set; get; }

    public int AccumulatedCharacterCount { set; get; }

    // Not serialized; rebuilt from the page start when needed
    [XmlIgnore]
    public List<WordInfo> Words { set; get; }
  }
Figure 4 The WordInfo Class Represents a Single Word
  public class WordInfo
  {
    public int LocationLeft { set; get; }

    public int LocationTop { set; get; }

    public int ParagraphIndex { set; get; }

    public int CharacterIndex { set; get; }

    public int CharacterCount { set; get; }
  }
You’ll notice in the PageInfo class that the Words property is flagged with XmlIgnore, meaning that this property won’t be serialized with the rest of the class, and hence isn’t saved in isolated storage along with the rest of the pagination information. A few little calculations will convince you of the wisdom of this decision: “Phineas Finn” is more than 200,000 words in length, and WordInfo contains 20 bytes of data, so, in memory, all the WordInfo objects will occupy more than 4MB. That’s not too bad, but consider these 200,000 WordInfo objects converted to XML for serialization! Besides, if the beginning of a page is known, calculating the locations of the words on that page using the FontMetrics class is very fast, so these WordInfo objects can be recreated without performance problems.
Figure 5 shows the BuildPageElement method in PageProvider that basically converts a PageInfo object into a Canvas containing a bunch of TextBlock elements. It’s this Canvas that’s actually rendered on the screen.
Figure 5 The BuildPageElement Method in PageProvider
  FrameworkElement BuildPageElement(ChapterInfo chapter, PageInfo pageInfo)
  {
    // Word locations aren't saved, so recreate them if necessary
    if (pageInfo.Words.Count == 0)
    {
      Paginate(chapter, pageInfo);
    }

    Canvas canvas = new Canvas();

    // One TextBlock per word, positioned on the Canvas
    foreach (WordInfo word in pageInfo.Words)
    {
      TextBlock txtblk = new TextBlock
      {
        FontFamily = fontMetrics.Font.FontFamily,
        FontSize = this.fontSize,
        Text = paragraphs[word.ParagraphIndex].Text
                 .Substring(word.CharacterIndex, word.CharacterCount),
        Tag = word
      };

      Canvas.SetLeft(txtblk, word.LocationLeft);
      Canvas.SetTop(txtblk, word.LocationTop);
      canvas.Children.Add(txtblk);
    }
    return canvas;
  }
The actual pagination and layout code doesn’t touch the UI. Only the BuildPageElement method that composes the page creates UI objects. The separation of pagination from page composition is new in this version of the e-book reader, and it means that the pagination and layout could occur in a background thread. I’m not doing that in this program, but it’s something to keep in mind.

Not Just for Performance

I originally decided to abandon TextBlock for layout for performance reasons. But there are at least two more compelling reasons for using separate TextBlock elements for each word.
First, if you ever wanted to justify your paragraphs, this is an essential first step. Silverlight for Windows Phone 7 doesn’t support the TextAlignment.Justify enumeration member. But if every word is a separate TextBlock, justification is simply a matter of distributing extra space between the individual words.
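Distributing the extra space is simple arithmetic once the word widths are known. The following sketch shows one way it might be done for a single line; the JustifySketch class and JustifyLine method are hypothetical, and PhineasReader itself doesn't justify text.

```csharp
using System;

public static class JustifySketch
{
    // Returns the left coordinate of each word so that the first word
    // starts at 0 and the last word ends at lineWidth.
    public static double[] JustifyLine(double[] wordWidths, double lineWidth)
    {
        double[] lefts = new double[wordWidths.Length];

        double totalWordWidth = 0;
        foreach (double width in wordWidths)
            totalWordWidth += width;

        // All leftover space is shared equally among the gaps between
        // words; a single word has no gaps and stays at the left edge
        int gaps = wordWidths.Length - 1;
        double gap = gaps > 0 ? (lineWidth - totalWordWidth) / gaps : 0;

        double x = 0;
        for (int i = 0; i < wordWidths.Length; i++)
        {
            lefts[i] = x;
            x += wordWidths[i] + gap;
        }
        return lefts;
    }
}
```

A real implementation would normally leave the last line of a paragraph unjustified and would round the resulting coordinates to whole pixels.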
The second reason involves the problem of selecting text. You might want to allow the user to select text for different purposes: perhaps to add notes or annotations to a document, or to look up words or phrases in a dictionary or Bing or Wikipedia, or to simply copy text to the clipboard. You’ll need to provide the user with some way to select the text and to display this selected text in a different color.
Can a single TextBlock display different pieces of text in different colors? Yes, that’s possible with the Inlines property and a separate Run object for the selected text. It’s messy, but it’s possible.
The more difficult problem is letting the user select the text to begin with. The user should be able to click or touch a particular word and then drag to select multiple words. But if an entire paragraph is displayed by a single TextBlock element, how do you know what word that is? You can perform hit-testing on the TextBlock itself, but not on the individual Run objects.
When each word is its own TextBlock, the hit-testing job becomes much easier. Of course, other challenges arise on the phone. Chunky fingers must select tiny text, which means it’s probably necessary for the user to enlarge the text before beginning selection.
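With one element per word, Silverlight's own hit-testing can report which TextBlock was touched. The underlying geometry can also be sketched on its own, as below. The WordBox type, the FindWord method and the slop parameter are assumptions made for illustration; slop simply enlarges each word's target area to make small text easier to hit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bounding box for one word on the page.
public struct WordBox
{
    public double Left;
    public double Top;
    public double Width;
    public double Height;
}

public static class HitTestSketch
{
    // Returns the index of the first box containing (x, y) after each
    // box is expanded by slop on every side, or -1 if no box matches.
    public static int FindWord(IList<WordBox> boxes, double x, double y,
                               double slop)
    {
        for (int i = 0; i < boxes.Count; i++)
        {
            WordBox box = boxes[i];

            if (x >= box.Left - slop && x < box.Left + box.Width + slop &&
                y >= box.Top - slop && y < box.Top + box.Height + slop)
            {
                return i;
            }
        }
        return -1;
    }
}
```

Because the boxes are tested in order, a touch that falls within the slop of two neighboring words resolves to the earlier one. A more careful version might pick the word whose center is nearest.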
As usual, as each new feature in a program is introduced, it suggests even more features.

Charles Petzold is a longtime contributing editor to MSDN Magazine. His recent book, “Programming Windows Phone 7” (Microsoft Press, 2010), is available as a free download at bit.ly/cpebookpdf
