Technical Notes |
Version | 1 |
Author | Doug Ewell |
Date | 2004-01-30 |
This Version | http://www.unicode.org/notes/tn14/tn14-1.html |
Previous Version | none |
Latest Version | http://www.unicode.org/notes/tn14/ |
This document is a discussion of the various ways in which Unicode text can be compressed for storage and interchange. Several different approaches are examined and evaluated, including the Unicode "compression formats," SCSU and BOCU-1; general-purpose compression algorithms such as RLE, Huffman, and LZW; the use of multiple techniques to improve compression; and the effects of normalization on compression. A detailed description of a professional-grade SCSU encoding algorithm is included.
This document is a Unicode Technical Note. It is supplied purely for informational purposes and publication does not imply any endorsement by the Unicode Consortium. For general information on Unicode Technical Notes, see http://www.unicode.org/notes/.
The body of this note is contained in the file "UnicodeCompression.pdf" (415,177 bytes).
© 2004 Doug Ewell. This publication is protected by copyright, and permission must be obtained from the author and Unicode, Inc. prior to any reproduction, modification, or other use not permitted by the Terms of Use.
Use of this publication is governed by the Unicode Terms of Use. The authors, contributors, and publishers have taken care in the preparation of this publication, but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.
Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.