GREEDY MACRO TEXT COMPRESSION

dc.contributor.authorWitten, Ian H.
dc.contributor.authorBell, Timothy
dc.date.accessioned2008-02-27T22:27:12Z
dc.date.available2008-02-27T22:27:12Z
dc.date.computerscience1999-05-27
dc.date.issued1987-12-01
dc.description.abstractText compression schemes can be divided into two classes: statistical, where symbols are assigned codes based on their probabilities, and macro, where groups of consecutive symbols (phrases) are replaced by indexes into some dictionary. A subset of macro coding called Greedy Macro (GM) accounts for the vast majority of macro schemes in the literature, including all variations of the popular Ziv-Lempel method. At each coding step, GM schemes encode as many symbols as possible with a single index to the dictionary. Although this parsing strategy is not optimal, no optimal macro scheme can be implemented with a bounded coding delay. This paper defines GM coding and establishes an algorithm which takes any such scheme and constructs a statistical coding method that achieves exactly the same compression rate. Thus GM schemes can never achieve better compression than statistical ones. The conclusion is that research aimed at increasing compression should concentrate on statistical methods, leaving macro schemes for applications in which compression efficiency can be sacrificed for speed and modest memory requirements.eng
dc.description.notesWe are currently acquiring citations for the work deposited into this collection. We recognize the distribution rights of this item may have been assigned to another entity, other than the author(s) of the work.If you can provide the citation for this work or you think you own the distribution rights to this work please contact the Institutional Repository Administrator at digitize@ucalgary.ca
dc.identifier.department1987-285-33
dc.identifier.doihttp://dx.doi.org/10.11575/PRISM/31160
dc.identifier.urihttp://hdl.handle.net/1880/46158
dc.language.isoEng
dc.publisher.corporateUniversity of Calgary
dc.publisher.facultyScience
dc.subjectComputer Scienceeng
dc.titleGREEDY MACRO TEXT COMPRESSIONeng
dc.typeunknown
thesis.degree.disciplineComputer Scienceeng

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
1987-285-33.pdf
Size:
2.5 MB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
1.86 KB
Format:
Plain Text
Description: