Variety Generation involves selecting sets of character strings, or symbols, that are intended to occur with equal probability in bodies of text, or in sets of text units, from a particular source. It is important that the sample used to generate a symbol set be representative of the data with which the set will be used. This paper assesses the amount of variation in symbol sets generated from files of titles and author names taken from BNB MARC data over a five-year period, and makes a comparison with LC MARC. Some of the BNB symbol sets are compared directly, and equifrequency statistics are obtained for the assignment of each symbol set to each file. The differences between these equifrequency statistics are examined using an analysis of variance technique.
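The abstract does not say how symbols are assigned to text or which equifrequency statistic is used, so the following is only a minimal sketch under stated assumptions. It assumes a greedy longest-match assignment (with single characters as a fallback so every position is covered) and uses relative entropy, H / log2(n), as a stand-in equifrequency measure, where 1.0 means every observed symbol occurs equally often. The names `assign_symbols` and `relative_entropy` are hypothetical.

```python
import math
from collections import Counter


def assign_symbols(text, symbol_set):
    """Segment text into symbols by greedy longest match.

    Hypothetical assignment rule: at each position take the longest
    string in symbol_set that matches; a single character is always
    accepted as a fallback, so the whole text is covered.
    """
    max_len = max([1] + [len(s) for s in symbol_set])
    symbols = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, down to length 1.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in symbol_set:
                symbols.append(candidate)
                i += length
                break
    return symbols


def relative_entropy(symbols):
    """Assumed equifrequency measure: H / log2(n) over the n distinct symbols.

    Returns 1.0 when all observed symbols occur equally often and smaller
    values as the distribution becomes more skewed.
    """
    counts = Counter(symbols)
    n = len(counts)
    if n < 2:
        # A single distinct symbol is trivially equifrequent; log2(1) == 0.
        return 1.0
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(n)


if __name__ == "__main__":
    segs = assign_symbols("THE THEORY", {"THE", "TH", " "})
    print(segs)
    print(relative_entropy(segs))
```

Under these assumptions, one such statistic could be computed for each pairing of a symbol set with a file, and the resulting values compared, for example with a one-way analysis of variance; the paper's actual experimental design is not given in the abstract.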