Abstract
The thousands of specialized structured file formats in use today present a substantial barrier to the free exchange of information between application programs. We consider the problem of deducing basic features of a given file format, such as its whitespace characters, bracketing delimiter symbols, and self-delimiter characters, from one or more example files. We demonstrate that, given sufficiently large example files, we can typically identify these features.
Additional Information
Citation:
Levon Lloyd, Steven Skiena, "Parsing Without a Grammar: Making Sense of Unknown File Formats," Third IEEE International Conference on Data Mining (ICDM'03), p. 195, 2003.