21st International Conference on Data Engineering (ICDE'05)
pp. 57-68
Corpus-Based Schema Matching
Jayant Madhavan, University of Washington
Philip A. Bernstein, Microsoft Research
AnHai Doan, University of Illinois at Urbana-Champaign
Alon Halevy, University of Washington
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDE.2005.39
Abstract
Schema matching is the problem of identifying corresponding
elements in different schemas. Discovering these
correspondences or matches is inherently difficult to automate.
Past solutions have proposed a principled combination
of multiple algorithms. However, these solutions
sometimes perform rather poorly due to the lack of
sufficient evidence in the schemas being matched. In this paper
we show how a corpus of schemas and mappings can
be used to augment the evidence about the schemas being
matched, so they can be matched better. Such a corpus typically
contains multiple schemas that model similar concepts
and hence enables us to learn variations in the elements
and their properties. We exploit such a corpus in two
ways. First, we increase the evidence about each element
being matched by including evidence from similar elements
in the corpus. Second, we learn statistics about elements
and their relationships and use them to infer constraints
that we use to prune candidate mappings. We also describe
how to use known mappings to learn the importance of domain
and generic constraints. We present experimental results
that demonstrate that corpus-based matching outperforms
direct matching (without the benefit of a corpus) in multiple
domains.
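
The first use of the corpus, pooling evidence from similar corpus elements, can be illustrated with a toy sketch. This is not the authors' matcher: the token-overlap (Jaccard) similarity, the `threshold` value, the function names, and the tiny hand-made corpus are all assumptions chosen only to show why extra evidence can help when two element names share nothing directly.

```python
def tokens(name):
    """Split an element name such as 'cust_phone' into lowercase tokens."""
    return set(name.lower().replace("-", "_").split("_"))


def jaccard(a, b):
    """Token-set overlap; a stand-in for a real matcher's similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def augment(name, corpus, threshold=0.3):
    """Add tokens from every corpus group that contains a name similar to `name`.

    `corpus` is a hypothetical list of groups; each group holds element names
    that earlier mappings declared to correspond to one another.
    """
    base = tokens(name)
    evidence = set(base)
    for group in corpus:
        if any(jaccard(base, tokens(other)) >= threshold for other in group):
            for other in group:
                evidence |= tokens(other)
    return evidence


# Hypothetical corpus: two groups of names that model the same concept.
corpus = [
    ["phone", "telephone_number", "contact_phone", "tel"],
    ["price", "cost", "unit_price"],
]

# Direct matching sees no shared tokens between the two names.
direct = jaccard(tokens("cust_phone"), tokens("tel"))

# Corpus-based matching compares the augmented evidence instead.
corpus_based = jaccard(augment("cust_phone", corpus), augment("tel", corpus))

print(f"direct={direct:.2f} corpus_based={corpus_based:.2f}")
```

In this sketch the names `cust_phone` and `tel` have no tokens in common, so the direct score is zero, while both pick up the vocabulary of the same corpus group and score highly once augmented. The second use of the corpus described above, learning statistics to prune candidate mappings, is not covered here.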
Additional Information
Citation:
Jayant Madhavan, Philip A. Bernstein, AnHai Doan, Alon Halevy, "Corpus-Based Schema Matching," 21st International Conference on Data Engineering (ICDE'05), pp. 57-68, 2005.