Google uses schema matching to find the most relevant results for us. For example, until recently, when Google scraped a job-listing page, it was able to identify fields such as “Job title”, “Job location”, “Job description”, and “Salary”, and present them to its users. Although Google is a tech giant, a lot of matching still happens manually. Manual matching can be time-consuming, biased, inconsistent, or even inaccurate.
Reducing the need for manual matching of such useful bits of information is discussed in the research paper by Roee Shraga and Avigdor Gal titled “PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching – Technical Report”, which forms the basis of the following text.
Importance of this research
The researchers have demonstrated that human matching can be biased and inaccurate. Studies show that humans may match two elements even when their confidence is low, which can lead to poor performance. At a broader level, improved matching algorithms could help us find faster and more accurate matches between jobs and job-seekers, colleges and students, people, and so on.
Proposed Solution
The researchers offer a novel angle on the behavior of humans as matchers, analyzing matching as a process. Based on their analysis of human matching behavior, they propose PoWareMatch, which uses a deep learning mechanism to calibrate and filter human matching decisions.
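To make the idea of filtering human matching decisions concrete, here is a minimal sketch in Python. It is not the researchers' method: PoWareMatch learns its calibration with deep learning, whereas this sketch simply drops decisions whose self-reported confidence falls below a fixed threshold. The `Decision` class, the `filter_decisions` function, the attribute names, and the threshold value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One human matching decision (hypothetical structure for illustration)."""
    source: str        # attribute name in the first schema
    target: str        # attribute name in the second schema
    confidence: float  # matcher's self-reported confidence in [0, 1]


def filter_decisions(decisions, threshold=0.5):
    """Keep only the decisions whose confidence reaches the threshold.

    Assumption: a fixed threshold stands in for the learned calibration
    described in the paper.
    """
    return [d for d in decisions if d.confidence >= threshold]


# A made-up decision history, in the order the matcher produced it.
history = [
    Decision("job_title", "position", 0.9),
    Decision("salary", "location", 0.2),      # low-confidence decision
    Decision("job_location", "city", 0.7),
]

kept = filter_decisions(history, threshold=0.5)
print([(d.source, d.target) for d in kept])
```

In this toy history the low-confidence pair is discarded and the other two decisions are kept. A fixed cut-off ignores the order and timing of decisions, which is the kind of information the researchers argue should be taken into account.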
Experiment Setting
The research team ran an experiment involving more than 200 human matchers, PoWareMatch, and other common benchmarks, and compared the results.
Results
The researchers empirically establish that PoWareMatch generates high-quality matches, even outperforming state-of-the-art matching algorithms.
Conclusion
In the words of the researchers,
This work offers a novel approach to address matching, analyzing it as a process and improving its quality using machine learning techniques. We recognize that human matching is basically a sequential process and define a matching sequential process using matching history and monotonic evaluation of the matching process. We show conditions under which precision, recall and f-measure are monotonic. Then, aiming to improve on the matching quality, we tie the monotonicity of these measures to the ability of a correspondence to improve on a match evaluation and characterize such correspondences in probabilistic terms. Realizing that human matching is biased we offer PoWareMatch to calibrate human matching decisions and compensate for correspondences that were left out by human matchers using algorithmic matching. Our empirical evaluation shows a clear benefit in treating matching as a process, confirming that PoWareMatch improves on both human and algorithmic matching. We also provide a proof-of-concept, showing that PoWareMatch generalizes well to the closely domain of ontology alignment. An important insight of this work relates to the way training data should be obtained in future matching research. The observations of this paper can serve as a guideline for collecting (query user confidence, timing the decisions, etc.), managing (using a decision history instead of similarity matrix), and using (calibrating decisions using PoWareMatch or a derivative) data from human matchers. In future work, we aim to extend PoWareMatch to additional platforms, e.g., crowdsourcing, where several additional aspects, such as crowd workers heterogeneity, should be considered. Interesting research directions involve experimenting with additional matching tools and analyzing the merits of LSTM in terms of overfitting and sufficient training data.
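The evaluation measures named in the conclusion (precision, recall and f-measure) can be illustrated on a matching history, that is, a sequence of correspondences added one at a time. The following Python sketch uses the standard definitions of these measures; the function names `evaluate` and `evaluate_history`, the toy reference match, and the toy history are assumptions made for illustration and are not taken from the paper.

```python
def evaluate(match, reference):
    """Return (precision, recall, f1) of a match against a reference match."""
    match, reference = set(match), set(reference)
    correct = len(match & reference)
    precision = correct / len(match) if match else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def evaluate_history(history, reference):
    """Evaluate the match after each decision in a sequential history."""
    current, results = [], []
    for correspondence in history:
        current.append(correspondence)
        results.append(evaluate(current, reference))
    return results


# Toy data: pairs of (source attribute, target attribute).
reference = {("a", "x"), ("b", "y"), ("c", "z")}
history = [("a", "x"), ("b", "z"), ("c", "z")]  # the second decision is wrong

results = evaluate_history(history, reference)
for step, (p, r, f) in enumerate(results, start=1):
    print(f"step {step}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy run, recall never decreases as correspondences are added, because the reference match is fixed and the number of correct correspondences can only grow. Precision and f-measure drop after the wrong second decision and recover after the third, which illustrates why these two measures are monotonic only under certain conditions.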
Source: Roee Shraga and Avigdor Gal, “PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching – Technical Report”