November 2003 (Vol. 36, No. 11)
pp. 22-29
Data Mining for Very Busy People
Tim Menzies, West Virginia University
Ying Hu, University of British Columbia
DOI: http://doi.ieeecomputersociety.org/10.1109/MC.2003.1244531
Abstract
Most modern businesses can access mountains of data electronically; the trick is using that data effectively. In practice, this means summarizing large data sets to find the data that really matters. Most data miners are zealous hunters who seek detailed summaries and generate lengthy descriptions. The authors take a different approach and assume that busy people don't need (or can't use) complex models. Rather, they want only the data they need to achieve the most benefit. Instead of finding extensive descriptions of things, their data mining tool hunts for a minimal difference set between things, because they believe a list of essential differences is easier to read and understand than detailed descriptions.
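To make the idea of a "minimal difference set" concrete, here is a minimal Python sketch of a toy treatment learner. It is not the authors' tool. It assumes that each outcome class has a numeric desirability score (the hypothetical `CLASS_SCORE` table) and ranks small sets of attribute=value constraints by "lift": the mean class score of the rows matching the constraints divided by the mean class score of all rows. The function name `best_treatments`, its `max_size` and `min_support` parameters, the lift formula, and the toy project data are all illustrative assumptions.

```python
from itertools import combinations

# Hypothetical class scores (assumption): higher means a more desirable outcome.
CLASS_SCORE = {"low": 1, "medium": 2, "high": 4}

# Hypothetical toy data: (attributes, outcome class) per project.
TOY_ROWS = [
    ({"team": "expert", "tools": "advanced"}, "high"),
    ({"team": "expert", "tools": "basic"}, "high"),
    ({"team": "novice", "tools": "advanced"}, "medium"),
    ({"team": "novice", "tools": "basic"}, "low"),
    ({"team": "expert", "tools": "basic"}, "medium"),
    ({"team": "novice", "tools": "basic"}, "low"),
]


def mean_score(rows):
    """Average class score of a list of (attributes, cls) rows."""
    return sum(CLASS_SCORE[cls] for _, cls in rows) / len(rows)


def best_treatments(rows, max_size=2, min_support=2):
    """Rank small sets of attribute=value constraints ("treatments") by lift.

    Lift is the mean class score of the rows that satisfy every constraint
    divided by the mean class score of all rows. Treatments matched by fewer
    than min_support rows are ignored. Among equal lifts, smaller treatments
    rank first, so the shortest difference set wins ties.
    """
    baseline = mean_score(rows)
    items = sorted({(a, v) for attrs, _ in rows for a, v in attrs.items()})
    results = []
    for size in range(1, max_size + 1):
        for treatment in combinations(items, size):
            # Skip contradictory treatments such as {tools=basic, tools=advanced}.
            if len({a for a, _ in treatment}) < size:
                continue
            selected = [r for r in rows
                        if all(r[0].get(a) == v for a, v in treatment)]
            if len(selected) < min_support:
                continue
            results.append((mean_score(selected) / baseline, size, treatment))
    # Highest lift first; for equal lift, fewer constraints first.
    results.sort(key=lambda t: (-t[0], t[1]))
    return results


if __name__ == "__main__":
    for lift, size, treatment in best_treatments(TOY_ROWS)[:3]:
        print(f"lift={lift:.2f} size={size} treatment={treatment}")
```

On this toy data the top-ranked treatment is the single constraint `team=expert`, and the one-constraint `tools=advanced` outranks the two-constraint `team=expert, tools=basic` even though both have the same lift. Preferring the fewest constraints is one simple way to keep the reported difference set short; a real tool would also need to discretize numeric attributes and search a much larger space.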
Citation:
Tim Menzies and Ying Hu, "Data Mining for Very Busy People," Computer, vol. 36, no. 11, Nov. 2003, pp. 22-29.