[1] SHALLCROSS S. 2 Reasons why your data is lying to you[R/OL]. (2016-09-15)[2018-06-25]. http://hollistibbetts.sys-con.com/node/1975126.
[2] BUCKLEY J. Causes of dirty data and how to combat them[R/OL]. (2018-07-13)[2018-06-25]. https://www.qubole.com/blog/causes-of-dirty-data-and-how-to-combat-them.
[3] GALHARDAS H, FLORESCU D, SHASHA D, et al. An extensible framework for data cleaning[C]//IEEE 16th International Conference on Data Engineering. San Diego, USA, 2000.
[4] REKATSINAS T, CHU X, ILYAS I F, et al. HoloClean:Holistic data repairs with probabilistic inference[J]. Proceedings of the VLDB Endowment, 2017, 10(11):1190-1201.
[5] MAYFIELD C, NEVILLE J, PRABHAKAR S. ERACER:A database approach for statistical inference and data cleaning[C]//2010 ACM SIGMOD International Conference on Management of Data. Indianapolis, USA, 2010:75-86.
[6] YAKOUT M, BERTI-ÉQUILLE L, ELMAGARMID A K. Don't be SCAREd:Use scalable automatic repairing with maximal likelihood and bounded changes[C]//2013 ACM SIGMOD International Conference on Management of Data. New York, USA, 2013:553-564.
[7] WANG J N, KRASKA T, FRANKLIN M J, et al. CrowdER:Crowdsourcing entity resolution[J]. Proceedings of the VLDB Endowment, 2012, 5(11):1483-1494.
[8] CHAI C, LI G L, LI J, et al. Cost-effective crowdsourced entity resolution:A partial-order approach[C]//2016 International Conference on Management of Data. San Francisco, USA, 2016:969-984.
[9] 郭志懋, 周傲英. 数据质量和数据清洗研究综述[J]. 软件学报, 2002, 13(11):2076-2082. GUO Z M, ZHOU A Y. Research on data quality and data cleaning:A survey[J]. Journal of Software, 2002, 13(11):2076-2082. (in Chinese)
[10] 杨辅祥, 刘云超, 段智华. 数据清理综述[J]. 计算机应用研究, 2002, 19(3):3-5. YANG F X, LIU Y C, DUAN Z H. An overview of data cleaning[J]. Application Research of Computers, 2002, 19(3):3-5. (in Chinese)
[11] 王曰芬, 章成志, 张蓓蓓, 等. 数据清洗研究综述[J]. 现代图书情报技术, 2007, 2(12):50-56. WANG Y F, ZHANG C Z, ZHANG B B, et al. A survey of data cleaning[J]. New Technology of Library and Information Service, 2007, 2(12):50-56. (in Chinese)
[12] 赵一凡, 卞良, 丛昕. 数据清洗方法研究综述[J]. 软件导刊, 2017(12):222-224. ZHAO Y F, BIAN L, CONG X. Survey of data cleaning method in data preprocessing[J]. Software Guide, 2017(12):222-224. (in Chinese)
[13] ELMAGARMID A K, IPEIROTIS P G, VERYKIOS V S. Duplicate record detection:A survey[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(1):1-16.
[14] CHU X, ILYAS I F, KRISHNAN S, et al. Data cleaning:Overview and emerging challenges[C]//2016 International Conference on Management of Data. San Francisco, USA, 2016:2201-2206.
[15] ILYAS I F, CHU X. Trends in cleaning relational data:Consistency and deduplication[J]. Foundations and Trends in Databases, 2015, 5(4):281-393.
[16] FAN W F, GEERTS F, JIA X B, et al. Conditional functional dependencies for capturing data inconsistencies[J]. ACM Transactions on Database Systems (TODS), 2008, 33(2):6.
[17] BOHANNON P, FAN W F, GEERTS F, et al. Conditional functional dependencies for data cleaning[C]//IEEE 23rd International Conference on Data Engineering. Istanbul, Turkey, 2007:746-755.
[18] ABITEBOUL S, HULL R, VIANU V. Foundations of databases:The logical level[M]. Boston, USA:Addison-Wesley Longman Publishing, 1995.
[19] KOUDAS N, SAHA A, SRIVASTAVA D, et al. Metric functional dependencies[C]//IEEE 25th International Conference on Data Engineering. Shanghai, China, 2009:1275-1278.
[20] SONG S X, CHEN L. Differential dependencies:Reasoning and discovery[J]. ACM Transactions on Database Systems (TODS), 2011, 36(3):16.
[21] WANG J N, TANG N. Towards dependable data repairing with fixing rules[C]//2014 ACM SIGMOD International Conference on Management of Data. Snowbird, USA, 2014:457-468.
[22] WANG J N, TANG N. Dependable data repairing with fixing rules[J]. Journal of Data and Information Quality (JDIQ), 2017, 8(3-4):16.
[23] FAN W F, LI J Z, MA S, et al. Towards certain fixes with editing rules and master data[J]. Proceedings of the VLDB Endowment, 2010, 3(1-2):173-184.
[24] FAN W F, LI J Z, MA S, et al. Towards certain fixes with editing rules and master data[J]. The VLDB Journal, 2012, 21(2):213-238.
[25] INTERLANDI M, TANG N. Proof positive and negative in data cleaning[C]//IEEE 31st International Conference on Data Engineering. Seoul, Republic of Korea, 2015:18-29.
[26] HAO S, TANG N, LI G L, et al. Cleaning relations using knowledge bases[C]//IEEE 33rd International Conference on Data Engineering. San Diego, USA, 2017:933-944.
[27] HAO S, TANG N, LI G L, et al. Distilling relations using knowledge bases[J]. The VLDB Journal, 2018, 27(4):497-519.
[28] HUA M, PEI J. DiMaC:A disguised missing data cleaning tool[C]//14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Las Vegas, USA, 2008:1077-1080.
[29] QAHTAN A A, ELMAGARMID A, CASTRO FERNANDEZ R, et al. FAHES:A robust disguised missing values detector[C]//24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London, UK, 2018:2100-2109.
[30] KÖPCKE H, RAHM E. Frameworks for entity matching:A comparison[J]. Data & Knowledge Engineering, 2010, 69(2):197-210.
[31] HE Q L, LI Z H, ZHANG X. Data deduplication techniques[C]//2010 IEEE International Conference on Future Information Technology and Management Engineering (FITME). Changzhou, China, 2010:430-433.
[32] BAXTER R, CHRISTEN P, CHURCHES T. A comparison of fast blocking methods for record linkage[C]//2003 KDD Workshop on Data Cleaning, Record Linkage, and Object Consolidation. Washington, DC, USA, 2003, 3:25-27.
[33] TEJADA S, KNOBLOCK C A, MINTON S. Learning object identification rules for information integration[J]. Information Systems, 2001, 26(8):607-633.
[34] ELFEKY M G, VERYKIOS V S, ELMAGARMID A K. TAILOR:A record linkage toolbox[C]//18th International Conference on Data Engineering. San Jose, USA, 2002.
[35] TEJADA S, KNOBLOCK C A, MINTON S. Learning domain-independent string transformation weights for high accuracy object identification[C]//International Conference on Knowledge Discovery and Data Mining. Edmonton, Canada, 2002:350-359.
[36] CHRISTEN P. Febrl:A freely available record linkage system with a graphical user interface[C]//2nd Australasian Workshop on Health Data and Knowledge Management. Wollongong, Australia, 2008:17-25.
[37] MCCALLUM A, NIGAM K, UNGAR L H. Efficient clustering of high-dimensional data sets with application to reference matching[C]//6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Boston, USA, 2000:169-178.
[38] BAYARDO R J, MA Y M, SRIKANT R. Scaling up all pairs similarity search[C]//16th International Conference on World Wide Web. Banff, Canada, 2007:131-140.
[39] CHAUDHURI S, GANTI V, KAUSHIK R. A primitive operator for similarity joins in data cleaning[C]//22nd International Conference on Data Engineering. Atlanta, USA, 2006.
[40] XIAO C, WANG W, LIN X M, et al. Efficient similarity joins for near-duplicate detection[J]. ACM Transactions on Database Systems (TODS), 2011, 36(3):15.
[41] ARASU A, RÉ C, SUCIU D. Large-scale deduplication with constraints using Dedupalog[C]//2009 IEEE 25th International Conference on Data Engineering. Shanghai, China, 2009:952-963.
[42] HERNÁNDEZ M A, STOLFO S J. The merge/purge problem for large databases[C]//1995 ACM SIGMOD International Conference on Management of Data. San Jose, USA, 1995:127-138.
[43] LEE M L, LING T W, LOW W L. IntelliClean:A knowledge-based intelligent data cleaner[C]//Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Boston, USA, 2000:290-294.
[44] WANG J N, LI G L, YU J X, et al. Entity matching:How similar is similar[J]. Proceedings of the VLDB Endowment, 2011, 4(10):622-633.
[45] BILENKO M, MOONEY R J. Adaptive duplicate detection using learnable string similarity measures[C]//Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA, 2003:39-48.
[46] PAPENBROCK T, EHRLICH J, MARTEN J, et al. Functional dependency discovery:An experimental evaluation of seven algorithms[J]. Proceedings of the VLDB Endowment, 2015, 8(10):1082-1093.
[47] LIU J X, LI J Y, LIU C F, et al. Discover dependencies from data:A review[J]. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(2):251-264.
[48] HUHTALA Y, KÄRKKÄINEN J, PORKKA P, et al. TANE:An efficient algorithm for discovering functional and approximate dependencies[J]. The Computer Journal, 1999, 42(2):100-111.
[49] YAO H, HAMILTON H J. Mining functional dependencies from data[J]. Data Mining and Knowledge Discovery, 2008, 16(2):197-219.
[50] NOVELLI N, CICCHETTI R. FUN:An efficient algorithm for mining functional and embedded dependencies[C]//International Conference on Database Theory. London, UK, 2001:189-203.
[51] WYSS C, GIANNELLA C, ROBERTSON E. FastFDs:A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances extended abstract[C]//International Conference on Data Warehousing and Knowledge Discovery. Munich, Germany, 2001:101-110.
[52] LOPES S, PETIT J M, LAKHAL L. Efficient discovery of functional dependencies and Armstrong relations[C]//International Conference on Extending Database Technology. Konstanz, Germany, 2000:350-364.
[53] FAN W F, GEERTS F, LI J Z, et al. Discovering conditional functional dependencies[J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(5):683-698.
[54] DE MARCHI F, LOPES S, PETIT J M. Efficient algorithms for mining inclusion dependencies[C]//8th International Conference on Extending Database Technology. Prague, Czech Republic, 2002:464-476.
[55] BOHANNON P, FAN W F, FLASTER M, et al. A cost-based model and effective heuristic for repairing constraints by value modification[C]//2005 ACM SIGMOD International Conference on Management of Data. Baltimore, USA, 2005:143-154.
[56] CHIANG F, MILLER R J. A unified model for data and constraint repair[C]//2011 IEEE 27th International Conference on Data Engineering. Hannover, Germany, 2011:446-457.
[57] CHU X, ILYAS I F, PAPOTTI P. Holistic data cleaning:Putting violations into context[C]//2013 IEEE 29th International Conference on Data Engineering (ICDE). Brisbane, Australia, 2013:458-469.
[58] KOLAHI S, LAKSHMANAN L V S. On approximating optimum repairs for functional dependency violations[C]//12th International Conference on Database Theory. St. Petersburg, Russia, 2009:53-62.
[59] HAO S, TANG N, LI G L, et al. A novel cost-based model for data repairing[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(4):727-742.
[60] ARENAS M, BERTOSSI L, CHOMICKI J. Consistent query answers in inconsistent databases[C]//18th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. Philadelphia, USA, 1999:68-79.
[61] BRAVO L, BERTOSSI L. Logic programs for consistently querying data integration systems[C]//18th International Joint Conference on Artificial Intelligence. Acapulco, Mexico, 2003:10-15.
[62] CALÍ A, LEMBO D, ROSATI R. On the decidability and complexity of query answering over inconsistent and incomplete databases[C]//22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. San Diego, USA, 2003:260-271.
[63] CALI A, LEMBO D, ROSATI R. Query rewriting and answering under constraints in data integration systems[C]//18th International Joint Conference on Artificial Intelligence. Acapulco, Mexico, 2003:16-21.
[64] CHOMICKI J, MARCINKOWSKI J. Minimal-change integrity maintenance using tuple deletions[J]. Information and Computation, 2005, 197(1-2):90-121.
[65] GERTZ M, LIPECK U W. A diagnostic approach for repairing constraint violations in databases[M]//JAJODIA S, LIST W, MCGREGOR G, et al, eds. Integrity and internal control in information systems. ⅡCIS 1997. Boston, USA:Springer, 1997, 1:89-111
[66] GRECO G, GRECO S, ZUMPANO E. A logical framework for querying and repairing inconsistent databases[J]. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(6):1389-1408.
[67] YAKOUT M, ELMAGARMID A K, NEVILLE J, et al. Guided data repair[J]. Proceedings of the VLDB Endowment, 2011, 4(5):279-289.
[68] CHU X, MORCOS J, ILYAS I F, et al. KATARA:A data cleaning system powered by knowledge bases and crowdsourcing[C]//2015 ACM SIGMOD International Conference on Management of Data. Melbourne, Australia, 2015:1247-1261.
[69] COHEN W, RAVIKUMAR P, FIENBERG S. A comparison of string metrics for matching names and records[C]//KDD Workshop on Data Cleaning and Object Consolidation. Washington, DC, USA, 2003:73-78.
[70] JIANG Y, LI G L, FENG J H, et al. String similarity joins:An experimental evaluation[J]. Proceedings of the VLDB Endowment, 2014, 7(8):625-636.
[71] EBAID A, ELMAGARMID A, ILYAS I F, et al. NADEEF:A generalized data cleaning system[J]. Proceedings of the VLDB Endowment, 2013, 6(12):1218-1221.
[72] CONG G, FAN W F, GEERTS F, et al. Improving data quality:Consistency and accuracy[C]//33rd International Conference on Very Large Data Bases. Vienna, Austria, 2007:315-326.
[73] GEERTS F, MECCA G, PAPOTTI P, et al. The LLUNATIC data-cleaning framework[J]. Proceedings of the VLDB Endowment, 2013, 6(9):625-636.
[74] BESKALES G, ILYAS I F, GOLAB L. Sampling the repairs of functional dependency violations under hard constraints[J]. Proceedings of the VLDB Endowment, 2010, 3(1-2):197-207.
[75] KHAYYAT Z, ILYAS I F, JINDAL A, et al. BIGDANSING:A system for big data cleansing[C]//2015 ACM SIGMOD International Conference on Management of Data. Melbourne, Australia, 2015:1215-1230.
[76] BESKALES G, ILYAS I F, GOLAB L, et al. On the relative trust between inconsistent data and inaccurate constraints[C]//2013 IEEE 29th International Conference on Data Engineering. Brisbane, Australia, 2013:541-552.
[77] VOLKOVS M, CHIANG F, SZLICHTA J, et al. Continuous data cleaning[C]//2014 IEEE 30th International Conference on Data Engineering. Chicago, USA, 2014:244-255.
[78] RISSANEN J. Modeling by shortest data description[J]. Automatica, 1978, 14(5):465-471.
[79] COVER T M, THOMAS J A. Elements of information theory[M]. New York, USA:John Wiley & Sons, 1991.
[80] HE J, VELTRI E, SANTORO D, et al. Interactive and deterministic data cleaning[C]//2016 International Conference on Management of Data. San Francisco, USA, 2016:893-907.
[81] AROCENA P C, GLAVIC B, MECCA G, et al. Messing up with BART:Error generation for evaluating data-cleaning algorithms[J]. Proceedings of the VLDB Endowment, 2015, 9(2):36-47.
[82] ANANTHAKRISHNA R, CHAUDHURI S, GANTI V. Eliminating fuzzy duplicates in data warehouses[C]//28th International Conference on Very Large Databases. Hong Kong, China, 2002:586-597.
[83] WANG J N, KRISHNAN S, FRANKLIN M J, et al. A sample-and-clean framework for fast and accurate query processing on dirty data[C]//2014 ACM SIGMOD International Conference on Management of Data. Snowbird, USA, 2014:469-480.
[84] KRISHNAN S, WANG J N, FRANKLIN M J, et al. SampleClean:Fast and reliable analytics on dirty data[C/OL]//Bulletin of the IEEE Computer Society Technical Committee on Data Engineering. 2015:59-75[2018-06-19]. https://www.ocf.berkeley.edu/~sanjayk/old/pubs/sc.pdf.
[85] LI G L, WANG J N, ZHENG Y D, et al. Crowdsourced data management:A survey[C]//2017 IEEE 33rd International Conference on Data Engineering. San Diego, USA, 2017:39-40.
[86] PARK H, WIDOM J. CrowdFill:Collecting structured data from the crowd[C]//2014 ACM SIGMOD International Conference on Management of Data. Snowbird, USA, 2014:577-588.
[87] WHANG S E, LOFGREN P, GARCIA-MOLINA H. Question selection for crowd entity resolution[J]. Proceedings of the VLDB Endowment, 2013, 6(6):349-360.
[88] TONG Y X, CAO C C, ZHANG C J, et al. CrowdCleaner:Data cleaning for multi-version data on the web via crowdsourcing[C]//2014 IEEE 30th International Conference on Data Engineering. Chicago, USA, 2014:1182-1185.
[89] MICHALSKI R S. A theory and methodology of inductive learning[M]//MICHALSKI R S, CARBONELL J G, MITCHELL T M. Machine learning. Berlin, Germany:Springer, 1983:83-134.
Cited
Shared Discussed相关知识
桥梁健康监测数据分析研究综述.pdf
医疗健康大数据研究综述.pdf
健康数据分析技术的发展及应用
FESE综述
可穿戴设备与健康医疗数据采集技术
中国健康医疗大数据研究综述——基于期刊论文的分析
医疗健康数据分析
如何通过算法和数据库技术实现健康生活的数据分析和预测
健康数据分析与预测
构筑健康医疗大数据共享平台
网址: 结构化数据清洗技术综述 https://m.trfsz.com/newsview340805.html