Tao Cheng

Ph.D. expected in Summer 2010
Department of Computer Science
University of Illinois at Urbana-Champaign
201 N. Goodwin Avenue
Urbana, IL 61801, USA
E-mail: tcheng3[at]cs.uiuc.edu
Complete CV: HTML PDF

Hi there. I am a Ph.D. candidate in the Department of Computer Science, University of Illinois at Urbana-Champaign. My advisor is Dr. Kevin Chen-Chuan Chang. Below are a few highlights of my research:

  • We were among the first to explore the idea of entity search. Our work [6] is the top-cited paper on "entity search"*.
  • My internship work at MSR on entity synonym generation is deployed in "Bing". See a live example here.
  • Our work on clustering XML documents [7] is one of the top-cited papers on "clustering XML"*.


Research Interests   

My research interests lie in large-scale data management, touching upon diverse areas: Web search, information retrieval, data mining, databases and natural language processing. I enjoy building novel information systems, as well as identifying and solving real world research problems that emerge in the process.


Recent Experiences  
  • Research Assistant, Database and Information Systems Lab, University of Illinois at Urbana-Champaign, 2004-present
    Dissertation Research: Entity Search in the WISDM project: http://wisdm.cs.uiuc.edu
    We aim to propose and build a novel entity-aware search engine that goes beyond document retrieval by searching over fine-grained data entities inside pages on the Web. Our work is one of the first to explore entity search, which is now an emerging trend in the DB and IR research communities. This novel search paradigm requires tight coupling of information retrieval, information extraction and database techniques. Specifically, we worked on the following key aspects of entity-aware search:
    • search problem formulation [6] in CIDR 2007 and system prototype in SIGMOD 2007
    • formal ranking model (EntityRank) [5] in VLDB 2007
    • indexing design and parallelization [2] in EDBT 2010
    • extensibility and general query language [4] in WSDM 2010
    • structured object search [1] under submission

  • Intern, Search Labs at Microsoft Research, 05/2008-08/2008
    I worked on generating entity synonyms for canonical entity names to support structured Web search. Our technique can significantly increase the coverage of structured Web queries, and therefore improve the user search experience. The work resulted in:
    • design and implementation of an automatic entity synonym generation system that is deployed into "Bing" search
    • a US patent describing the invention
    • a research paper [3] published in ICDE 2010 and a journal paper submitted to TKDE

  • Intern, Cazoodle Inc., 05/2007-12/2007
    Cazoodle Inc. is a Web search startup, founded by my advisor Dr. Kevin Chen-Chuan Chang.
    I worked as the key architect of several search products towards data-aware search, specifically:
    • co-designed and co-implemented a general distributed crawling, extraction, indexing framework
    • implemented a large-scale entity search engine prototype over 150 million pages
    • adapted entity search techniques into GeoEngine for the Army's Geospatial Intelligence Gathering application, resulting in Phase I and II SBIR funding

Selected Publications [Complete List]   
  1. T. Cheng, M. Gupta and K. C.-C. Chang, "Supporting Context-aware Structured Object Search: An Entity Association Based Approach," under submission.

  2. T. Cheng and K. C.-C. Chang, "Beyond Pages: Supporting Efficient, Scalable Entity Search with Dual-Inversion Index," to appear in the Proceedings of the 13th International Conference on Extending Database Technology (EDBT 2010), Lausanne, Switzerland, Mar 2010. [PDF][PPT]

  3. T. Cheng, H. Lauw and S. Paparizos, "Fuzzy Matching of Web Queries to Structured Data," in the Proceedings of the 26th International Conference on Data Engineering (ICDE 2010), Short Paper, Long Beach, USA, Mar 2010. [PDF][PPT]

  4. M. Zhou, T. Cheng and K. C.-C. Chang, "Data-oriented Content Query System: Searching for Data in Text on the Web," in the Proceedings of the Third International Conference on Web Search and Data Mining (WSDM 2010), New York, USA, Feb 2010. [PDF][PPT]

  5. T. Cheng, X. Yan and K. C.-C. Chang, "EntityRank: Searching Entities Directly and Holistically," in the Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB 2007), Vienna, Austria, 2007. [PDF][PPT]

  6. T. Cheng and K. C.-C. Chang, "Entity Search Engine: Towards Large Scale Information Integration on the Web," in the Proceedings of the 3rd Conference on Innovative Data Systems Research (CIDR 2007), Extended Demo Paper, Asilomar, Jan 2007. [PDF][PPT]

  7. T. Dalamagas, T. Cheng, K. J. Winkel and T. Sellis, "A Methodology for Clustering XML Documents by Structure," in Information Systems, vol. 33, no. 3, pages 187-228, 2006. [PDF]

Patents  
  • K. C.-C. Chang, T. Cheng and X. Yan, "SYSTEM FOR ENTITY SEARCH AND A METHOD FOR ENTITY SCORING IN A LINKED DOCUMENT DATABASE," US Patent 20090083262 by University of Illinois at Urbana-Champaign, 2009.

  • S. Paparizos, T. Cheng and H. Lauw, "GENERATING SYNONYMS BASED ON QUERY LOG DATA," US Patent by Microsoft, 2008.

Selected Awards  
  • ICDE PhD Travel Fellowship, 2010.
  • Yahoo Key Technical Challenge Award, 2007 (1 of 12 selected nationwide).
  • Conference on Innovative Data Systems Research (CIDR) Scholarship, 2007.
  • Excellent TA Award, Department of Computer Science, UCSB, 2004.
  • Honor Degree, Mixed Class from the Chu Kochen Honors College, ZJU, 2003.
  • Distinctive Graduate Honor, ZJU, 2003.
  • Outstanding Graduation Thesis, ZJU, 2003.

Hobbies   

In my spare time, I like to travel with friends, play basketball, watch football and take photos.

* According to Google Scholar as of Jan 10, 2010.

Last Modified: Jan 2010