Learning to Rank Relevant Files for Bug Reports using Domain Knowledge
We developed a ranking model that recommends the source code files most likely to need fixing for a given bug report.
The ranking model is a weighted combination of multiple features, each of which captures a distinct type of information measuring the relevance between a source file and the bug report.
The weight parameters of our ranking model are trained automatically using a learning-to-rank technique.
Through evaluations on six large-scale open source Java projects, our model achieved significantly better results than the state-of-the-art approaches.
Languages used in this project: Java, Python, SQL
Tools/Libraries used in this project: Git, Bugzilla, MySQL, JDT ASTParser, Apache Tika, Beautiful Soup, NLTK, JUNG, SVM^rank
Algorithms used in this project: VSM, SVM^rank, PageRank, HITS
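The weighted combination behind the bug-report ranking model can be illustrated with a minimal Python sketch. It is not the project's code: the file names, feature values, and weights below are hypothetical, and in the actual project the weights are learned by SVM^rank rather than chosen by hand.

```python
from typing import Dict, List, Sequence, Tuple


def relevance_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Weighted sum of feature values for one (bug report, source file) pair."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def rank_files(
    weights: Sequence[float],
    file_features: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (file, score) pairs sorted from most to least relevant."""
    scored = [
        (path, relevance_score(weights, feats))
        for path, feats in file_features.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights and per-file feature vectors for a single bug report.
example_weights = [0.5, 0.3, 0.2]
example_features = {
    "Parser.java": [0.8, 0.1, 0.5],
    "Lexer.java": [0.2, 0.9, 0.0],
    "Util.java": [0.1, 0.1, 0.1],
}
example_ranking = rank_files(example_weights, example_features)
```

The top entries of `example_ranking` would be the files recommended to the developer first.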
I developed a search engine for searching websites within the ohio.edu domain and other websites related to Ohio University (OU).
I used Apache Nutch to crawl 117,620 web pages, using 10 OU websites as seeds.
I used Apache Solr to index these 117,620 pages.
I built a search engine as a servlet running under Apache Tomcat. The servlet uses Apache Lucene libraries to search web pages within the index created by Solr.
This search engine supports VSM, BM25, and the Smoothed Unigram Language Model. It also uses anchor text for searching.
Languages used in this project: Java, HTML
Tools/Libraries used in this project: Apache Nutch, Solr, Lucene, Tomcat, Servlet
Algorithms used in this project: VSM, BM25, Smoothed Unigram Language Model
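Of the retrieval models listed for the search engine, BM25 is the easiest to show in a few lines. The sketch below is a plain Python illustration, not the Lucene implementation the servlet relies on. The parameter defaults `k1=1.2` and `b=0.75`, the smoothed IDF form, and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(
    query: Sequence[str],
    docs: Sequence[Sequence[str]],
    k1: float = 1.2,
    b: float = 0.75,
) -> List[float]:
    """Score every tokenized document against a tokenized query with BM25."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy corpus; the tokens are made up for illustration.
toy_docs = [
    ["ohio", "university", "admissions"],
    ["football", "schedule"],
    ["ohio", "ohio", "campus", "map"],
]
toy_scores = bm25_scores(["ohio", "admissions"], toy_docs)
```

The first document matches both query terms, including the rarer one, so it scores above the third document even though the latter repeats "ohio".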
I developed a face detection program that automatically detects human faces in static images.
This program applies Principal Component Analysis (PCA) to create an eigenface subspace from the training images.
A training or testing image can be represented as a weighted combination of the eigenfaces.
Therefore, an image can be represented by its weight vector in a lower-dimensional space.
If the Euclidean distance between the weight vector of a testing image and the weight vector of a human face image is below a threshold, a face is detected.
Languages used in this project: C++
Tools/Libraries used in this project: OpenCV (compiled on Solaris)
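The projection and threshold test in the face detection project can be sketched as follows. The sketch is in Python for brevity (the project itself was written in C++ with OpenCV) and assumes the mean image and an orthonormal set of eigenfaces have already been computed by PCA. All function names are hypothetical, and comparing against the nearest known face is an assumption about how the threshold is applied.

```python
import math
from typing import List, Sequence


def project(
    image: Sequence[float],
    mean: Sequence[float],
    eigenfaces: Sequence[Sequence[float]],
) -> List[float]:
    """Weight vector of an image: dot products of (image - mean) with each eigenface."""
    centered = [p - m for p, m in zip(image, mean)]
    return [sum(e * c for e, c in zip(face, centered)) for face in eigenfaces]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two weight vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_face(
    image: Sequence[float],
    known_face_weights: Sequence[Sequence[float]],
    mean: Sequence[float],
    eigenfaces: Sequence[Sequence[float]],
    threshold: float,
) -> bool:
    """Report a face when the nearest known face weight vector is closer than threshold."""
    weights = project(image, mean, eigenfaces)
    nearest = min(euclidean(weights, fw) for fw in known_face_weights)
    return nearest < threshold
```

A four-pixel image projected onto two eigenfaces yields a two-element weight vector, which is the dimensionality reduction the description refers to.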
It responds to ICMP Echo requests, UDP Echo, ND requests, Router Advertisements, RIPng advertisements, ICMP error messages, time exceeded, and UDP port unreachable.
It supports an ARP table, a routing table, and routing queues.
I used the Times 33 and Times 31 hash functions together for double hashing to resolve hash collisions.
I used multiple threads in this project.
Languages used in this project: C++
Libraries used in this project: POSIX Threads
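Double hashing with the Times 33 and Times 31 functions can be sketched as an open-addressing table. The sketch is in Python rather than the project's C++. The 5381 seed for Times 33, the 32-bit masking, the prime capacity, and the class name are assumptions, and the source does not say which tables used this scheme.

```python
from typing import Iterator, List, Optional, Tuple


def times33(key: str) -> int:
    """Times 33 hash: h = h * 33 + byte, seeded with 5381 (an assumed seed)."""
    h = 5381
    for byte in key.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def times31(key: str) -> int:
    """Times 31 hash: h = h * 31 + byte, starting from 0."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h


class DoubleHashTable:
    """Open-addressing table; the capacity should be prime so probes cover every slot."""

    def __init__(self, capacity: int = 101) -> None:
        self._slots: List[Optional[Tuple[str, object]]] = [None] * capacity

    def _probe(self, key: str) -> Iterator[int]:
        m = len(self._slots)
        start = times33(key) % m
        # The step is never 0, so the probe sequence always advances.
        step = 1 + times31(key) % (m - 1)
        for i in range(m):
            yield (start + i * step) % m

    def put(self, key: str, value: object) -> None:
        for idx in self._probe(key):
            slot = self._slots[idx]
            if slot is None or slot[0] == key:
                self._slots[idx] = (key, value)
                return
        raise RuntimeError("hash table is full")

    def get(self, key: str, default: object = None) -> object:
        for idx in self._probe(key):
            slot = self._slots[idx]
            if slot is None:
                return default
            if slot[0] == key:
                return slot[1]
        return default
```

Because the second hash sets the probe step, two keys that collide on the first hash usually follow different probe sequences, which reduces the clustering seen with linear probing.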
Booting Linux with U-Boot on PowerPC on an FPGA-DSP-based Digital Signal Processing Board
I modified and ran U-Boot on the IBM PowerPC processor integrated in the Xilinx Virtex-II Pro FPGA on a digital signal processing board.
For debugging, I used U-Boot to load the MontaVista Linux kernel and a ramdisk image over TFTP.
For the stable release, I used U-Boot to burn itself, the MontaVista Linux kernel, and the JFFS2 filesystem to a NAND flash device on the board.
Languages used in this project: C
Tools used in this project: U-Boot, MontaVista Linux