  • Automatically Diagnosing and Repairing Error Handling Bugs in C, Y. Tian, B. Ray, 10 pages. In 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC-FSE’17), acceptance rate: 24.4%. Best Paper Award.

  • GitcProc: A Tool for Processing and Classifying GitHub Commits, C. Casalnuovo, Y. Suchak, B. Ray, C. Rubio-Gonzalez, 4 pages, In International Symposium on Software Testing and Analysis (ISSTA’17).

  • Some From Here, Some From There: Cross-Project Code Reuse in GitHub, M. Gharehyazie, B. Ray, V. Filkov, 10 pages. The 14th International Conference on Mining Software Repositories (MSR’17), acceptance rate: 27%. Best Paper Award.

  • A Large Scale Study of Programming Languages and Code Quality in Github. B. Ray, D. Posnett, P. T. Devanbu, V. Filkov. CACM Research Highlights.

  • APEx: Automated Inference of Error Specifications for C APIs, 10 pages, acceptance rate: 19.1% (to appear)
    by Yuan Jochen Kang, Baishakhi Ray, Suman Jana.
    [ASE 2016]

  • Automatically Detecting Error Handling Bugs using Error Specifications, 18 pages, acceptance rate: 15.5% (to appear)
    by Suman Jana, Yuan Jochen Kang, Samuel Roth, Baishakhi Ray.
    [USENIX Security 2016]

  • On the “Naturalness” of Buggy Code, 12 pages, acceptance rate: 19%
    by Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, Premkumar Devanbu.
    [ICSE 2016]      

       title={On the "Naturalness" of Buggy Code},
      author={Ray, Baishakhi and Hellendoorn, Vincent and Tu, Zhaopeng and Nguyen, Connie and Godhane, Saheel and Bacchelli, Alberto and Devanbu, Premkumar},
       series = {ICSE '16},

    Real software, the kind working programmers produce by the kLOC to solve real-world problems, tends to be “natural”, like speech or natural language; it tends to be highly repetitive and predictable. Researchers have captured this naturalness of software through statistical models and used them to good effect in suggestion engines, porting tools, coding standards checkers, and idiom miners. This suggests that code that appears improbable, or surprising, to a good statistical language model is “unnatural” in some sense, and thus possibly suspicious. In this paper, we investigate this hypothesis. We consider a large corpus of bug fix commits (ca. 7,139), from 10 different Java projects, and focus on its language statistics, evaluating the naturalness of buggy code and the corresponding fixes. We find that code with bugs tends to be more entropic (i.e. unnatural), becoming less so as bugs are fixed. Ordering files for inspection by their average entropy yields cost-effectiveness scores comparable to popular defect prediction methods. At a finer granularity, focusing on highly entropic lines is similar in cost-effectiveness to some well-known static bug finders (PMD, FindBugs) and ordering warnings from these bug finders using an entropy measure improves the cost-effectiveness of inspecting code implicated in warnings. This suggests that entropy may be a valid, simple way to complement the effectiveness of PMD or FindBugs, and that search-based bug-fixing methods may benefit from using entropy both for fault-localization and searching for fixes.
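    The idea of ranking lines by entropy can be illustrated with a minimal sketch. It assumes a
    toy bigram model with add-one smoothing and whitespace tokenization, which is a deliberate
    simplification and not the language model used in the paper. The names BigramModel,
    line_entropy, and rank_lines are hypothetical and introduced only for this illustration.

```python
import math
from collections import Counter


def tokenize(line):
    # Assumption: whitespace tokenization stands in for a real lexer.
    return line.split()


class BigramModel:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, corpus_lines):
        self.contexts = Counter()  # how often each token appears as a predecessor
        self.bigrams = Counter()   # how often each (previous, current) pair appears
        vocab = set()
        for line in corpus_lines:
            tokens = ["<s>"] + tokenize(line)
            vocab.update(tokens)
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen tokens

    def prob(self, prev, tok):
        # Smoothed conditional probability P(tok | prev).
        return (self.bigrams[(prev, tok)] + 1) / (self.contexts[prev] + self.vocab_size)

    def line_entropy(self, line):
        # Average negative log2 probability per token (bits per token).
        tokens = ["<s>"] + tokenize(line)
        if len(tokens) == 1:
            return 0.0
        bits = -sum(math.log2(self.prob(p, t)) for p, t in zip(tokens, tokens[1:]))
        return bits / (len(tokens) - 1)


def rank_lines(model, lines):
    # Most surprising (highest entropy) lines first, as candidates for inspection.
    return sorted(lines, key=model.line_entropy, reverse=True)


# Usage: train on repetitive code, then rank unseen lines.
model = BigramModel(["i = i + 1", "j = j + 1", "k = k + 1"])
for line in rank_lines(model, ["i = i + 1", "x ^= y << 3"]):
    print(f"{model.line_entropy(line):.2f}  {line}")
```

    Lines that resemble the training corpus score lower, so an inspection budget spent from the
    top of the ranking goes first to the code the model finds least predictable.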

  • Assert Use in GitHub Projects, 11 pages, acceptance rate: 18.5%
    by Casey Casalnuovo, Prem Devanbu, Abilio Oliveira, Vladimir Filkov, Baishakhi Ray.
    [ICSE 2015]

       title={Assert Use in GitHub Projects},
      author={Casalnuovo, Casey and Devanbu, Prem and Oliveira, Abilio and Filkov, Vladimir and Ray, Baishakhi},
       series = {ICSE '15},

    Assertions in a program are believed to help with automated verification, code
    understandability, maintainability, fault localization, and diagnosis, all eventually leading
    to better software quality. Using a large dataset of assertions in C and C++ programs, we
    confirmed this claim, i.e., methods with assertions do have significantly fewer defects. Assertions
    also appear to play a positive role in collaborative software development, where many
    programmers are working on the same method. We further characterized assertion usage along
    process and product metrics. Such detailed characterization of assertions will help to predict
    relevant locations of useful assertions and will improve code quality.

    A revised version of the paper is available here.

  • The Uniqueness of Changes: Characteristics and Applications, 11 pages, acceptance rate: 30%
    by Baishakhi Ray, Meiyappan Nagappan, Christian Bird, Nachiappan Nagappan, Thomas Zimmermann.
    [MSR 2015]    

       title={The Uniqueness of Changes: Characteristics and Applications},
      author={Ray, Baishakhi and Nagappan, Meiyappan and Bird, Christian and Nagappan, Nachiappan and Zimmermann, Thomas},
       series = {MSR '15},

    Changes in software development come in many forms. Some changes are frequent, idiomatic, or
    repetitive (e.g. adding checks for nulls or logging important values) while others are unique.
    We hypothesize that unique changes are different from the more common similar (or non-unique)
    changes in important ways; they may require more expertise or represent code that is more complex
    or prone to mistakes. As such, these unique changes are worthy of study. In this paper, we present a
    definition of unique changes and provide a method for identifying them in software project history.
    Based on the results of applying our technique on the Linux kernel and two large projects at
    Microsoft, we present an empirical study of unique changes. We explore how prevalent unique changes
    are and investigate where they occur along the architecture of the project. We further investigate
    developers’ contribution towards uniqueness of changes. We also describe potential applications of
    leveraging the uniqueness of change and implement two of those applications, evaluating the risk of
    changes based on uniqueness and providing change recommendations for non-unique changes.

  • Gender and Tenure Diversity in GitHub Teams, 10 pages, acceptance rate: 20%.
    by Bogdan Vasilescu, Daryl Posnett, Baishakhi Ray, Mark van den Brand, Alexander Serebrenik, Premkumar Devanbu, Vladimir Filkov.
    [CHI 2015]    

       title={Gender and Tenure Diversity in GitHub Teams},
       author={Vasilescu, Bogdan and Posnett, Daryl and Ray, Baishakhi and van den Brand, Mark and Serebrenik, Alexander and Devanbu, Premkumar and Filkov, Vladimir},
       series = {CHI '15},

    Using GitHub, we studied gender and tenure diversity in online
    programming teams. Using the results of a survey and regression modeling of a
    GitHub data set comprising over 2 million projects, we studied how diversity
    relates to team productivity and turnover. We showed that both gender and
    tenure diversity are positive and significant predictors of productivity. These
    results can inform decision-making on all levels, leading to better outcomes in
    recruiting and performance.

  • A Large Scale Study of Programming Languages and Code Quality in Github, 10 pages, acceptance rate: 20%
    by Baishakhi Ray, Daryl Posnett, Vladimir Filkov, Premkumar T. Devanbu.
    [FSE 2014]      
    Media Coverage: SlashDot, The Register, Reddit, InfoWorld, Hacker News

       title={A Large Scale Study of Programming Languages and Code Quality in Github},
       author={Ray, Baishakhi and Posnett, Daryl and Filkov, Vladimir and Devanbu, Premkumar},
       booktitle={Proceedings of the ACM SIGSOFT 22nd International Symposium on the
                 Foundations of Software Engineering},
       series = {FSE '14},

    To investigate whether a programming language is the right tool for the job, I gathered a
    very large data set from GitHub (728 projects, 63M lines of code, 29K authors, 1.5M commits,
    in 17 languages). Using a mixed-methods approach, combining multiple regression modeling with
    visualization and text analytics, I studied the effect of language features, such as static vs.
    dynamic typing and strong vs. weak typing, on software quality. By triangulating findings from
    different methods, and controlling for confounding effects such as code size, project age, and
    contributors, I observed that language design choice does have a significant, but modest,
    effect on software quality.

  • Using Frankencerts for Automated Adversarial Testing of Certificate Validation in SSL/TLS Implementations. S&P 2014 Best Practical Paper Award, 16 pages, acceptance rate: 13%
    by Chad Brubaker, Suman Jana, Baishakhi Ray, Sarfraz Khurshid, Vitaly Shmatikov.
    [S&P (Oakland) 2014]            
    Media Coverage: Reddit, Golem, Heise

       title={Using Frankencerts for Automated Adversarial Testing of Certificate Validation
             in SSL/TLS Implementations},
       author={Brubaker, Chad and Jana, Suman and Ray, Baishakhi and Khurshid, Sarfraz and
              Shmatikov, Vitaly},
       booktitle={IEEE Symposium on Security and Privacy 2014},

    In today’s open software market, users can choose among multiple software packages that
    provide similar functionality. For example, there exists a pool of popular SSL/TLS libraries
    (e.g., OpenSSL, GnuTLS, NSS, CyaSSL, PolarSSL, and MatrixSSL) for securing network
    connections from man-in-the-middle attacks. Certificate validation is a crucial part of
    SSL/TLS connection setup. Though implemented differently, the certificate validation logic of
    these different libraries should serve the same purpose, following the SSL/TLS protocol, i.e.
    for a given certificate, all of the libraries should either accept or reject it. In
    collaboration with security researchers at the University of Texas at Austin, we designed the
    first large-scale framework for testing certificate validation logic in SSL/TLS
    implementations. First, we generated millions of synthetic certificates by randomly mutating
    parts of real certificates and thus induced unusual combinations of extensions and
    constraints. A valid SSL implementation should be able to detect and reject the unusual
    mutants. Next, using a differential testing framework, we checked whether one SSL/TLS
    implementation accepts a certificate while another rejects the same certificate. We used such
    discrepancies as an oracle for finding flaws in individual implementations. We uncovered 208
    discrepancies between popular SSL/TLS implementations, many of which are caused by serious
    security vulnerabilities.
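
    The two steps above (mutating certificates, then comparing verdicts) can be sketched as
    follows. This is a minimal illustration under stated assumptions: certificates are modeled
    as plain dictionaries, and strict_validator and lax_validator are hypothetical stand-ins
    for real SSL/TLS libraries, none of which are called here. Every function and field name is
    invented for the example.

```python
import random


def make_frankencerts(seed_certs, count, rng):
    """Build synthetic certificates by mixing field values drawn from the seeds."""
    fields = sorted({name for cert in seed_certs for name in cert})
    mutants = []
    for _ in range(count):
        mutant = {}
        for name in fields:
            donor = rng.choice(seed_certs)  # each field may come from a different seed
            if name in donor:
                mutant[name] = donor[name]
        mutants.append(mutant)
    return mutants


def find_discrepancies(validators, certs):
    """Return the certificates on which the validators disagree, with their verdicts."""
    discrepancies = []
    for cert in certs:
        verdicts = {name: bool(check(cert)) for name, check in validators.items()}
        if len(set(verdicts.values())) > 1:
            discrepancies.append((cert, verdicts))
    return discrepancies


# Hypothetical validators: the second one forgets to reject CA certificates.
def strict_validator(cert):
    return cert.get("is_ca") is False and cert.get("expired") is False


def lax_validator(cert):
    return cert.get("expired") is False


# Usage: mutate two seed certificates and report disagreements.
seeds = [{"is_ca": False, "expired": False}, {"is_ca": True, "expired": True}]
validators = {"strict": strict_validator, "lax": lax_validator}
mutants = make_frankencerts(seeds, 20, random.Random(0))
for cert, verdicts in find_discrepancies(validators, mutants):
    print(cert, verdicts)
```

    No ground-truth oracle is needed: any disagreement means at least one validator is wrong,
    which is what makes the discrepancies useful as leads for manual analysis.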

  • Detecting and Characterizing Semantic Inconsistencies in Ported Code. Nominated for distinguished paper award, invited for journal special issue, 10 pages, acceptance rate: 23%
    by Baishakhi Ray, Miryung Kim, Suzette Person, Neha Rungta
    [ASE 2013]      

       title={Detecting and characterizing semantic inconsistencies in ported code},
       author={Ray, Baishakhi and Kim, Miryung and Person, Suzette and Rungta, Neha},
       booktitle={Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on},

    In order to automatically detect copy-paste errors, I investigated: (1) What are the common
    types of copy-paste errors? (2) How can they be automatically detected? By analyzing
    the version histories of FreeBSD and Linux, I found five common types of copy-paste errors and
    then, leveraging this categorization, designed a two-stage analysis technique to detect and
    characterize copy-paste errors. The first stage of the analysis, SPA, detects and categorizes
    inconsistencies in repetitive changes based on a static control and data dependence analysis.
    SPA successfully identifies copy-paste errors with 65% to 73% precision, an improvement by 14
    to 17 percentage points with respect to previous tools. The second stage of the analysis,
    SPA++, uses the inconsistencies computed by SPA to direct symbolic execution in order to
    generate program behaviors that are impacted by the inconsistencies. SPA++ further compares
    these program behaviors leveraging logical equivalence checking (implemented with the Z3
    theorem prover) and generates test inputs that exercise program paths containing the reported
    inconsistencies. A case study shows that SPA++ can refine the results reported by SPA and help
    developers analyze copy-paste inconsistencies. I collaborated with researchers from NASA for
    this work.

  • An Empirical Study of API Stability and Adoption in the Android Ecosystem. 10 pages, acceptance rate: 22%
    by Tyler McDonnell, Baishakhi Ray, Miryung Kim
    [ICSM 2013]      

       title={An empirical study of API stability and adoption in the Android ecosystem},
       author={McDonnell, Tyler and Ray, Baishakhi and Kim, Miryung},
       booktitle={Software Maintenance (ICSM), 2013 29th IEEE International Conference on},

    In today’s software ecosystem, which is primarily governed by web, cloud, and mobile
    technologies, APIs play a key role in connecting disparate software. Big players like
    Google, Facebook, and Microsoft aggressively publish new APIs to accommodate new feature
    requests, bug fixes, and performance improvements. We investigated how such fast-paced
    API evolution affects the overall software ecosystem. Our study on Android API evolution
    showed that developers are hesitant to adopt fast-evolving, unstable APIs. For
    instance, while Android updates 115 APIs per month on average, clients adopt the new APIs
    rather slowly, with a median lagging period of 16 months. Furthermore, client code that uses
    new APIs is typically more defect-prone than client code without API adaptation. To the best
    of my knowledge, this is the first work studying API adoption in a large software
    ecosystem, and the study suggests how to promote API adoption and how to facilitate growth
    of the overall ecosystems.

  • A Case Study of Cross-System Porting in Forked Software Projects. 11 pages, acceptance rate: 17%
    by Baishakhi Ray, Miryung Kim
    [FSE 2012]      

       title = {A Case Study of Cross-system Porting in Forked Projects},
       author = {Ray, Baishakhi and Kim, Miryung},
       booktitle = {Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering},
       series = {FSE 2012},
       articleno = {53},
       pages = {53:1--53:11}

    This paper empirically demonstrates that developers spend a significant amount of time and
    effort in introducing similar features and bug-fixes in and across different projects.
    This involves a significant amount of repeated work. To automatically identify the
    repetitive changes, I designed Repertoire, a source code change analysis tool that
    compares the edit contents and the corresponding operations of program patches to identify
    similar changes, with 94% precision and 84% recall. Using Repertoire, I showed that
    developers often introduce a significant amount of repeated changes within and across
    projects. Most notably, repetitive changes among forked projects (different variants of an
    existing project, e.g., FreeBSD, NetBSD and OpenBSD) incur significant duplicate work. In
    each BSD release, on average, more than twelve thousand lines are ported from peer
    projects, and more than 25% of active developers participate in cross-system porting in
    each release.
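
    Comparing edit contents together with their operations can be illustrated with a small
    sketch. It assumes unified-diff input and exact matching of whitespace-normalized lines,
    which is a much cruder similarity measure than a real porting analysis needs, and it is not
    Repertoire’s implementation. The names edit_set and ported_fraction are hypothetical.

```python
def edit_set(patch_text):
    """Collect (operation, normalized content) pairs from a unified diff.

    Simplification: any line starting with '+++' or '---' is treated as a file header.
    """
    edits = set()
    for line in patch_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not edits
        if line.startswith(("+", "-")):
            content = " ".join(line[1:].split())  # collapse whitespace differences
            if content:
                edits.add((line[0], content))
    return edits


def ported_fraction(source_patch, target_patch):
    """Fraction of the target patch's edits that also occur in the source patch."""
    source, target = edit_set(source_patch), edit_set(target_patch)
    if not target:
        return 0.0
    return len(source & target) / len(target)


# Usage: two patches to different files that share the same null-check fix.
patch_a = """--- a/f.c
+++ b/f.c
@@ -1,2 +1,3 @@
-if (p)
+if (p != NULL)
+    log(p);
"""
patch_b = """--- a/g.c
+++ b/g.c
@@ -5,2 +5,3 @@
-if (p)
+if (p != NULL)
+  count++;
"""
print(ported_fraction(patch_a, patch_b))
```

    Keeping the operation in each pair matters: a line added in one patch and removed in the
    other has the same content but is not the same change.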

  • Repertoire: A Cross-System Porting Analysis Tool for Forked Software Projects. 4 pages
    by Baishakhi Ray, Christopher Wiley, Miryung Kim
    [FSE 2012]      

       title={Repertoire: A cross-system porting analysis tool for forked software projects},
       author={Ray, Baishakhi and Wiley, Christopher and Kim, Miryung},
       booktitle={Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering},
       series = {FSE 2012},
       articleno = {8},
       pages = {8:1--8:4},

    To create a new variant of an existing project, developers often copy an existing
    codebase and modify it. This process is called software forking. After forking software,
    developers often port new features or bug fixes from peer projects. Repertoire analyzes
    repeated work of cross-system porting among forked projects. It takes the version
    histories as input and identifies ported edits by comparing the content of individual
    patches. It also shows users the extent of ported edits, where and when the ported edits
    occurred, which developers ported code from peer projects, and how long it takes for
    patches to be ported.

  • An Empirical Study of Supplementary Bug Fixes. 10 pages, acceptance rate: 28%
    by Jihun Park, Miryung Kim, Baishakhi Ray, Doo-Hwan Bae
    [MSR 2012]    

       title={An empirical study of supplementary bug fixes},
       author={Park, Jihun and Kim, Miryung and Ray, Baishakhi and Bae, Doo Hwan},
       booktitle={Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on},

    A recent study finds that errors of omission are harder for programmers to detect than
    errors of commission. While several change recommendation systems already exist to
    prevent or reduce omission errors during software development, there have been very few
    studies on why errors of omission occur in practice and how such errors could be
    prevented. In order to understand the characteristics of omission errors, this paper
    investigates a group of bugs that were fixed more than once in open source
    projects—those bugs whose initial patches were later considered incomplete and to which
    programmers applied supplementary patches.

  • PTask: Operating System Abstractions To Manage GPUs as Compute Devices. 16 pages, acceptance rate: 17%
    by C. J. Rossbach, J. Currey, M. Silberstein, Baishakhi Ray, E. Witchel
    [SOSP 2011]    

       title={PTask: Operating system abstractions to manage GPUs as compute devices},
       author={Rossbach, Christopher J and Currey, Jon and Silberstein, Mark and Ray, Baishakhi and Witchel, Emmett},
       shorthand = {SOSP'11},
       booktitle={Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles},

    GPUs are typically used for high-performance rendering or batch-oriented computations, but
    not for general-purpose compute-intensive tasks, such as brain-computer interfaces or file
    system encryption. Current operating systems treat the GPU as an I/O device as opposed to a
    general-purpose computational resource, like a CPU. To overcome this issue, we proposed the
    PTask API, a new set of OS abstractions. As part of this work, I ported EncFS, a FUSE-based
    encrypted file system for Linux, to the CUDA framework so that it can use the GPU for AES
    encryption and decryption. Using PTask’s GPU scheduling mechanism, I showed that running EncFS
    on the GPU rather than the CPU made sequential read and write of a 200MB file 17% and 28%
    faster, respectively.


Book Chapter