Program

Schedule

Time         Contents
 9:10- 9:40  Registration
 9:40- 9:50  Opening Talk
 9:50-10:30  Michele Lanza (35 mins for the talk and 5 mins for Q&A)
10:30-11:10  Jacky Keung (35 mins for the talk and 5 mins for Q&A)
11:10-11:15  Announcement (Poster, lunch and dinner)
11:15-13:30  Lunch & Poster
13:30-14:10  Romain Robbes (35 mins for the talk and 5 mins for Q&A)
14:10-14:40  Break
14:40-15:20  Ashish Sureka (35 mins for the talk and 5 mins for Q&A)
15:20-16:00  Meiyappan Nagappan (35 mins for the talk and 5 mins for Q&A)
16:00-16:10  Closing Talk

Presentations

Speaker: Jacky Keung (City University of Hong Kong)

Title: Data-intensive Ensemble Methods for Mining Software Engineering Repositories

Abstract: Developing effective software effort estimation (SEE) models based on software engineering data is an important research area, as correctly estimating the effort required to develop software is of vital importance for any software project manager. Over- or underestimation of software development effort leads to undesirable project outcomes. A considerable amount of research has been carried out in this area, and many of the newer models, such as analogy-based estimation, provide a useful early estimate in the software development process and a viable technique for software project management. However, the prediction performance of these models depends heavily on the characteristics of the historical software project dataset, the selected combination of model design choices, and the context of the project being assessed, resulting in unstable or conflicting performance results.

In our relatively recent work, experiments with multi-method approaches (Ensembles) that combine two or more single SEE methods have shown significant improvements in the stability and performance of the resulting models, which is consistent with findings from the machine learning research community. The important research direction provided by that study was that, while there is no single best SEE method, there exist best combinations of such SEE methods using Ensembles, because Ensembles have demonstrated the capacity to improve upon the accuracy of a single SEE model. While our previous study on Ensembles has shed light on the future development of SEE, in this seminar we will discuss the potential and opportunities for applying Ensemble techniques to other mining software repositories applications and, more importantly, the technical challenges involved and the areas to be further explored.

The broader impact of utilizing Ensemble methods will be to make mining software repositories models more transferable to practice, allowing different model capabilities to be exploited in various circumstances in practice.
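As an illustration only (not the speaker's method), the idea of combining several single SEE methods into one Ensemble can be sketched as follows. The project data, the three solo methods, and the choice of a plain mean as the combination rule are all assumptions made for this example.

```python
from statistics import mean, median

# Hypothetical historical projects (illustrative numbers, not real data):
# (size in function points, team size) -> effort in person-hours
HISTORY = [
    ((100.0, 3.0), 1200.0),
    ((250.0, 5.0), 3100.0),
    ((400.0, 6.0), 5200.0),
    ((150.0, 4.0), 1900.0),
    ((320.0, 5.0), 4000.0),
]


def analogy_estimate(features, history, k=2):
    """Analogy-based estimate: mean effort of the k most similar past projects.

    Similarity is plain Euclidean distance on unscaled features, so the
    larger-valued feature (size) dominates; a real model would normalize.
    """
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(history, key=lambda p: distance(p[0], features))[:k]
    return mean(effort for _, effort in nearest)


def size_regression_estimate(features, history):
    """Least-squares line of effort against size (first feature only)."""
    sizes = [f[0] for f, _ in history]
    efforts = [e for _, e in history]
    mean_size, mean_effort = mean(sizes), mean(efforts)
    slope = (
        sum((x - mean_size) * (y - mean_effort) for x, y in zip(sizes, efforts))
        / sum((x - mean_size) ** 2 for x in sizes)
    )
    return mean_effort + slope * (features[0] - mean_size)


def productivity_estimate(features, history):
    """Median historical effort per size unit, scaled to the new project."""
    return median(e / f[0] for f, e in history) * features[0]


# The solo methods that the ensemble combines (an assumed selection).
SOLO_METHODS = [analogy_estimate, size_regression_estimate, productivity_estimate]


def ensemble_estimate(features, history, solo_methods=SOLO_METHODS):
    """Combine the solo estimates with a simple mean (assumed combination rule)."""
    estimates = [method(features, history) for method in solo_methods]
    return mean(estimates), estimates


combined, solos = ensemble_estimate((200.0, 4.0), HISTORY)
print("solo estimates:", [round(s, 1) for s in solos])
print("ensemble estimate:", round(combined, 1))
```

Because the combined value is an average, it always lies between the lowest and highest solo estimate, which is one simple way to see why an ensemble can be more stable than any single method on a given dataset.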

Speaker: Meiyappan Nagappan (Rochester Institute of Technology)

Title: Big(ger) Data in Software Engineering

Abstract: My research is centered around analyzing Software Engineering (SE) datasets that are several orders of magnitude bigger than the typical SE datasets. Examples of my datasets include: all the mobile apps in the Google Play store, all of the world’s Open Source projects, and hundreds of gigabytes of execution logs. Such large datasets provide us with a unique view into the SE field.

However, these large datasets also bring some tough challenges given their 4 V’s (volume, variety, velocity, and veracity). Such challenges often complicate the analysis of the data and can invalidate the interpretation of the results. In this talk, I present an overview of key results from several of my recent studies, and how my collaborators and I overcame these challenges. In particular, I will touch on some of the key research questions that I have tackled:

– How can we pick a diverse sample of projects from all the available projects in the world for a case study?
– How come the mobile app markets are growing so fast while still ensuring a high quality user experience?

I will conclude my talk with some of the exciting research opportunities present in analyzing such large datasets.

Speaker: Romain Robbes (University of Chile)

Title: How scratching my own itch led to the best work of my Ph.D., and other stories

Abstract: As I write this, it’s been ten years since I started doing research in Mining Software Repositories. In this talk, I relate several episodes of my research career, and reflect on the lessons I learnt from them and how they apply to MSR in general. The lessons range from the value of solving one’s own problems and the challenges associated with using alternative software repositories to the potential that lies in applying MSR techniques to neighboring research areas.

Speaker: Ashish Sureka (Indraprastha Institute of Information Technology, Delhi)

Title: Kashvi: A Framework for Software Process Intelligence

Abstract: Software Process Intelligence (SPI) is an emerging and evolving discipline involving mining and analysis of software processes. This is modeled on the lines of Business Process Intelligence (BPI), but with the focus on software processes and their applicability to software systems. Process mining consists of mining event log and process trace data for the purpose of process discovery (run-time process model), process verification or compliance checking (comparison between design-time and run-time process model), process enhancement and recommendation. Software Process Mining or Intelligence is a new and emerging discipline which falls at the intersection of Software Process & Mining, and Software & Process Mining. Software Process Mining is integral to discovering and verifying the processes in a software system.

Software Process Mining is a three-word phrase which can be viewed from two perspectives: Software + Process Mining and Software Process + Mining. Software development and evolution involve the usage of several workflow management and information systems and tools such as Issue Tracking Systems (ITS), Version Control Systems (VCS), Peer Code Review Systems (PCR) and Continuous Integration Tools (CIT). Such information systems log data consisting of events, activities, timestamps, users or actors, and context-specific data. Such event or trace data generated by information systems used during software construction (as part of the software development process) contain valuable information which can be mined for gaining useful insights and actionable information. In this talk, I will present Kashvi: A Framework (and applications) for Software Process Intelligence. I will present the framework, architecture and real-world applications of Software Process Intelligence.
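To make the process discovery and compliance checking steps described above more concrete, here is a minimal sketch that is independent of Kashvi, whose actual design is not described here. It assumes a hypothetical issue-tracker event log of (case id, activity, timestamp) records, derives a simple "directly-follows" relation as the run-time model, and compares it with an assumed design-time model; all names and data are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical issue-tracker event log: (case id, activity, timestamp).
EVENT_LOG = [
    ("ISSUE-1", "opened", 1),
    ("ISSUE-1", "assigned", 2),
    ("ISSUE-2", "opened", 3),
    ("ISSUE-2", "assigned", 4),
    ("ISSUE-1", "resolved", 5),
    ("ISSUE-1", "closed", 6),
    ("ISSUE-2", "resolved", 7),
    ("ISSUE-2", "reopened", 8),
    ("ISSUE-2", "resolved", 9),
    ("ISSUE-2", "closed", 10),
]

# Assumed design-time model: the transitions the process is meant to allow.
DESIGN_TIME_MODEL = {
    ("opened", "assigned"),
    ("assigned", "resolved"),
    ("resolved", "closed"),
}


def build_traces(event_log):
    """Group events by case and order them by timestamp into activity traces."""
    traces = defaultdict(list)
    for case_id, activity, _timestamp in sorted(event_log, key=lambda e: e[2]):
        traces[case_id].append(activity)
    return dict(traces)


def directly_follows(traces):
    """Process discovery (simplified): count how often activity a is
    immediately followed by activity b within the same case."""
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


def deviations(traces, allowed_transitions):
    """Compliance check: observed transitions missing from the design-time model."""
    return {
        pair: count
        for pair, count in directly_follows(traces).items()
        if pair not in allowed_transitions
    }


traces = build_traces(EVENT_LOG)
for pair, count in sorted(directly_follows(traces).items()):
    print(pair, count)
print("deviations:", deviations(traces, DESIGN_TIME_MODEL))
```

In this toy log, the reopening of ISSUE-2 shows up as transitions that the design-time model does not contain, which is the kind of gap between intended and actual process that compliance checking is meant to surface.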