1st Annual Workshop on Data Sciences 2015 - Speakers
Speakers
Data Sciences Workshop
Slides for some of the presentations can be downloaded below.
Keynote speakers
Dr. René Vidal, Associate Professor of Biomedical Engineering
Johns Hopkins University, USA
Title: Algebraic, Sparse and Low Rank Subspace Clustering (Slides: pdf)
Video: Watch here
Abstract: In the era of data deluge, the development of methods for discovering structure in high-dimensional data is becoming increasingly important. Traditional approaches often assume that the data is sampled from a single low-dimensional manifold. However, in many applications in signal/image processing, machine learning and computer vision, data in multiple classes lie in multiple low-dimensional subspaces of a high-dimensional ambient space. In this talk, I will present methods from algebraic geometry, sparse representation theory and rank minimization for clustering and classification of data in multiple low-dimensional subspaces. I will show how these methods can be extended to handle noise, outliers as well as missing data. I will also present applications of these methods to video segmentation and face clustering.
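The sparse-representation idea in the abstract can be illustrated with a small self-expressive model in the spirit of sparse subspace clustering. Each point is written as a sparse combination of the other points, and the coefficient magnitudes become an affinity matrix for spectral clustering. This is a minimal sketch under stated assumptions, not the speaker's implementation. It solves the Lasso variant min_c 0.5 * ||x_j - X c||^2 + lam * ||c||_1 with c_j = 0 by plain proximal gradient (ISTA). The names `ssc_coefficients`, `affinity`, `lam` and `n_iter` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ssc_coefficients(X, lam=0.01, n_iter=500):
    """Sparse self-representation X ~ X C with zero diagonal (Lasso form, solved by ISTA).

    X: D x N array whose columns are data points (hypothetical interface).
    Returns the N x N coefficient matrix C.
    """
    N = X.shape[1]
    G = X.T @ X                              # Gram matrix
    step = 1.0 / np.linalg.norm(G, 2)        # 1 / Lipschitz constant of the smooth term
    C = np.zeros((N, N))
    for _ in range(n_iter):
        grad = G @ C - G                     # gradient of 0.5 * ||X - X C||_F^2
        C = soft_threshold(C - step * grad, step * lam)
        np.fill_diagonal(C, 0.0)             # forbid representing a point by itself
    return C


def affinity(C):
    """Symmetric affinity matrix, the usual input to spectral clustering."""
    return np.abs(C) + np.abs(C).T


# Toy data: 20 points on each of two random 2-D subspaces of R^10.
rng = np.random.default_rng(0)
bases = [np.linalg.qr(rng.standard_normal((10, 2)))[0] for _ in range(2)]
X = np.hstack([B @ rng.standard_normal((2, 20)) for B in bases])
X /= np.linalg.norm(X, axis=0)               # unit-norm columns
W = affinity(ssc_coefficients(X))
within = W[:20, :20].sum() + W[20:, 20:].sum()
cross = W[:20, 20:].sum() + W[20:, :20].sum()
print(f"within-subspace affinity {within:.3f}, cross-subspace {cross:.3f}")
```

Because the constraint c_j = 0 and the l1 penalty act coordinate by coordinate, applying soft-thresholding and then zeroing the diagonal is exactly the proximal step for the combined penalty.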
Biography: Professor Vidal received his B.S. degree in Electrical Engineering (highest honors) from the Pontificia Universidad Catolica de Chile in 1997 and his M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California at Berkeley in 2000 and 2003, respectively. He has been on the faculty of the Center for Imaging Science in the Department of Biomedical Engineering of The Johns Hopkins University since 2004, where he is currently an Associate Professor. He was co-editor of the book "Dynamical Vision" and has co-authored more than 180 articles in biomedical image analysis, computer vision, machine learning, hybrid systems, robotics and signal processing. He has received many awards for his work, including the 2012 J.K. Aggarwal Prize, the 2009 ONR Young Investigator Award, the 2009 Sloan Research Fellowship, the 2005 NSF CAREER Award, and best paper awards at ICCV-3DRR 2013, PSIVT 2013, CDC 2012, MICCAI 2012, CDC 2011 and ECCV 2004. Dr. Vidal has been an Associate Editor of Medical Image Analysis, the IEEE Transactions on Pattern Analysis and Machine Intelligence, the SIAM Journal on Imaging Sciences and the Journal of Mathematical Imaging and Vision; Program Chair for ICCV 2015, CVPR 2014, WMVC 2009 and PSIVT 2007; Area Chair for MICCAI 2013 and 2014, ICCV 2007, 2011 and 2013, and CVPR 2005 and 2013; and a program committee member for all major conferences in computer vision, machine learning and medical imaging. He is a Fellow of the IEEE and a member of the ACM and SIAM.
Dr. Akram Aldroubi, Professor of Mathematics
Vanderbilt University, USA
Title: Subspace Segmentation and Its Applications (Slides: pdf)
Video: Watch here
Abstract: The subspace segmentation problem is fundamental in many applications. The goal is to cluster data drawn from an unknown union of subspaces. We will state the problem and describe its connection to other areas of mathematics and engineering. We will then review the mathematical and algorithmic methods developed to solve this problem and some of its special cases. We will also describe the problem of motion tracking in videos, explain its connection to the subspace segmentation problem, and compare the various techniques for solving it.
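One common way to state the problem is sketched below, first for noiseless data and then for noisy data. The notation (W, S_i, d_i, n) is introduced here for illustration and is not taken from the talk. The noisy version is written as a least-squares fit of a union of subspaces, which is one standard formulation among several.

```latex
% Noiseless setting: data drawn from an unknown union of n subspaces
W = \{w_1, \dots, w_N\} \subset \mathcal{M} = \bigcup_{i=1}^{n} S_i,
\qquad S_i \subset \mathbb{R}^D \text{ a linear subspace},\ \dim S_i = d_i < D .
% Goal: recover n, the subspaces S_i, and a partition
% W = W_1 \cup \dots \cup W_n with W_i \subset S_i.
% Noisy setting: seek the union of subspaces closest to the data,
\min_{S_1, \dots, S_n} \; \sum_{j=1}^{N} \min_{1 \le i \le n} \operatorname{dist}^2(w_j, S_i).
```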
Biography: Akram Aldroubi is a Professor of Mathematics at Vanderbilt University and a Fellow of the American Mathematical Society. He has authored and co-authored over 100 research publications related to modern harmonic analysis and its applications. He is Co-Editor-in-Chief of the international journal Sampling Theory in Signal and Image Processing (STSIP) and serves on the editorial boards of several other mathematical journals.
Invited speakers
Dr. Guangcan Liu, Professor
School of Information and Control
Nanjing University of Information Science and Technology, China
Title: Robust Subspace Clustering in High Dimension: A Deterministic Result (Slides: pptx)
Video: Watch here
Abstract: It is of great interest to explore the problem of Robust Subspace Clustering: given a collection of data points approximately drawn from a union of multiple subspaces, the goal is to segment the points into their respective subspaces and also remove possible errors. In general, without any assumptions about the data, this problem is virtually impossible to solve reliably. Fortunately, today's data is often high-dimensional and massive, and thus the sum of the multiple subspaces very often has fairly low rank, i.e., the union of the subspaces can be regarded as a single low-dimensional subspace. This fact drives us to propose a simple yet effective method for subspace clustering. Like prevalent clustering methods, our method adopts a two-stage framework: it first learns an affinity matrix from the given data points and then uses spectral clustering techniques to produce the final clustering results. Inferring the affinity matrix is formulated as a nuclear norm minimization problem, termed Low-Rank Representation (LRR), which seeks the lowest-rank representation among all the candidates that can represent each data point as a linear combination of the other points. It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: under certain conditions, LRR provably recovers the authentic row projector exactly from a given set of data points possibly contaminated by outliers. Since the subspace membership of the data points is provably determined by the authentic row projector, this further implies that LRR can solve the robust subspace clustering problem under certain conditions.
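In the noiseless case, the LRR program min ||Z||_* subject to X = X Z has the closed-form solution Z* = V V^T, where X = U S V^T is the skinny SVD. This is the row projector mentioned above. The sketch below computes it and then applies a basic normalized spectral clustering step. It is a minimal illustration that assumes clean data drawn from independent subspaces. The robust, outlier-handling LRR solver from the talk is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def lrr_noiseless(X, tol=1e-8):
    """Closed-form minimiser of ||Z||_* s.t. X = X Z: the row projector V V^T."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))          # numerical rank
    V = Vt[:r].T
    return V @ V.T


def spectral_clustering(W, k, n_iter=50):
    """Normalised spectral clustering with a small deterministic k-means."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    _, evecs = np.linalg.eigh(M)                         # eigenvalues in ascending order
    Y = evecs[:, -k:]                                    # top-k eigenvectors
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)     # row-normalise
    # Farthest-point initialisation, then Lloyd iterations.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        for c in range(k):
            if np.any(labels == c):                      # keep old centre if a cluster empties
                centers[c] = Y[labels == c].mean(axis=0)
    return labels


rng = np.random.default_rng(1)
bases = [np.linalg.qr(rng.standard_normal((10, 2)))[0] for _ in range(2)]
X = np.hstack([B @ rng.standard_normal((2, 15)) for B in bases])  # columns are points
Z = lrr_noiseless(X)
labels = spectral_clustering(np.abs(Z) + np.abs(Z).T, k=2)
print(labels)
```

For independent subspaces, V V^T is block diagonal up to a permutation of the points, so the affinity splits into one connected block per subspace. Spectral clustering then recovers the blocks.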
Biography: Dr. Guangcan Liu received the bachelor's degree in mathematics and the Ph.D. degree in computer science and engineering from Shanghai Jiao Tong University, Shanghai, China, in 2004 and 2010, respectively. He was a Post-Doctoral Researcher with the National University of Singapore, Singapore, from 2011 to 2012, the University of Illinois at Urbana-Champaign, Champaign, IL, USA, from 2012 to 2013, Cornell University, Ithaca, NY, USA, from 2013 to 2014, and Rutgers University, Piscataway, NJ, USA, in 2014. Since 2014, he has been a Professor with the School of Information and Control, Nanjing University of Information Science and Technology, Nanjing, China. His research interests mainly include machine learning, computer vision, and image processing.
Dr. Anna Little, Assistant Professor
Department of Mathematics
Jacksonville University, USA
Title: Estimating the Intrinsic Dimension of High-Dimensional Data Sets (Slides: pdf)
Video: Watch here
Abstract: This talk discusses a novel approach for estimating the intrinsic dimension of noisy, high-dimensional point clouds. We consider a general class of sets that are locally well approximated by k-dimensional planes but are embedded in a D-dimensional Euclidean space with D ≫ k. The dimension is estimated via a new multiscale algorithm that generalizes principal component analysis (PCA). The classical PCA approach recovers the dimension when the data are linear but fails when the data are nonlinear, overestimating the intrinsic dimension. The new multiscale algorithm exploits the low-dimensional structure of the data, so that its performance depends on k rather than D, and it is robust to small sample sizes, noise, and nonlinearities in the data.
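The sketch below illustrates why it helps to look at PCA at several scales. It runs local PCA on neighborhoods of increasing radius around one point and reports how many principal directions are needed to capture a fixed fraction of the local variance. This is a deliberately simplified stand-in, not the multiscale algorithm from the talk, which analyzes how the local singular values grow with scale. The 95% energy threshold, the radii and the function name are assumptions for illustration.

```python
import numpy as np


def local_pca_dimension(X, center, radii, energy=0.95):
    """Estimate intrinsic dimension from local PCA at several scales.

    X: N x D array of points (rows). For each radius r, the points within
    distance r of `center` are centred and their squared singular values
    (local PCA variances) are computed. The estimate is the smallest number
    of directions whose cumulative variance reaches the `energy` fraction.
    """
    estimates = {}
    for r in radii:
        nbrs = X[np.linalg.norm(X - center, axis=1) <= r]
        nbrs = nbrs - nbrs.mean(axis=0)
        var = np.linalg.svd(nbrs, compute_uv=False) ** 2
        frac = np.cumsum(var) / var.sum()
        estimates[r] = int(np.searchsorted(frac, energy) + 1)
    return estimates


# Noisy 2-D plane embedded in R^10 (k = 2, D = 10).
rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.standard_normal((10, 2)))[0]       # orthonormal basis of the plane
coords = rng.uniform(-1.0, 1.0, size=(2000, 2))
X = coords @ Q.T + 0.01 * rng.standard_normal((2000, 10))
est = local_pca_dimension(X, X.mean(axis=0), radii=[0.3, 0.5, 0.8])
print(est)
```

On a flat plane the estimate is stable across radii. On a curved set, large radii would pick up curvature directions and very small radii would be dominated by noise. That is the nonlinearity issue the abstract raises for classical PCA.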
Biography: Anna Little has been an assistant professor of Mathematics at Jacksonville University in Jacksonville, FL, since fall 2012. She received her undergraduate degree from Samford University in 2006 and a PhD in mathematics from Duke University in 2011, where she worked under Dr. Mauro Maggioni to develop a multiscale algorithm for estimating the intrinsic dimension of high-dimensional data sets. In addition to high-dimensional data analysis, her research interests include multiscale methods, clustering algorithms, statistics, and machine learning.
Dr. Yue M. Lu, Assistant Professor
School of Engineering and Applied Sciences
Harvard University
Title: Randomized Kaczmarz Algorithm and Its Cousins: Exact Performance Analysis and Large System Dynamics (Slides: pdf)
Video: Watch here
Abstract: The randomized Kaczmarz algorithm (RKA) is a simple but efficient method for solving large-scale overdetermined systems through random iterative projections. Although the algorithm has been used for some time, only recently did Strohmer and Vershynin establish its exponential convergence in the mean-square sense. A flurry of work on performance bounds and on optimizing the algorithm followed. In this talk, I will present an exact analysis of the algorithm for both the noisy and noiseless cases. In particular, I will show how to compute the exact mean square error (MSE) of the estimate produced by RKA using a simple 'lifting trick': the empirical MSE is the trace of the empirical error covariance, whose evolution can be described by a random linear dynamical system in a higher-dimensional lifted space. For the noiseless case, I will show how to compute the error exponent, i.e., the exponential decay rate of the MSE, and describe how to optimize the row-selection probabilities to speed up convergence. The typical convergence of the algorithm is much faster than the decay rate of the MSE suggests; I will define a "quenched" error exponent to characterize the typical convergence and apply bounds based on statistical physics to approximate it. Our analysis agrees with numerical results, which also indicate that previous upper bounds in the literature for both the noisy and noiseless cases can often be several orders of magnitude too high. Finally, I will show how to extend our analysis to other related randomized algorithms, both in finite dimensions and in the infinite-dimensional large system limit.
Joint work with Ameya Agaskar (Harvard & MIT Lincoln Laboratory) and Chuang Wang (Harvard).
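For reference, the basic iteration being analyzed can be sketched as follows. It uses the Strohmer and Vershynin row-selection rule, where row i is chosen with probability proportional to its squared norm. Each step projects the current iterate onto the hyperplane {x : a_i . x = b_i} of the chosen row. This minimal sketch assumes a consistent (noiseless) system. The function name and parameters are hypothetical, and the exact MSE analysis from the talk is not reproduced.

```python
import numpy as np


def randomized_kaczmarz(A, b, n_iter=5000, seed=0):
    """Randomized Kaczmarz for an overdetermined system A x = b.

    Row i is sampled with probability ||a_i||^2 / ||A||_F^2, and the iterate
    is projected onto the hyperplane {x : a_i . x = b_i}.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(A ** 2, axis=1)
    rows = rng.choice(m, size=n_iter, p=row_norms_sq / row_norms_sq.sum())
    x = np.zeros(n)
    for i in rows:
        a = A[i]
        x = x + (b[i] - a @ x) / row_norms_sq[i] * a   # orthogonal projection step
    return x


rng = np.random.default_rng(42)
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
b = A @ x_true                       # consistent (noiseless) system
x_hat = randomized_kaczmarz(A, b)
print(np.linalg.norm(x_hat - x_true))
```

With noisy right-hand sides the iterates instead settle into a neighborhood of the least-squares solution. That noisy regime is one of the two cases the talk analyzes exactly.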
Biography: Yue M. Lu attended the University of Illinois at Urbana-Champaign, where he received the M.Sc. degree in Mathematics and the Ph.D. degree in Electrical Engineering, both in 2007. He was a Research Assistant at the University of Illinois at Urbana-Champaign and a postdoctoral researcher at the Audiovisual Communications Laboratory at Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland. Since September 2010, he has been an Assistant Professor of Electrical Engineering at Harvard University, directing the Signals, Information, and Networks Group (SING) at the School of Engineering and Applied Sciences.
He received the Most Innovative Paper Award of IEEE International Conference on Image Processing (ICIP) in 2006, the Best Student Paper Award of IEEE ICIP in 2007, and the Best Student Presentation Award at the 31st SIAM SEAS Conference in 2007. Student papers supervised and coauthored by him won the Best Student Paper Award of IEEE International Conference on Acoustics, Speech and Signal Processing in 2011 and the Best Student Paper Award of IEEE Global Conference on Signals and Information Processing (GlobalSIP) in 2014.
He has been an Associate Editor of the IEEE Transactions on Image Processing since December 2014, and an Elected Member of the IEEE Image, Video, and Multidimensional Signal Processing Technical Committee since January 2015.
Dr. Don Hong, Professor
Center for Computational Science and Department of Mathematical Sciences
Middle Tennessee State University
Vanderbilt University
Title: High Dimensional Data Analysis with Applications in IMS and fMRI Processing (Slides: pdf)
Video: Watch here
Abstract: Many high-dimensional data sets, such as imaging mass spectrometry (IMS) and functional magnetic resonance imaging (fMRI) data, are of the hyperspectral imaging (HSI) type. Advanced mathematical tools and statistical techniques can not only provide significance analysis of experimental data sets but also help find new data features and patterns, guide the design of biological experiments, and lead the development of computational tools. In this talk, we will discuss challenges in processing HSI-type data and report some recent progress in using statistical computing methods to process HSI-type medical data, especially IMS cancer data analysis and fMRI applications in AD and autism studies.
This is joint work with Qiang Wu, Lu Xiong, Jingsai Liang, and Xin Yang.
Biography: Don Hong earned his Ph.D. in Mathematics from Texas A&M University in 1993, held a postdoctoral position at the University of Texas at Austin, and served on the faculty at East Tennessee State University. He has been a professor in the Center for Computational Sciences and the Department of Mathematical Sciences of Middle Tennessee State University (MTSU) since 2005. He was co-editor of the book "Quantitative Medical Data Analysis Using Mathematical Tools and Statistical Techniques", has co-authored the book "Real Analysis with Introduction to Wavelets", and has co-authored over 50 articles in computational sciences. Dr. Hong is on the editorial boards of several computational science journals, including the Journal of Health and Medical Informatics, International Journal of Computational Mathematics, Journal of Applied Functional Analysis, International Journal of Mathematics and Computer Science, and American Research Journal of Mathematics. He also serves as the coordinator of the actuarial science program at MTSU.
Dr. Laura Balzano, Assistant Professor
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor
Title: Subspace Clustering with Missing Data (Slides: pdf)
Video: Watch here
Abstract: Many big data problems require algorithms that can handle missing data. For a subspace or union-of-subspaces model, we are fortunately able to leverage results on incomplete data projections to estimate the model in the presence of missing data. This talk will discuss two algorithms based on these ideas. We will also discuss theoretical results on when it is possible to identify a union of subspaces from incomplete data.
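One basic ingredient of such methods is measuring how well a partially observed vector fits a subspace using only its observed entries. Restrict the basis U to the observed rows, solve a least-squares problem, and look at the residual. The sketch below uses that residual to assign an incomplete vector to the closest of several candidate subspaces, which is one step of a k-subspaces-style method. It assumes the subspace bases are already known. The names are hypothetical, and this is not either of the two algorithms from the talk.

```python
import numpy as np


def incomplete_residual(U, v, observed):
    """Residual norm of v's observed entries after least-squares fit by U's observed rows."""
    U_obs, v_obs = U[observed], v[observed]
    w, *_ = np.linalg.lstsq(U_obs, v_obs, rcond=None)
    return float(np.linalg.norm(v_obs - U_obs @ w))


def assign_incomplete(bases, v, observed):
    """Index of the subspace with the smallest incomplete-data residual, plus all residuals."""
    residuals = [incomplete_residual(U, v, observed) for U in bases]
    return int(np.argmin(residuals)), residuals


rng = np.random.default_rng(0)
D, d = 50, 3
bases = [np.linalg.qr(rng.standard_normal((D, d)))[0] for _ in range(2)]
v = bases[0] @ rng.standard_normal(d)                     # vector lying in the first subspace
observed = np.zeros(D, dtype=bool)
observed[rng.choice(D, size=20, replace=False)] = True    # only 20 of 50 entries seen
label, residuals = assign_incomplete(bases, v, observed)
print(label, residuals)
```

The residual is only informative when enough entries are observed relative to the subspace dimension d. With too few observations, every subspace can fit the observed entries exactly.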
Biography: Laura Balzano is an assistant professor in Electrical Engineering and Computer Science at the University of Michigan. Laura received her BS, MS, and Ph.D. in Electrical Engineering from Rice University, the University of California, Los Angeles, and the University of Wisconsin, respectively. She received the Outstanding MS Degree of the Year award from the UCLA EE Department and the Best Dissertation award from the University of Wisconsin ECE Department. She has worked as a software engineer at Applied Signal Technology, Inc., on signal processing software for massive data. Her PhD was supported by a 3M fellowship. Her main research focus is statistical signal processing, estimation, optimization, and modeling with highly incomplete or corrupted data, and their applications in computer vision, network monitoring, and environmental sensing.
Dr. Jerome Baudry, Associate Professor
Department of Biochemistry & Cellular and Molecular Biology
University of Tennessee
UT/ORNL Center for Molecular Biophysics
Institute of Biomedical Engineering
Title: Supercomputer-Based Drug Discovery: Finding the Needle in the Data Haystack
Video: Watch here
Abstract: Virtual screening is a computational biology technique that has long been used, for instance in the pharmaceutical industry, to discover molecules that can bind to protein targets. Recent technological and fundamental developments based on the availability of petaflop supercomputers can revolutionize this approach and tackle the complex-systems problems that are the hallmark of biology.
However, the amount and complexity of the data that must be analyzed and understood pose a major challenge. I will present real-case applications that aim at clustering the data in chemical and biological spaces that represent the function and properties of the biomolecules of interest. We will discuss how to translate this avalanche of data into biological knowledge through principal component analysis (PCA) and complex-systems approaches.
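The sketch below illustrates the PCA step mentioned above. It projects a hypothetical descriptor matrix (rows are molecules, columns are computed chemical or biological properties) onto its leading principal components, where groups of similar molecules can then be inspected or clustered. The descriptor data are synthetic and purely illustrative. No real screening data or workflow from the talk is implied.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD. X: n_samples x n_features. Returns scores and explained-variance ratios."""
    Xc = X - X.mean(axis=0)                   # centre each descriptor
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T         # coordinates on the principal axes
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Synthetic stand-in: two groups of 50 "molecules" described by 8 features.
rng = np.random.default_rng(0)
group_a = rng.standard_normal((50, 8))
group_b = rng.standard_normal((50, 8)) + 4.0  # shifted group
scores, explained = pca(np.vstack([group_a, group_b]))
print(explained)
```

With real descriptors on different scales, the columns would normally be standardized before the SVD so that no single property dominates the components.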
Biography: Professor Jerome Baudry joined the Center for Molecular Biophysics in 2008 as an Assistant Professor in the Department of Biochemistry & Cell and Molecular Biology at the University of Tennessee, Knoxville. Dr. Baudry obtained his Ph.D. in Molecular Biophysics with highest honors from the University of Paris-06, France (University Pierre and Marie Curie). He subsequently joined the group of Klaus Schulten at the University of Illinois at Urbana-Champaign as a postdoc. After his postdoctoral work, Dr. Baudry worked in the pharmaceutical industry as a Research Scientist and then accepted a Senior Research Scientist position back in Illinois. Prior to his appointment in Tennessee, Dr. Baudry was a Research Assistant Professor in the School of Chemical Sciences at the University of Illinois at Urbana-Champaign. The Baudry laboratory develops and applies methods and protocols in computational molecular biophysics for structure-based molecular discovery. The lab works on several targets relevant to human and animal health as well as on targets of agrochemical interest. The theoretical approach is complemented by close collaborations with experimental groups.
Dr. Tim Wallace
Computer and Information Systems Engineering
Tennessee State University
Title: Application of Subspace Clustering in DNA Sequence Analysis
Video: Watch here
Abstract: The identification and clustering of orthologous genes play an important role in developing evolutionary models, for example in validating convergent and divergent phylogeny and in predicting functional proteins in newly sequenced species whose nucleotide-to-protein mappings are unverified. Here, we introduce an application of subspace clustering to orthologous gene sequences and discuss the initial results. The working hypothesis is that the genetic changes between protein-coding nucleotide sequences among selected species and groups lie within a union of subspaces corresponding to the unique clusters of the orthologous groups. In this talk, we will discuss recent experimental findings and compare them with the main hypothesis and predictions, as well as with simulations from a perfect binary random mutation tree.
This work includes contributions from Dr. A. Sekmen and Dr. X. Wang.
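Subspace clustering needs the sequences to be turned into vectors first. One simple option, assumed here purely for illustration, is to one-hot encode aligned, equal-length nucleotide sequences. Each sequence then becomes a column of a data matrix that a subspace clustering method can segment. The abstract does not say which encoding was used, and the function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"
BASE_INDEX = {b: i for i, b in enumerate(BASES)}


def one_hot(seq):
    """Encode an aligned nucleotide sequence as a flat 0/1 vector of length 4 * len(seq)."""
    M = np.zeros((len(seq), len(BASES)))
    for pos, base in enumerate(seq.upper()):
        M[pos, BASE_INDEX[base]] = 1.0      # raises KeyError for symbols outside ACGT
    return M.ravel()


def sequences_to_matrix(seqs):
    """Stack encoded sequences as columns (data points) of a matrix."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to a common length")
    return np.column_stack([one_hot(s) for s in seqs])


X = sequences_to_matrix(["ACGTAC", "ACGTTC", "TCGTAC"])
print(X.shape)  # (24, 3)
```

Gaps, ambiguous bases and alignment itself would need separate handling in practice. This sketch deliberately rejects anything outside A, C, G and T.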
Biography: Dr. Wallace recently completed his doctorate at Tennessee State University in 2014 under the advisement of Dr. Ali Sekmen, with committee members from the Departments of Computer Science and Electrical and Computer Engineering. His current activities include research and development of statistical algorithms in information-theoretic areas such as predictive analytics and high-dimensional data mining. In addition, his research interests include methods and techniques for diverse computational challenges in mathematics, biology, biomedical imaging, and molecular radiation physics.
Daniel Pimentel, Ph.D. Candidate
Advisor:
Electrical and Computer Engineering
University of Wisconsin-Madison
Title: On the Difficulties of Subspace Clustering with Missing Data (Slides: pdf)
Video: Watch here
Abstract: We love subspaces. We observe a phenomenon and try to find a line that explains it. We get our hands on some data, and try to find a subspace that fits it. But sometimes one subspace is not enough. Data are often better explained by multiple lines, or more generally, unions of subspaces. Hence the importance of subspace clustering: infer the set of subspaces that best fit a dataset.
In many relevant applications, missing data are common, so subspace clustering with missing data (SCMD) is a task we would very much like to perform. However, the sample complexity of SCMD remains an important open problem. In this talk I will discuss the difficulties of this task and introduce the problem of subspace identifiability from canonical projections, which sheds new light on the SCMD problem.
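A tiny example shows why missing data make identification delicate. The canonical projection of a subspace onto a set of coordinates keeps only the rows of its basis indexed by those coordinates. Below, two different lines in R^3 have identical projections onto the first two coordinates, so data observed only on those coordinates cannot tell them apart. Their projections onto coordinates 1 and 3, however, differ. The helper `same_subspace` is a hypothetical name, and the identifiability conditions from the talk are not reproduced.

```python
import numpy as np


def same_subspace(A, B, tol=1e-10):
    """True if the column spans of A and B coincide (rank test)."""
    r_a = np.linalg.matrix_rank(A, tol=tol)
    r_b = np.linalg.matrix_rank(B, tol=tol)
    r_ab = np.linalg.matrix_rank(np.hstack([A, B]), tol=tol)
    return bool(r_a == r_b == r_ab)


u1 = np.array([[1.0], [1.0], [0.0]])   # basis of line S1 in R^3
u2 = np.array([[1.0], [1.0], [1.0]])   # basis of a different line S2

print(same_subspace(u1, u2))                  # False: S1 and S2 differ
print(same_subspace(u1[[0, 1]], u2[[0, 1]]))  # True: same projection onto coordinates 1 and 2
print(same_subspace(u1[[0, 2]], u2[[0, 2]]))  # False: coordinates 1 and 3 separate them
```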
Michael Northington, Ph.D. Candidate
Advisors: Alexander Powell and Doug Hardin
Department of Mathematics
Vanderbilt University
Title: Balian-Low Type Uncertainty Principles for Shift Invariant Spaces with Extra Invariance (Slides: pdf)
Video: Watch here
Abstract: Shift-invariant subspaces of L^2(R^d), such as certain spline spaces, wavelet spaces, and Paley-Wiener spaces, are commonly used in applications. Recently, there has been interest in studying finitely generated shift-invariant spaces that have extra invariance under some non-integer translation. We will introduce the theory of shift-invariant spaces and explain how this extra invariance assumption obstructs the localization of the generators of the space.
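For readers new to the terminology, the standard definitions are sketched below in notation chosen here for illustration. They cover a finitely generated shift-invariant space and what it means for such a space to have extra invariance. The Paley-Wiener space is a classical example that is invariant under every real translation. Its standard generator, the sinc function, decays only like 1/|x|, which hints at the kind of localization obstruction the talk studies. The talk's precise theorems are not reproduced here.

```latex
% Finitely generated shift-invariant space with generators \varphi_1, \dots, \varphi_n \in L^2(\mathbb{R}^d)
V(\varphi_1, \dots, \varphi_n) = \overline{\operatorname{span}}\,\{\varphi_j(\cdot - k) : k \in \mathbb{Z}^d,\ 1 \le j \le n\}
% By construction, f \in V \implies f(\cdot - k) \in V for every k \in \mathbb{Z}^d.
% Extra invariance: for some x \in \mathbb{R}^d \setminus \mathbb{Z}^d,
f \in V \implies f(\cdot - x) \in V .
% Example (d = 1), with \hat{f}(\xi) = \int f(t) e^{-2\pi i t \xi}\, dt:
PW = \{ f \in L^2(\mathbb{R}) : \operatorname{supp} \hat{f} \subseteq [-\tfrac12, \tfrac12] \} = V(\operatorname{sinc}),
\qquad \operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x},
% PW is invariant under every real translation, while |\operatorname{sinc}(x)| decays only like 1/|x|.
```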
Biography: Michael Northington is a graduate assistant at Vanderbilt University where he is co-advised by Alexander Powell and Doug Hardin. He received a BS from Austin Peay State University and an MS from the University of Mississippi. His current areas of research are applied harmonic analysis, inverse problems, and machine learning.