I’m excited to build products that can positively impact the lives of people. I co-founded Abridge to help everyone stay on top of their health. I started KONAM Foundation to solve challenges in agriculture and education using tech/code.
I'm currently working at the intersection of ML+Health. In the past, I worked on enhancing the perception capabilities of UAVs and on multi-robot coordination, and was involved with several health-tech projects. As part of my Master’s thesis, I worked on deep-learning-based autonomy for UAVs and developed techniques to make deep-learning algorithms interpretable.
M.S. Robotics, Carnegie Mellon University
Advisors: Stephanie Rosenthal, Manuela M. Veloso
Our CoBot robots have successfully performed a variety of service tasks in our multi-building environment, including accompanying people to meetings and delivering objects to offices, thanks to their navigation and localization capabilities. However, they lack the capability to visually search over desks and other confined locations for an object of interest. Conversely, an inexpensive GPS-denied quadcopter platform such as the Parrot ARDrone 2.0 could perform this object search task if it had access to reasonable localization. In this work, we proposed the concept of coordination between CoBot and the Parrot ARDrone 2.0 to perform service-based object search tasks, in which CoBot localizes and navigates to the general search areas carrying the ARDrone, and the ARDrone searches locally for objects. We proposed a vision-based moving target navigation algorithm that enables the ARDrone to localize with respect to CoBot, search for objects, and return to CoBot for future searches. We demonstrated our algorithm in indoor environments on several search trajectories.
M.S. Robotics, Carnegie Mellon University
Team : Sandeep Konam, Shichao Yang
Advisors: Stephanie Rosenthal, Manuela M. Veloso, Sebastian Scherer
Obstacle avoidance from monocular images is a challenging problem for robots. Though multi-view structure-from-motion can build 3D maps, it is not robust in textureless environments. Some learning-based methods use human demonstrations as training data to predict a steering command directly from a single image. However, such methods are usually biased towards the tasks or scenarios in which the data was collected, and also by human understanding. We propose a new method to predict a trajectory from images. We train our system on the more diverse NYUv2 dataset. The ground-truth trajectory is computed automatically from designed cost functions. Results show that our CNN with intermediate perception improves accuracy by 20% over predicting directly from raw RGB images. More than 95% of the predicted trajectories do not hit obstacles on the NYUv2 dataset. Our model generalizes well to other public indoor datasets. In simulation and experiments, robots can navigate safely in various challenging environments using our approach.
M.S. Robotics, Carnegie Mellon University
Team : Sandeep Konam, Ian Quah
Advisors: Stephanie Rosenthal, Manuela M. Veloso
It is a common complaint that deep learning methods are nearly impossible to understand because they lack mathematical motivation beyond the construction stage. To address this, researchers have investigated how these systems work and what they learn, but so far there has been no definitive answer. Currently, the state of the art involves visualizing feature maps with deconvolutional networks (deconvnets), assessing model trust with Local Interpretable Model-agnostic Explanations (LIME), and weakly supervised localization with Class Activation Mapping (CAM). We propose a novel approach built on top of deconvnets, in which we look at what the "significant" neurons in a layer are learning from the image, giving us in-depth insight into how the network makes its decisions.
(16-824) Visual Learning and Recognition
The combination of modern reinforcement learning and deep learning approaches holds the promise of significant progress on challenging applications. In this report, we present the results of our implementation of DeepMind’s Deep Q-Network, a breakthrough in combining model-free reinforcement learning with deep learning. It is the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.
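The Q-learning update at the core of such a model can be illustrated with a minimal sketch of the regression target the network is trained towards. This is not the project's implementation: the function name `q_learning_targets`, the batch layout, and the discount factor `gamma` are assumptions made for illustration, and the network itself is replaced by a plain list of next-state action values.

```python
def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute one-step Q-learning targets for a batch of transitions.

    rewards       -- list of immediate rewards r
    next_q_values -- list of lists: estimated Q(s', a') for every action a'
    dones         -- list of bools, True if s' is terminal
    gamma         -- discount factor (hypothetical default)
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        # No bootstrapping from a terminal state: its future reward is zero.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


if __name__ == "__main__":
    print(q_learning_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]],
                             [False, True], gamma=0.5))
```

In a full agent, the squared difference between these targets and the network's prediction for the action actually taken would serve as the training loss.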
(10-701) Machine Learning
With advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Speaker recognition, or more broadly speech recognition, has been an active area of research for the past two decades. Recognition accuracy has improved significantly with the recent resurgence of deep neural networks. In this work we built an LSTM-based speaker recognition system on a dataset collected from Coursera lectures. We achieved an accuracy of 93%. Prior to applying deep-learning techniques, we tested a baseline feed-forward network on a different dataset and achieved an accuracy of 96.48%. Results show that the deep learning network could identify the speakers very well, except in cases where there is significant overlap in the speakers’ accent and tone. Based on a comparison with previous work, we conclude that variations of the LSTM produce significant improvement only in training speed, with very little improvement in performance.
(16-720) Computer Vision
The goal of this project is to develop a system capable of generating globally consistent maps. Components include a stereo visual-inertial perception head as the sensor, pose estimation based on stereo visual odometry, feature-appearance-based place matching for loop-closure detection, and a pose-graph formulation for error minimization in pose estimates.
(16-811) Math Fundamentals for Robotics
In this project, we investigate the problem of computing the relative transformation between two quadcopters based on corresponding inter-robot observations made during autonomous operation in a common unknown environment. Applications that rely on distributed cooperative mapping need to establish a common reference frame for integrating distributed observations and cooperative control decisions. Generally they assume the existence of a shared environment representation to establish the common reference frame, for which they require a robust strategy to establish the relative pose between individual vehicles. Our work primarily involved testing an EM-based approach in simulation to establish the common reference frame.
(16-741) Mechanics of Manipulation
When humans collaborate to jointly perform a complex manipulation task, what results is often a delicate dance brimming with subtleties. People take cues about their collaborator’s intent from his or her motion and perform their own motions in an anticipatory manner so that their collective execution is as seamless and fluid as possible. In robotics we are concerned with emulating these kinds of natural human interactions so that one day humans can collaborate with robots in an equally effortless manner. In the past, robotics literature has focused on how to generate what is called legible motion, that is, motion that is intent-expressive. Recently, there has been work on extending this idea to use motion to convey not only intent about the goal, but also task context. In particular, many tasks rely on having a sense of the weights of the objects being manipulated. Since robots do not necessarily have the same range of weight-lifting capacities as humans and do not show when they are near their limits, human collaborators might have difficulty inferring the weights of objects their collaborative manipulator is handling. This could be especially problematic in object handovers, where the robot is handing an object to a human. Generating weight-expressive motion solves this problem by allowing the robot to move such that its motion conveys how heavy the object being manipulated is. In this project our goal is to explore the qualities that embody effective weight-expressive motion. Our approach is to perform two sets of analyses with a focus on the mechanics of the motions. We use a previously developed formalism for generating weight-expressive motion on a two-link planar robot (only two joints are active) holding objects of varying weights, and existing data from user feedback on these motions.
We consider the joint parameter values along waypoints of trajectories generated for differing classes of object weights and find the most discriminative features among them. Additionally, we consider the natural-language data users give as reasoning for why they label a motion as belonging to a ‘heavy object’ or a ‘light object’ manipulation task.
(15-780) Graduate Artificial Intelligence
Deep convolutional neural networks are effective at image classification, but their predictive power often comes at the price of model interpretability. Several techniques have been proposed for generating interpretable predictions via attention maps; however, these methods rarely offer insight as to why the network paid attention to such patches. Here, we propose an explanatory framework to analyze a neural network's prediction based on information flow through the network. Given a trained network and a test image, we rank the neurons in order of importance. We propose two metrics for this task, both measured over a set of images created by small perturbations from the input image: (1) correlation with the output value and (2) variance of the activation. By comparing the important neurons and patches to those that the network should be paying attention to, our framework offers a means toward understanding and debugging convolutional neural networks.
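The two ranking metrics can be sketched in a few lines. The code below is an illustrative sketch only, not the framework itself: it assumes the activations of each neuron and the network's output value have already been recorded for every perturbed copy of the input image, and the names `rank_neurons`, `pearson`, and `variance`, as well as the choice to rank by the absolute value of the correlation, are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return cov / (norm_x * norm_y)


def variance(xs):
    """Population variance of a sequence."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def rank_neurons(activations, outputs, metric="correlation"):
    """Return neuron indices sorted from most to least important.

    activations -- one row per perturbed image, one column per neuron
    outputs     -- the network's output value for each perturbed image
    metric      -- "correlation" (with the output) or "variance" (of the activation)
    """
    scores = []
    for j in range(len(activations[0])):
        column = [row[j] for row in activations]
        if metric == "correlation":
            # Assumption: strongly negative correlation also counts as important.
            scores.append(abs(pearson(column, outputs)))
        elif metric == "variance":
            scores.append(variance(column))
        else:
            raise ValueError(f"unknown metric: {metric}")
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    acts = [[1.0, 5.0, 0.0], [2.0, 5.0, 10.0], [3.0, 5.0, 0.0]]
    outs = [0.1, 0.2, 0.3]
    print(rank_neurons(acts, outs, "correlation"))
    print(rank_neurons(acts, outs, "variance"))
```

The two metrics can disagree: in the toy data, the first neuron tracks the output most closely, while the third varies the most across perturbations without following the output.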
Supervisor : Dr. Eli Peli, Senior Scientist, Schepens Eye Research Institute, Massachusetts Eye & Ear
Real-time face detection in a mobile environment is an attractive concept in the field of augmented reality. More importantly, it has the potential to be an essential low vision application. Implementing this concept entails performing robust face detection with limited computational power. Though face detection has made many strides in the recent past, with techniques ranging from appearance-based methods to feature-invariant methods, its real-time performance remains unsatisfactory. In the present thesis, I'm working on a face detection algorithm that works in real time and with limited computational power. Before that, to narrow down the existing techniques, a detailed and comprehensive study of the key algorithms is undertaken. Performance analysis of these algorithms can expose their pros and cons and motivate further improvement. In addition to face detection, image enhancement, which is pivotal for low vision patients to discriminate fine details, is included. The end goal is to chart out a robust face detection algorithm that works in real time and is reliable enough to be employed in low vision applications.
Google Summer of Code 2015
Organization: University of Nebraska - Helikar Lab
Mentors: Tomas Helikar, Jiri Adamec
The project aims to develop an Android application for a screening test that detects cancer biomarker(s) from a small drop of blood. Many software packages exist for blood-sample image analysis, but most of them trade off portability against accuracy. Using state-of-the-art image-processing techniques and camera-equipped mobile devices with sufficient processing capacity, developing a reliable application that performs a screening test for cancer biomarker detection is feasible.
Research Internship 2014
Supervisor : Prof. Jayanti SivaSwamy
Centre for Visual Information Technology, IIIT-H
Automatic retinal image analysis is emerging as an important screening tool for early detection of eye diseases. Glaucoma is one of the most common causes of blindness. Manual examination of the optic disk (OD) is a standard procedure for detecting glaucoma. An automatic OD parameterization technique, based on segmented OD and cup regions obtained from monocular retinal images, was used for evaluation. To extract the parameters of interest, we used an OD segmentation method that integrates the local image information around each point of interest in a multi-dimensional feature space, providing robustness against variations found in and around the OD region. Alongside it, we used a cup segmentation method based on anatomical evidence considered relevant by glaucoma experts, such as vessel bends at the cup boundary, to obtain CDR and CAR values. Qualitative and quantitative assessment results for the adopted segmentation methods, along with the discrepancies among CDR values, have been analysed.
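Once the OD and cup regions are segmented, the two parameters reduce to simple measurements on binary masks. The sketch below is a hedged illustration, not the method used in the internship: it assumes CDR denotes the cup-to-disk vertical diameter ratio and CAR the cup-to-disk area ratio, that both masks are given as equally sized nested lists of 0/1 values, and the function names are hypothetical.

```python
def vertical_diameter(mask):
    """Number of rows spanned by the region (first to last row containing a pixel)."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def area(mask):
    """Number of pixels belonging to the region."""
    return sum(1 for row in mask for value in row if value)


def cup_disk_ratios(disk_mask, cup_mask):
    """Return (CDR, CAR) under the assumed definitions stated above."""
    disk_diameter = vertical_diameter(disk_mask)
    disk_area = area(disk_mask)
    if disk_diameter == 0 or disk_area == 0:
        raise ValueError("disk mask is empty")
    cdr = vertical_diameter(cup_mask) / disk_diameter
    car = area(cup_mask) / disk_area
    return cdr, car


if __name__ == "__main__":
    disk = [[1, 1, 1, 1] for _ in range(4)]      # 4x4 disk region
    cup = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]                          # 2x2 cup in the centre
    print(cup_disk_ratios(disk, cup))
```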
BioAsia Healthcare Devthon
Ultrasound images are 2D grayscale images, which makes it difficult for clinicians to identify and diagnose anomalies. We have developed a software module that converts grayscale ultrasound images into color graphics, thereby adding information intelligently to the ultrasound image. The software is optimized for ultrasound images. It provides a more dynamic view of ultrasound images and allows the clinician to filter the image by color or by region, and to magnify or highlight regions of the image. The features are designed to allow clinicians and medical experts to identify and diagnose anomalies that are not normally visible in grayscale ultrasound images.
The mango cultivation methods currently in use are ineffective and unproductive despite requiring a large amount of manpower. Agricultural Aid for Mango cutting (AAM) is an Agribot that could be employed for precision mango farming. Unmanned aerial vehicles, efficient machine vision methodologies, and high-end sensors facilitate the advent of Agribots. AAM is a quadcopter equipped with vision and cutter systems, complemented by the necessary ancillaries. It could hover around the trees, detect the ripe mangoes, and cut and collect them. The AAM robot is the first of its kind and, once implemented, could pave the way for next-generation Agribots capable of increasing agricultural productivity and justifying the existence of intelligent machines.
Guide: Dr. rer.nat. Narni Nageswara Rao, Dept. of Mathematics, RGUKT, R.K.Valley
Automation of object counting in digital images has received significant attention in the last 20 years. The objects under consideration have ranged from cells, bacteria, trees, fruits, pollen, and insects to people. These applications highlight the importance of shape identification and object counting. A novel algorithm and methodology for the detection of mathematically well-defined shapes has been developed. The probability of shapes crossing equally spaced lines is also calculated. Simulations for the detection and counting of regular mathematical shapes such as lines and circles were performed in a random environment. Simulation results are compared with the empirical probability calculations. Results seem promising, as they converge to the empirical calculations as the number of shapes increases.
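The crossing-probability idea can be illustrated with a small Monte Carlo sketch. It is not the algorithm developed in the project: it assumes the classical Buffon's needle setting, in which a segment of length l dropped uniformly at random onto parallel lines a distance d apart (l ≤ d) crosses a line with probability 2l/(πd), and a circle of radius r (2r ≤ d) does so with probability 2r/d. The function names, trial count, and seed are hypothetical.

```python
import math
import random


def needle_crossing_rate(length, spacing, trials, rng):
    """Fraction of random segments that cross one of the equally spaced lines."""
    hits = 0
    for _ in range(trials):
        centre = rng.uniform(0.0, spacing)        # centre position between two lines
        theta = rng.uniform(0.0, math.pi)         # angle between segment and the lines
        half_extent = 0.5 * length * abs(math.sin(theta))
        if min(centre, spacing - centre) <= half_extent:
            hits += 1
    return hits / trials


def circle_crossing_rate(radius, spacing, trials, rng):
    """Fraction of random circles that cross one of the equally spaced lines."""
    hits = 0
    for _ in range(trials):
        centre = rng.uniform(0.0, spacing)
        if min(centre, spacing - centre) <= radius:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)                        # fixed seed for repeatability
    length, radius, spacing = 0.5, 0.2, 1.0
    print("segment:", needle_crossing_rate(length, spacing, 100_000, rng),
          "closed form:", 2 * length / (math.pi * spacing))
    print("circle: ", circle_crossing_rate(radius, spacing, 100_000, rng),
          "closed form:", 2 * radius / spacing)
```

As the number of simulated shapes grows, the estimated rates should approach the closed-form values, which mirrors the convergence behaviour described above.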