projects | Amod K. Agrawal

Industry

Indoor Localization—building spatial context for the Alexa ecosystem

Amazon Lab126

Indoor positioning using GPS is unreliable in enclosed environments, yet spatial context remains essential for enabling intelligent interactions with voice assistants and other connected devices. We introduced a foundational software framework for real-time spatial awareness in indoor environments, leveraging existing wireless infrastructure and devices' sensing capabilities. By utilizing commodity radios such as Bluetooth Low Energy (BLE), Wi-Fi, Zigbee, and Ultra-Wideband (UWB), as well as other modalities including ultrasound and inertial tracking, the system performs distance estimation and positioning. It abstracts complex RF-based algorithms into a unified interface, supporting spatial use cases including proximity detection, device-to-device ranging, spatial presence, and user tracking. The framework has been developed through extensive real-world experimentation, systematically evaluating the performance, reliability, and limitations of each wireless modality under dynamic environmental conditions. To ensure scalability and robustness, the system integrates techniques from signal processing, machine learning, and edge computing. It supports multimodal sensor fusion to accommodate increasing heterogeneity in device form factors and sensing capabilities.

While not directly exposed to end users, this platform acts as a core enabler for higher-layer applications that require spatial context, including room-aware assistance, presence sensing, and personalized user experiences. It integrates with LLM-based Alexa+, providing it with location context to facilitate more intelligent and adaptive behaviors. This work contributes to the broader vision of ambient computing by enabling distributed intelligence across smart devices, with applications spanning smart homes, automotive systems, and retail environments.

Harnessing AutoMobiles for Safety (HAMS)

Microsoft Research

We explore the use of low-cost sensing solutions to enhance road safety and driving efficiency. By retrofitting off-the-shelf smartphones onto vehicle windshields, HAMS constructs a “virtual harness” that simultaneously monitors driver behavior, vehicle dynamics, and road context. The front camera observes the driver, the rear camera captures the road ahead, and onboard sensors such as GPS and accelerometers provide motion data. This multimodal sensing approach enables the system to detect complex events. For example, correlating hard braking with vehicle proximity and driver distraction—to provide actionable feedback for safer driving.

The project addresses several real-world challenges, including variation in vehicle configurations, inconsistent road infrastructure (e.g., unmarked or variable-width lanes), and the need for efficient operation on resource-constrained mobile devices. HAMS employs a hybrid approach that combines lightweight computer vision techniques with deep learning models to ensure accuracy and scalability under these constraints.

In addition to safety monitoring, my work extended to fuel efficiency analysis by integrating data from On-Board Diagnostics (OBD) interfaces. The system detects clutching, gear transitions, and aggressive driving behaviors, then quantifies their impact on fuel consumption. In urban traffic settings, the system revealed that up to 35% of fuel can be wasted during idle periods. Machine learning techniques, including regression models and unsupervised clustering, were used to model fuel usage patterns and identify geographic zones of inefficiency.

HAMS has been piloted in driver training programs in collaboration with the Institute of Driving and Traffic Research (IDTR) and demonstrated potential use cases in fleet management and intelligent mapping services. The system offers a scalable framework for smart automotive diagnostics, context-aware fuel analytics, and automated driver coaching, with practical deployment potential in both developed and emerging markets.

Hey Disney! — Alexa collaboration — Enabling wireless sensing use-cases for Disney resorts and parks

Amazon Lab126

As part of a collaboration between Amazon and Disney, this project enabled proximity-based guest experiences at scale by integrating the MagicBand+ wearable with Alexa-enabled devices. Deployed in over 28,000 rooms at Walt Disney World Resort, the system allows guests to seamlessly interact with in-room voice assistants by simply bringing their MagicBand+ near the device. These location-aware features, though subtle, enhance the overall guest experience by adding responsive, immersive touches that align with Disney’s emphasis on storytelling and magic. The underlying capabilities demonstrate how real-time localization and device awareness can be integrated into hospitality environments to create intuitive, personalized interactions.

Change Point Detection Using Edit Distance

IBM Research

This project explores a novel method for detecting change points in time-series data using Edit Distance, enabling detection in both numerical and textual domains. Traditional change point detection techniques are limited to numerical inputs, but by quantizing data and representing it as symbolic sequences, this approach applies Levenshtein distance to identify structural deviations in system behavior. The algorithm scans for shifts in data patterns using a sliding buffer technique and flags points of semantic change—regardless of magnitude. Validated on diverse datasets such as weather records, eye movement, and stock prices, the method can support real-time monitoring of software systems, data centers, and network traffic to detect faults and anomalies early. This work opens pathways for more adaptive, domain-agnostic fault detection mechanisms in large-scale systems in real-time.

trAnSLator: An Intelligent Gesture-Based IoT Communication Engine for the Deaf and Mute

Microsoft Research

trAnSLator is an intelligent communication system designed to facilitate natural interaction for individuals who are deaf or mute. It interprets sign language gestures into real-time, coherent spoken sentences, enabling users to engage in everyday conversations and even conduct phone calls using sign language. The system processes time-series gesture data through feature extraction and dynamic time warping algorithms, and assembles recognized gestures into meaningful sentences using a conversational bot framework. Built using a MYO armband, which captures electromyographic and inertial signals and integrated with an Android application, trAnSLator supports personalized gesture learning and incorporates contextual awareness to improve communication accuracy and fluency.

Academia

Collective Aspects of Privacy — Sensing and Localization in Online Networks

IIIT-Delhi, ETH-Zurich

We study how user attributes such as location and biography can be inferred in online networks through proxy social sensing—not from individuals themselves, but from their connections. Using only the information shared by contacts who joined the network earlier, we evaluate how accurately a user's location can be predicted without their direct participation. Our findings reveal that individuals can be localized with surprising precision (median error of ~68 km on the global map, versus ~6300 km in the null model), especially when many of their contacts have shared mobile data. This demonstrates that privacy in online networks is collectively determined, not individually controlled.

We apply unsupervised techniques, including modal city prediction for location and vector similarity for biographical attributes, and benchmark against randomized baselines (null model). While biographical features are harder to infer, their predictability increases meaningfully as the number of disclosing connections grows. Our analysis also shows that broader disclosure behavior across the network systematically improves inference accuracy, highlighting how individual privacy is shaped by the behavior of others. This work introduces a new form of indirect localization, where network structure and peer behavior function as latent sensors for user attributes.

This study provides the first empirical support for the shadow profile hypothesis, demonstrating that online networks can infer personal information about non-users or passive participants through the disclosures of others. It raises important questions about the nature of privacy in digital ecosystems, where user-level consent is insufficient to safeguard personal information in a socially connected world.

HeadTrack: Tracking head orientation using wireless signals

University of Illinois at Urbana-Champaign

Head orientation tracking is critical for a range of mobile computing applications, including AR/VR, assistive technologies, and spatial interaction. Traditional solutions rely on infrastructure-based systems involving cameras, lasers, or high-end inertial sensors—limiting user mobility and constraining deployment to fixed environments.

HeadTrack presents a wearable, infrastructure-free system that estimates a user’s head orientation using wireless signals. The system consists of a necklace-like wearable with a headset and chest-piece, each embedded with ultra-wideband (UWB) radios. By precisely estimating multiple distances between the headset and torso, HeadTrack infers the 3D orientation of the head relative to the body.

To overcome the typical ~10 cm ranging limitation of UWB, the system introduces a reference-assisted design by splitting the transmitted signal across both wireless and wired paths. This approach reduces the ranging error to approximately 5 mm. Additionally, an onboard IMU is used to resolve phase ambiguities and ensure consistent tracking over time.

Evaluated using ViCon ground truth data, HeadTrack achieves a head orientation tracking accuracy of 6.5°, offering a portable, occlusion-free, and cost-effective alternative to conventional motion capture systems. The system demonstrates how body-worn UWB sensing can enable fine-grained, infrastructure-free motion tracking in real-world settings.

Localization using Speech Angle of Arrival

University of Illinois at Urbana-Champaign, Amazon Lab126

We explore a passive and infrastructure-light approach to indoor localization using arbitrary human speech captured by spatially distributed smart devices. As voice assistants become increasingly embedded in modern environments, the ability to localize speakers using only existing audio hardware presents a scalable and privacy-conscious alternative to vision or wearable-based tracking.

The system leverages time-of-arrival differences in speech signals recorded across multiple microphone-equipped devices, such as smart speakers, TVs, or home robots and estimate the Angle of Arrival (AoA) at each device. It builds upon the classical Generalized Cross-Correlation with Phase Transform (GCC-PHAT) method, introducing enhancements (GCC+) such as feature-space expansion and subsample interpolation for improved time-delay estimation and angular precision.

By combining AoA estimates through geometric triangulation, the system infers the speaker’s two-dimensional position in real time. Notably, this approach requires no prior calibration, no knowledge of the spoken content, and no active participation from the user, making it well-suited for ambient and context-aware applications in smart homes, offices, and assistive settings.

Evaluated in a real-world residential environment, the system achieves a median AoA estimation error of 2.2 degrees and a median localization error of 1.25 meters. This work demonstrates the feasibility of using passive audio signals to enable spatially aware interactions, while proposing extensions to coplanar arrays, fusion with RF-based localization, and speaker profiling to support multiple human speakers in the space.