Welcome to the official website for the Winter Vision Workshop in Clearwater Beach, Florida.

Dates: Jan 16, 17, & 18, 2013

Venue: Clearwater Beach, Florida. Hotel information available here.


Photos of WVM are available here.

The Pocket Guide is now available.

Plenary speakers have been confirmed:

  • Wednesday, Jan 16th, 10:15-11:00: Dong Ping Zhang
    Title: Supporting Computer Vision through High Performance GPU Programming
    Abstract: In this talk, I will discuss the support that AMD hardware and software infrastructure can provide for developing applications in computer vision and its related domains. This includes the OpenCL C++ binding and the OpenCL C++ kernel language extension, the BOLT C++ template library for harnessing heterogeneous compute power, the OpenCL module developed for the industry-standard OpenCV library, and two other university collaboration projects: content-based image retrieval and future computing architectures for simultaneous localization and mapping (SLAM). This presentation will also highlight the evolution of AMD discrete GPU and APU architecture designs and how AMD is working to increase programmability and ease domain scientists’ access to this new level of compute resources.
    Bio: Dong Ping Zhang recently joined the AMD Research group, focusing on processing in memory for Exascale computing. Before this, she worked on AMD heterogeneous system architecture and its successors, specializing in domain-specific workloads to guide AMD’s future hardware and software roadmap to better address the needs of real-world application domains, analyzing performance and energy impacts of proposed hardware designs, and advocating for architectural changes to increase performance or programmability. Prior to joining AMD, she was with the Biomedical Image Analysis Group at Imperial College London.
  • Thursday, Jan 17th, 10:15-11:00: Larry Matthies
    Title: The Curiosity Mars rover, and other curiosities
    Abstract: Computer vision has played important roles in every landed U.S. Mars mission since the Pathfinder mission in 1997. I will very briefly recap that history, then give an overview of the Mars Science Lab (MSL) mission, MSL’s rover “Curiosity”, and how vision algorithms are used onboard and were used on Earth to help pick the landing site and to map out potential exploration routes well before launch. Several other vision-related capabilities are in an advanced stage of development and are now candidates for inclusion in the next rover mission, which NASA has announced will launch in 2020; these include an onboard vision system for safe and precise landing and an FPGA coprocessor to greatly accelerate stereo vision, visual odometry, and potentially path planning. The long-term objective for Mars exploration remains to return samples. I will briefly sketch recent thinking about how that may be done; however, budget constraints have put this objective on hold and may dictate reconsideration of the approach. Time permitting, I will also review a few highlights of non-NASA robot vision work at JPL.
    Bio: Larry Matthies received his PhD in computer science from Carnegie Mellon University in 1989 and has been at the Jet Propulsion Laboratory since then, where he currently supervises the Computer Vision for Surface Applications Group. His research focuses on perception for autonomous navigation of unmanned vehicles for land, sea, air, and space applications. He and his group developed algorithms for stereo vision, visual odometry, and terrain-relative velocity estimation that have been used on Mars since 2004. He is an IEEE Fellow, an adjunct professor of computer science at the University of Southern California, a member of the editorial boards for the Journal of Field Robotics and the Autonomous Robots journal, and was a co-recipient of the 2008 IEEE Robotics and Automation award for his work on vision for space exploration.
  • Friday, Jan 18th, 10:15-11:00: Robert Pless
    Title: Re-Purposing all the World's Webcams with Applications to Environmental Measurement
    Abstract: Fantastic imaging resources are already freely available through the web: publicly available webcams that view parks, roads, cities, beaches, mountains, ski resorts, buildings, and more. Organizing this massively distributed resource as a tool for measurement requires revisiting classic computer vision questions in camera calibration, geo-location, scene structure estimation, and scene labeling.
    The Archive of Many Outdoor Scenes (AMOS) dataset has been archiving imagery from more than 23,000 cameras distributed around the world, some going back more than 6 years. I will highlight properties of natural lighting and atmospheric effects and explain how these give cues for parts of camera calibration and structure estimation problems. Our approaches are inspired in part by the time-lapse video artist Jason Salavon and the photographers Hiroshi Sugimoto and James Sanborn, as well as by work to characterize the statistical invariants in images of natural scenes. Once calibrated, this imaging resource can be used for a variety of global measurement problems. In the context of plant phenology (measuring the annual patterns of spring green-up time, budding, and leaf loss), I will present some interesting and unexpected results that show how large-scale ground-level imaging differs from satellite imaging for environmental monitoring.
    Bio: Robert Pless is a Professor of Computer Science and Engineering at Washington University in St. Louis. His research focuses on data-driven approaches to understanding motion and change in video, with a current focus on long-term time-lapse imagery. Dr. Pless received a Bachelor's degree in Computer Science from Cornell University in 1994 and a PhD from the University of Maryland, College Park in 2000. He chaired the IEEE Workshop on Omnidirectional Vision and Camera Networks (OMNIVIS) in 2003, the MICCAI Workshop on Manifold Learning in Medical Imagery in 2008, and the IEEE Workshop on Motion and Video Computing in 2009, and received the NSF CAREER award in 2006.
  • Friday, Jan 18th, 15:10-15:55: John Garofolo
    Title: The Spatio-Temporal Multi-Camera Location Tracking Evaluation (STIMULATE) Program
    Abstract: The goal of the STIMULATE program is to foster the development of technologies to accurately geospatially track persons or objects of interest from massive networks of video surveillance cameras in dense scenes with highly varying camera coverage. The STIMULATE challenge is likely to require the fusion of appearance-based detection and tracking technology with biometric and trajectory estimation methods and with camera/environment calibration methods. STIMULATE algorithms will be required to detect and track persons or objects of interest within single or multiple camera views and project the location of the tracked person/object into 3D space. STIMULATE is truly a Big Data challenge requiring simultaneous processing of many high-definition camera streams with real-time forward tracking estimation and much-faster-than-real-time backtracking estimation. Two forms of metrics will be created to measure performance: one based on geo-spatial distance error and another based on camera selection accuracy. Given the significant data and computation challenges, STIMULATE will provide the video data to algorithms in a Cloud-based streaming simulation. STIMULATE will also provide a Cloud-based computational platform that algorithms will access through an API that will be built through a consensus standards process. The STIMULATE Cloud will support scalability, component sharing, economical developmental testing, controlled access to evaluation-sensitive data, and diagnostic feedback.
    The 5-year STIMULATE program has several comprehensive goals: create a multidisciplinary community of interest to accelerate innovations in visual tracking technology across massive camera networks, develop new efficient methods for implementing scalable evaluations of video surveillance technologies that lower the barrier to entry, create a realistic measure of the state of the art for visual tracking technology in operationally relevant environments, and catalyze consensus standards for performance measurement and integration architectures. The ultimate goal of the program is to accelerate the development of deployable technology that is robust to multiple environments and that can be automatically calibrated to new environments and camera networks. The goals of the first year of the program are to develop the first data collection and the specification for the Cloud evaluation platform and metrics, hold a kickoff workshop, and initiate a pilot evaluation.
    Bio: John Garofolo has a BS in Computer Science from the University of Maryland and an MS in Computer Science from Johns Hopkins University. He has been with the National Institute of Standards and Technology (NIST) Information Technology Laboratory since 1987, leading a number of human language-, computer vision-, and multimedia technology evaluation activities. In the late 80s and early 90s, his work focused on the evaluation of early speech-to-text continuous speech transcription systems and speech understanding systems in the context of a number of DARPA speech and language technology programs. His work on data- and evaluation-driven research and development helped to significantly speed progress in these technologies. In the late 1990s, he extended his work to information retrieval applied to speech recognition in the TREC Spoken Document Retrieval Track, which later evolved into the NIST TRECVid Program. He saw multimodality as a key challenge for the future and spearheaded the construction of a massively instrumented multimodal/multichannel meeting room at NIST. The data collected in the room has since been used in a variety of speech and computer vision evaluations, and the project inspired two European programs focused on multimodality. He also founded the NIST Multimodal Information Group to develop measurement science methods for multimodal technologies. In 2004, he began working with the DoD Advanced Research and Development Activity (ARDA) Video Analysis and Content Extraction (VACE) Program under what is now the Intelligence Advanced Research Projects Activity (IARPA) of the Office of the Director of National Intelligence (ODNI) to adapt approaches used in speech technology evaluation to the computer vision domain. He developed an extensive corpus- and metrics-based evaluation framework for video object detection and tracking technologies, as well as video text recognition technologies.
    He extended this framework and organized a series of open evaluations of multiple camera tracking and event detection technologies in the video surveillance domain. He created the Classification of Events Activities and Relationships (CLEAR) Consortium with the European CHIL and AMI programs and coordinated a series of joint multimodal technology evaluations in 2006 and 2007. In 2009, he developed the concept for the Video Surveillance Technologies for Retail Security (ViSiToRS) Program at NIST to create a consortium effort to develop advanced video analysis technologies to address retail theft. In 2010, he created the Automated Low Level Analysis and Description of Diverse Intelligence (ALADDIN) Video Program at ODNI/IARPA. ALADDIN is a 5-year program developing advanced extraction, knowledge representation, event detection, and recounting technologies for massive collections of open source multimedia clips to provide analysts with critically needed multimedia search and analytic tools. In 2012, he returned to NIST to lead strategic planning for the NIST Information Technology Laboratory. In 2012, he also began the Spatio-Temporal Multi-Camera Location Tracking Evaluation (STIMULATE) program in collaboration with DHS to evaluate geo-spatial tracking technology from large networks of surveillance cameras in a Cloud-based video streaming and computational architecture. He also led a cross-cutting team at NIST to develop a concept for a multi-stakeholder, metrics-based approach to creating a framework for a privacy-preserving Trusted Data Ecosystem. He is now providing senior leadership for the development of new measurement science programs focused on Big Data analytic technology challenges within the NIST Information Access Division.
    He continues to bring his knowledge of data- and evaluation-driven research, program management expertise, interdisciplinary perspective, creativity, and enthusiasm to bear in bringing diverse communities of interest together in novel ways to create innovative approaches to the development of next-generation analytic technologies.

Visa information is listed here.

Hotel reservations are now available.

Registration is now open.

A new job board is now available. To list a job, please contact the web chair or webmaster.