Workshop Overview

The "2nd Workshop on Vision-Based Learning & Linguistics (WVLL)" aims to create a dynamic and interactive forum for researchers exploring the rapidly evolving intersection of computer vision, natural language processing, and linguistic principles to achieve deeper and more nuanced machine understanding. As vision-language models (VLMs) demonstrate increasingly sophisticated capabilities, WVLL will focus on the critical challenges and opportunities that lie ahead, emphasizing the development of models that are not only powerful but also efficient, equitable, and grounded in a robust understanding of both visual and linguistic structures.

Key Areas of Exploration:

  • AI for Low-Resource Languages
  • Video and Speech Analysis for Low-Resource Languages
  • LLM and VLM Architectures and Neural Design
  • Parameter-Efficient Adaptation of Large Vision-Language Models
  • Applications of Vision-Language Models
  • Tiny VLMs: Efficient Multimodal AI at the Edge
  • New Benchmark Datasets & Evaluation Metrics
  • AI for Sign Language Understanding
  • Document Image Processing
  • Medical Data Analysis
  • Scene Text Detection and Recognition

The primary goal of WVLL is to foster a rich exchange of ideas that can crystallize common problems and illuminate promising scientific paradigms in vision-language research. We aim to explicitly contrast competing frameworks, clarify essential research questions, and cultivate a stronger community around these shared interests. WVLL will distinguish itself by its balanced emphasis on theoretical advancements in model design and the practical, societal implications of their deployment, particularly in resource-constrained and specialized domains. We believe this workshop will be highly valuable to the NeurIPS community by providing a focused platform to discuss the frontiers of multimodal AI, encouraging interdisciplinary collaboration, and charting a course towards more comprehensive and responsible vision-language understanding systems. We will encourage the presentation of work-in-progress and forward-looking position papers, fostering a vibrant discussion that looks towards future breakthroughs.

Invited Speakers

Confirmed Speakers

Tentative Speakers

Diversity, Equity & Inclusion Plan

WVLL 2025 embeds diversity and inclusion across organizers, speakers, and attendees through concrete, realistic actions. Our nine-member committee of three women, one non-binary researcher, and five men spans four continents, balances academia (five members) with industry/NGO roles (four), and blends four senior researchers with five mid-career scientists, creating natural mentorship pathways and technical breadth from computer vision to clinical AI. We are deliberately recruiting invited speakers through affinity groups and regional mailing lists to secure meaningful representation of women, non-binary scholars, and researchers based in the Global South; early acceptances already span the USA, Malaysia, Portugal, Bangladesh, and China. The gender-neutral CFP explicitly welcomes work on sign-language AI, low-resource languages, and edge deployment in underserved regions, while an optional mentored-review track will pair junior authors with experienced PC members. External sponsorships are being pursued to fund travel stipends, prioritized for students from low- and middle-income countries and for caregivers. Live captioning, wheelchair-accessible poster spacing, and an anonymous code-of-conduct reporting channel coordinated by our DEI chair will ensure a safe, inclusive environment, making diversity and broad participation integral to WVLL 2025 rather than an afterthought.

Estimated Number of Attendees

Given the growing interest in multimodal AI, particularly in low-resource language processing, efficient model adaptation, and applied vision-language systems, we anticipate attracting a diverse audience from both academia and industry. Based on the relevance of our topics, including LLM/VLM architectures, sign language understanding, document image processing, and medical data analysis, we estimate an attendance of approximately 80–100 participants. This includes researchers, practitioners, and students interested in vision-language learning, efficient model design, and AI applications for underrepresented and resource-constrained domains.

Special Requirements and Technical Needs

The WVLL workshop will be a one-day, in-person event in accordance with NeurIPS 2025 guidelines. We request a standard A/V setup, including a projector with HDMI input, screen, microphones for both speakers and audience, and stable internet access to support any live demonstrations. We plan to host a poster session and will need space and boards for approximately 8–10 physical posters. Additionally, we request a table for showcasing interactive demos related to vision-language systems. While the workshop is fully in-person, we may accommodate up to one hour of remote presentation in the event of unforeseen emergencies, as permitted by NeurIPS. The only additional requirement we foresee is ensuring wheelchair accessibility at the venue.

Previous Workshop Edition Overview

This workshop was previously held at WACV 2024, where it focused on vision-language learning for low-resource languages, parameter-efficient model adaptation, and applied multimodal AI. That edition received 14 paper submissions, of which 3 were accepted, an acceptance rate of approximately 21%; the accepted papers included both extended abstracts and long-format submissions. The authors represented a diverse international background, with submissions from Bangladesh, the United States, and India. The review process was conducted by a panel of 32 expert reviewers from around the world, ensuring rigorous and fair evaluation. The workshop was well received at WACV, and based on the enthusiastic engagement and the growing relevance of our themes, we are now proposing to expand its reach and visibility by bringing it to NeurIPS 2025.

URL of previous workshop: https://wvll.github.io

Brief Bios of Organizers

Fuad Rahman: Fuad Rahman, Ph.D., is an academic and entrepreneur who founded Apurba Technologies, a company specializing in machine learning. He is also an Adjunct Professor in the Biomedical Engineering (BME) Department at the University of Arizona. His company works on computerizing Bangla, a low-resource language, and developed the first commercial Bangla OCR and screen reader. He has over 100 peer-reviewed publications.
Email: fuad@apurbatech.com | Website: apurbatech.com

Syed Akhter Hossain: Dr. Syed Akhter Hossain is the Dean of the Faculty of Science and Information Technologies at Daffodil International University. He has significantly advanced NLP research and has over 250 publications. A recipient of the Best Professor of IT Award (2012) and National ICT Award (2016), he notably developed a machine translator for Bangla Braille.
Email: deanfsit@daffodilvarsity.edu.bd | Website: https://faculty.daffodilvarsity.edu.bd/profile/swe/akhter.html

Mouhaydine Tlemcani: Dr. Mouhaydine Tlemcani is an Assistant Professor at the University of Évora, where he has been instrumental in the Mechatronics Engineering program. He holds an M.Sc. (1992) and a Ph.D. (2007) in Electrical Engineering. His research spans instrumentation, signal/image processing, embedded systems, and AI applications in engineering, and he has led projects such as non-destructive testing for aeronautic maintenance.
Email: tlem@uevora.pt | Website: https://www.uevora.pt/pessoas?id=5279

Tozammel Hossain: Dr. Tozammel Hossain is an Assistant Professor at the University of North Texas, specializing in applied machine learning, causal inference, and biomedical informatics. With a Ph.D. from Virginia Tech and postdoctoral experience at USC, he has contributed to high-impact projects funded by IARPA, DARPA, DHS, and USDA. He has published in leading journals and presented at top conferences.
Email: tozammel.hossain@unt.edu | Website: https://facultyinfo.unt.edu/faculty-profile?profile=kh0718

Tazin Afrin: Dr. Tazin Afrin holds a Ph.D. in Computer Science from the University of Pittsburgh, with expertise in NLP, educational technology, and human-computer interaction. She developed the ArgRewrite revision assistant and published in top-tier venues. At ETS, she develops advanced AI systems using LLMs and machine learning.
Email: tazin.tumpa@gmail.com | Website: https://tazin-afrin.github.io

Ting Xiao: Dr. Ting Xiao is an Assistant Professor in Data Science at the University of North Texas (UNT) and Director of the Deep Sensor Information eXtraction (SIX) Lab. She holds a Ph.D. in Physics from Northwestern University. Her research focuses on Machine Learning/Deep Learning, Vector Embeddings, Multimodal Large Language Models, and Clinical/Biomedical AI, with over 100 publications and an h-index of 36.
Email: Ting.Xiao@unt.edu | Website: https://engineering.unt.edu/people/ting-xiao.html

Sadia Afroz: Dr. Sadia Afroz is a Lead Scientist at Gen™, leading research in Security and Machine Learning. She holds a Ph.D. in Computer Science from Drexel University, specializing in Computer Security. Her expertise lies at the intersection of security, privacy, and machine learning. She previously served as a Research Professor at ICSI and a Staff Scientist at Avast.
Email: sadia@icsi.berkeley.edu | Website: https://www.icsi.berkeley.edu/icsi/people/sadia

Sheikh Abujar: Sheikh Abujar is a Ph.D. candidate in Computer Science at UAB, researching deep learning, vision-language models (VLMs), and clinical natural language processing. He interned at Samsung Research America (2024) and co-led impactful projects, including creating low-resource datasets like Bayanno (Bangla Speech) and IsharaLipi (Bangla Sign Language).
Email: sabujar@uab.edu | Website: https://sites.google.com/site/iamabujarsheikh

AKM Shahariar Azad Rabby: Shahariar Rabby is a researcher at the UAB Lung Imaging Lab and the machine learning team lead at Apurba Technologies, specializing in OCR, Document Analysis, and Low-Resource Language Vision. He developed "Ekush," the largest Bangla handwritten dataset, and co-founded and supervised the CI Lab and the DIU NLP and Machine Learning Research Lab.
Email: arabby@uab.edu | Website: rabby.dev

Muntaser Syed: Muntaser Syed is a GPU Developer Advocate at NVIDIA and technical lead for the Open Hackathons team, focusing on accelerating research on supercomputing clusters. A Ph.D. scholar, he works on machine learning on edge devices, NLP, and speech recognition. He contributed to UAV control systems and the FAA's LAANC program.
Email: muntasers@nvidia.com | Website: https://www.linkedin.com/in/muntasersyed

Confirmed Program Committee Members

Reviewer – Organization
Abdus Sattar – Daffodil International University, Bangladesh
Abu Kaisar Mohammad Masum – Florida Institute of Technology, USA
Jagdish Chand Bansal – South Asian University, India
Stephen Olatunde Olabiyisi – Ladoke Akintola University of Technology, Nigeria
Sunil Kumar Khatri – Amity University Tashkent, Uzbekistan
Yagyanath Rimal – Pokhara University, Nepal
Ghalib Hussaiyn – PayPal
Hasmot Ali – Apurba Technologies Ltd
Md. Fahad Hossain – Daffodil International University, Bangladesh
Mahmudul Hasan – Comilla University, Bangladesh
Mohammad Mamun Or Rashid – Jahangirnagar University, Bangladesh
Md Majedul Islam – Kennesaw State University, USA
Md. Sanzidul Islam – King Abdulaziz University, Saudi Arabia
Mirza Sami – Deka Research & Development
Mohammad Shorif Uddin – Jahangirnagar University, Bangladesh
Mouhaydine Tlemcani – Universidade de Évora, Portugal
Nabeel Mohammed – North South University, Bangladesh
Naveed Mahmud – Florida Institute of Technology, USA
Nushrat Jahan Ria – Daffodil International University, Bangladesh
Pratim Saha – University of Alabama at Birmingham, USA
S.R. Subramanya – National University (San Diego, USA) / Exskillence
S.M. Saiful Islam Badhon – University of North Texas, USA
Saif Islam – Charles Schwab
Sandeep Bodduluri – University of Alabama at Birmingham, USA
Sharun Akter Khushbu – Daffodil International University, Bangladesh
Syed Ashiqur Rahman – GSK, USA
Tanvir Ahmed – University of Central Florida, USA
S.M. Mazharul Hoque Chowdhury – University of North Texas, USA
Monjurul Huda – Amazon