AI inside Selected for Phase 3 of METI and NEDO’s GENIAC Program

July 15, 2025

Developing a Japanese Full-Duplex-Speech Multimodal LLM Following Phase 2

Tokyo, July 15, 2025 – AI inside Inc., a provider of an AI platform, announces that, following its selection in Phase 2, it has again been selected, with a new research theme, for Phase 3 of the “Generative AI Accelerator Challenge (GENIAC),” *1 a project to strengthen domestic generative AI capabilities led by the Ministry of Economy, Trade and Industry (METI) and the New Energy and Industrial Technology Development Organization (NEDO). The selected theme, “Research and Development of a Japanese Full-Duplex-Speech Multimodal LLM,” commenced in March 2025.

Under this project, AI inside will develop a compact multimodal generative AI foundation model capable of naturally understanding speech, images, and text, and engaging in natural Japanese dialogue. The model will be integrated into “DX Suite,” the AI Agent that automates data entry tasks, contributing to solving social challenges through collaboration between humans and AI.

Project Overview

Project Name: Research and Development Project of Enhanced Post-5G Information and Communication Systems Infrastructure / Development of Globally Competitive Generative AI Foundation Models (GENIAC)
Proposed Theme: Research and Development of a Consistent Japanese Full-Duplex-Speech Multimodal LLM
NEDO Announcement: https://www.nedo.go.jp/koubo/CD3_100397.html (Japanese only)

*1 A METI-led initiative aimed at strengthening domestic generative AI development capabilities by supporting the provision of computing resources for the development of foundation models, the core technology of generative AI, while promoting collaboration among stakeholders and public outreach. Phase 3 runs from August 2025 to the end of February 2026 (planned) and provides support for research and development of foundation models. https://www.meti.go.jp/policy/mono_info_service/geniac/index.html

Objectives and Features of the R&D

This project will conduct research and development on a Japanese-specific multimodal generative AI model.

・Full-Duplex voice interaction
Aim to achieve a speech latency of 200 ms, on par with human conversation speed, for natural, uninterrupted, commercial-grade Full-Duplex dialogue in Japanese.

・Multimodal capability
Understand and integrate information from images, speech, and text to provide accurate, context-aware responses, enabling advanced dialogue AI capable of referencing documents and charts for immediate business use.

・High performance in a compact model
Deliver the accuracy of large-scale models and the responsiveness of lightweight, high-speed models within a single architecture, ensuring a precise understanding of conversational flow and user intent for consistent responses.

The model will be deployed in “DX Suite,” targeting use cases in workplaces with heavy voice data usage such as municipality service counters, healthcare and nursing sites, contact centers, and sales/consulting departments. Applications include automatic structuring and analysis of meeting content, VoC analysis and FAQ auto-generation, medical record efficiency improvements, legal consultation and testimony organization, and municipal meeting transcription.

By integrating into “DX Suite,” already adopted by over 3,000 organizations, AI inside will accelerate the social implementation of Japan-developed multimodal LLMs, building a foundation for human–AI collaboration and addressing labor shortages caused by a declining population, thereby contributing to nationwide productivity gains.

Comment by Representative Director, President & CEO, Taku Toguchi

AI is evolving from a “tool for operational efficiency” into an “entity that shares thought and judgment,” and we are expected to enter the era of AGI within the next three years. Achieving this requires not only vast computing resources but also meticulously designed architectures and a wealth of reliable, real-world data. This initiative is a challenge to elevate the two core elements—design and data—from Japan to the world.

Our goal is to develop a lightweight yet high-performance Japanese multimodal model that integrates speech, images, and text to think alongside humans and enhance decision-making quality. This is not mere automation; it is the realization of AI as a “partner in judgment and dialogue.” This model will serve as an “entry point” for seamlessly connecting human and societal intelligence, potentially becoming the next-generation AI standard from Japan to the world.
We are deeply grateful for the opportunity that GENIAC provides to bring this vision closer to reality.

Comment by Executive Officer, CTO, Takuma Inoue

Through this project, we are taking on “three world-first challenges.”

First, we aim to realize Full-Duplex conversation, speaking and listening simultaneously, combined with the ability to understand visual information such as documents and slides within a single Japanese model, delivering a natural experience usable seamlessly in professional settings.
Second, we are addressing Full-Duplex’s challenge of “context retention,” ensuring uninterrupted conversation while handling multi-turn discussions and interpreting complex intentions.
Third, we are miniaturizing the model while maintaining advanced capabilities, making it deployable on our “AI inside Cube” hardware so that customers can utilize it immediately in their own environments.

To achieve this, we are revisiting architecture design from the ground up and incorporating modality expansion technologies, including LoRA, to handle speech, images, and more. Such configurations are rare worldwide for real-world deployment and present significant technical challenges. We see great value in this endeavor and are fully committed to making a substantial contribution to Japanese LLM development.

AI inside’s R&D in Generative AI and LLMs

AI inside has been researching and developing the “Customizable SLM,” which can learn from proprietary enterprise data, and the PolySphere series of LLMs specialized in Japanese document processing.
In June 2025, as an outcome of its Phase 2 GENIAC project selection, AI inside released PolySphere-3, a major update to PolySphere-2, achieving world-leading performance in data structuring accuracy. *2 The company also established “autonomous distillation” technology, enabling models to learn and optimize continuously for ongoing accuracy improvements.

*2 AI inside Releases “PolySphere-3,” a Major Update of Its In-House Developed LLM

About AI inside Inc.

AI inside is a tech company engaged in the research, development, and social implementation of generative AI, large language models (LLMs), and autonomous AI. We have developed the Japanese-language-optimized large language model PolySphere and have delivered our solutions to over 3,000 organizations, including government agencies, local municipalities, and private enterprises, while continuing to advance the development and adoption of our proprietary AI infrastructure. Our flagship product, “DX Suite,” is an AI Agent designed to automate the entire workflow surrounding data entry tasks. Through these initiatives, we promote collaboration between humans and AI, realizing a “VALUE SHIFT” that redirects the time saved through productivity and efficiency improvements toward higher-value work.
Website: https://inside.ai/en

*The service names appearing on this site are trademarks or registered trademarks of each company.

Contact for Press Inquiries
AI inside Inc. (https://inside.ai/en/) Public Relations Unit
TEL: +81-3-5468-5041 E-mail: pr@inside.ai