Details the Q-Former architecture: a 12-layer BERT-based model using 32 learnable query embeddings. These queries use cross-attention to extract visual information for MLLM input.

Visual Prompt Generation: Cross-Attention in Q-Former

Abstract and 1 Introduction

  2. Related Work

    2.1. Multimodal Learning

    2.2. Multiple Instance Learning

  3. Methodology

    3.1. Preliminaries and Notations

    3.2. Relations between Attention-based VPG and MIL

    3.3. MIVPG for Multiple Visual Inputs

    3.4. Unveiling Instance Correlation in MIVPG for Enhanced Multi-instance Scenarios

  4. Experiments and 4.1. General Setup

    4.2. Scenario 1: Samples with Single Image

    4.3. Scenario 2: Samples with Multiple Images, with Each Image as a General Embedding

    4.4. Scenario 3: Samples with Multiple Images, with Each Image Having Multiple Patches to be Considered and 4.5. Case Study

  5. Conclusion and References

Supplementary Material

A. Detailed Architecture of QFormer

B. Proof of Proposition

C. More Experiments

Figure 7. Overview of QFormer

A. Detailed Architecture of QFormer

The architecture overview is depicted in Figure 7. Specifically, QFormer is initialized as a BERT-based model [8] comprising L = 12 layers. In contrast to typical BERT models, which process textual inputs, QFormer takes R = 32 learnable query embeddings as input. During Stage-1 pretraining in BLIP-2 [22], these embeddings are used to extract visual information from the input visual data; afterwards, once projected, they serve as visual prompt embeddings for the LLM input.
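The key property of this design is that the R = 32 queries produce a fixed-length visual summary via cross-attention, regardless of how many visual tokens the image encoder emits. A minimal single-head numpy sketch (the patch count of 257 and random projection weights are illustrative assumptions, not values from the paper; the hidden size 768 is the BERT-base default):

```python
import numpy as np

rng = np.random.default_rng(0)

R, N, d = 32, 257, 768   # queries, image-patch tokens, hidden size (BERT-base)

queries = rng.standard_normal((R, d)) * 0.02   # learnable query embeddings
patches = rng.standard_normal((N, d))          # frozen image-encoder outputs

# Single-head cross-attention: the queries attend over the visual tokens.
W_q = rng.standard_normal((d, d)) * 0.02
W_k = rng.standard_normal((d, d)) * 0.02
W_v = rng.standard_normal((d, d)) * 0.02

Q, K, V = queries @ W_q, patches @ W_k, patches @ W_v
scores = Q @ K.T / np.sqrt(d)                        # (R, N) attention logits
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)             # row-wise softmax
visual_prompts = attn @ V                            # (R, d) visual summary

print(visual_prompts.shape)   # (32, 768), independent of the patch count N
```

Because the output shape depends only on R and d, the LLM always receives exactly 32 visual prompt embeddings after projection, whatever the input resolution.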

Inside the QFormer, each layer contains a self-attention module composed of a multi-head attention component and a feed-forward module (linear layer, LayerNorm, and residual connection). A randomly initialized cross-attention module is inserted every G layers, where the learnable query embeddings attend to the visual embeddings. In the main paper, for conciseness, we condensed the multi-head attention and feed-forward modules into self-(cross-)attention modules, and we illustrated only the modifications made to the cross-attention module in MIVPG, since the self-attention modules remain unchanged. The final QFormer output is the last layer's query embeddings.
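The per-layer data flow above can be sketched as a loop: every layer applies self-attention and a feed-forward update with residual connections and LayerNorm, and cross-attention to the visual embeddings fires only every G layers. This toy numpy version uses unprojected single-head attention, small dimensions, a single shared feed-forward weight, and G = 2 (the cross-attention frequency used in BLIP-2) purely for illustration:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def attention(q_in, kv_in, d):
    # Unprojected single-head attention; enough to show the data flow.
    scores = q_in @ kv_in.T / np.sqrt(d)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ kv_in

L, G, R, d = 12, 2, 32, 64          # layers, cross-attn frequency, queries, toy dim
rng = np.random.default_rng(0)
x = rng.standard_normal((R, d))      # learnable query embeddings
visual = rng.standard_normal((100, d))  # image-encoder outputs
W_ff = rng.standard_normal((d, d)) * 0.02

for i in range(L):
    x = layer_norm(x + attention(x, x, d))         # self-attention + residual
    if i % G == 0:                                 # cross-attention every G layers
        x = layer_norm(x + attention(x, visual, d))
    x = layer_norm(x + np.tanh(x @ W_ff))          # feed-forward + residual

print(x.shape)   # (32, 64): the last layer's queries are the QFormer output
```

Only the layers with a cross-attention module ever read the visual embeddings; the self-attention and feed-forward path is a standard BERT block throughout.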

For a more comprehensive understanding, readers are encouraged to refer to [22].


:::info Authors:

(1) Wenliang Zhong, The University of Texas at Arlington ([email protected]);

(2) Wenyi Wu, Amazon ([email protected]);

(3) Qi Li, Amazon ([email protected]);

(4) Rob Barton, Amazon ([email protected]);

(5) Boxin Du, Amazon ([email protected]);

(6) Shioulin Sam, Amazon ([email protected]);

(7) Karim Bouyarmane, Amazon ([email protected]);

(8) Ismail Tutar, Amazon ([email protected]);

(9) Junzhou Huang, The University of Texas at Arlington ([email protected]).

:::


:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::
