This article explains retrieval strategies in medical image databases built on vector embeddings. Starting with slice-based retrieval as the baseline, it expands into volume-based, region-based, and localized retrieval methods. Each approach is evaluated with recall metrics, highlighting how region-centric and localized techniques improve accuracy in identifying and matching anatomical structures. The piece demonstrates why moving beyond simple slice similarity is crucial for practical medical imaging applications, where precision and localization directly impact system usefulness.

How AI Retrieves Anatomical Structures Using Vector Databases


Abstract and 1. Introduction

  2. Materials and Methods

    2.1 Vector Database and Indexing

    2.2 Feature Extractors

    2.3 Dataset and Pre-processing

    2.4 Search and Retrieval

    2.5 Re-ranking retrieval and evaluation

  3. Evaluation and 3.1 Search and Retrieval

    3.2 Re-ranking

  4. Discussion

    4.1 Dataset and 4.2 Re-ranking

    4.3 Embeddings

    4.4 Volume-based, Region-based and Localized Retrieval and 4.5 Localization-ratio

  5. Conclusion, Acknowledgement, and References

2.4 Search and Retrieval

After creating the vector database, the search is performed using the embeddings extracted from slices of query volumes. The simplest way of retrieval is to match a 2D query slice q with the most similar 2D slice in the database s ∗ by finding the slice-embedding that maximizes the cosine similarity with respect to the embedding associated with q, i.e.

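$$
s^{*} = \underset{s \,\in\, \text{database}}{\arg\max}\; \left\langle \frac{\phi(q)}{\lVert \phi(q) \rVert_2},\; v_s \right\rangle
$$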

where ⟨·, ·⟩ denotes the standard scalar product, ∥·∥2 the Euclidean norm, ϕ the embedding mapping, and vs = ϕ(s)/∥ϕ(s)∥2 the pre-computed, normalized embedding of slice s stored in a vector index.

Khun Jush et al. [2023] introduced slice-wise retrieval as the lower-bound baseline for evaluating their proposed aggregation and sampling schemes. Similarly, in this work, we keep the slice-wise evaluation as the lower bound for the retrieval rate of our methods. It is a lower bound because only one slice is retrieved per query slice, and perfect recall requires that every anatomical structure visible in the query slice also appears in the retrieved slice. In this baseline, each slice q of the query dataset is treated as an individual search instance. In addition, we performed and evaluated image retrieval in three further scenarios:
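The slice-wise matching rule can be illustrated with a minimal NumPy sketch. It assumes the database embeddings are held in a plain in-memory array rather than a real vector index, and the function names and toy values are hypothetical, not taken from the paper.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize embeddings along the last axis (assumes non-zero vectors)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve_slice(query_emb: np.ndarray, index: np.ndarray) -> tuple[int, float]:
    """Return (row position, cosine similarity) of the best-matching stored slice.

    `index` holds one pre-normalized embedding v_s per row.
    """
    # The dot product of unit vectors equals their cosine similarity.
    sims = index @ normalize(query_emb)
    best = int(np.argmax(sims))
    return best, float(sims[best])

# Toy data: three stored slice embeddings (hypothetical values).
index = normalize(np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]))
best, score = retrieve_slice(np.array([0.1, 5.0]), index)
print(best, round(score, 4))
```

Because the stored embeddings are normalized once at indexing time, each query only needs a single normalization and one matrix-vector product.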

2.4.1 Volume-based retrieval and evaluation

2.4.2 Region-based retrieval and evaluation

In this setting, the search system is queried with an image (sub-)volume that is constrained to a specific anatomical sub-region (e.g., liver, pancreas, heart, …). For each anatomical region, we want to individually assess the capability of the system to retrieve an image volume containing that region.

The query (sub-)volumes for different anatomical regions are generated as follows. Given a selected anatomical region r and a query image volume VQ = [q1, …, qn], the smallest subset of slices VQ,r = [qm, …, qk] ⊂ VQ is chosen that entirely contains the anatomical region r visible in VQ. Based on the sub-volume VQ,r, a similarity search is conducted to build up a hit-table, and count-based aggregation is then applied to retrieve the volume with the most hits for this query, as described in Section 2.4.1.

Figure 3: Region-based retrieval. Anatomical regions are considered individually. A sub-volume constrained to an anatomical region of interest r is generated and fed to the search system to retrieve a volume containing the anatomical region. A case is considered a True Positive (TP) if the retrieved case contains the region r at some location.
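The sub-volume selection and count-based aggregation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: per-slice label sets are assumed to be available, each query slice is assumed to contribute the volume ID of its nearest stored slice to the hit-table, and all names are hypothetical.

```python
from collections import Counter

def region_subvolume(slice_labels: list[set[str]], region: str) -> range:
    """Smallest contiguous slice range [m, k] covering every slice that shows `region`."""
    visible = [i for i, labels in enumerate(slice_labels) if region in labels]
    if not visible:
        raise ValueError(f"region {region!r} is not visible in this volume")
    return range(visible[0], visible[-1] + 1)

def retrieve_volume(matched_volume_ids: list[str]) -> tuple[str, Counter]:
    """Count-based aggregation: return the volume with the most hits and the hit-table."""
    hit_table = Counter(matched_volume_ids)
    volume_id, _ = hit_table.most_common(1)[0]
    return volume_id, hit_table

# Hypothetical query volume with five slices; the pancreas is visible in slices 1 to 3.
query_labels = [{"liver"}, {"liver", "pancreas"}, {"pancreas"}, {"pancreas", "kidney"}, {"kidney"}]
sub = region_subvolume(query_labels, "pancreas")

# Assumed outcome of the per-slice similarity search for the three sub-volume slices.
best, table = retrieve_volume(["V3", "V1", "V3"])
print(list(sub), best, dict(table))
```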

In this scenario, the evaluation is again performed per anatomical region using recall. For a selected anatomical region r, the region-centric query sub-volumes are fed to the search system and the aggregated labels of the retrieved volumes are compared to r. Recall in this setting is high if the aggregated labels of the retrieved volumes contain r. Hence, this evaluation only requires that the retrieved volume contains the anatomical region of interest; it does not require the search system to identify the exact slices where the region is visible. An overview of this method is shown in Figure 3. Whether the system can exactly localize anatomical regions is addressed in Section 2.4.3. For example, in Figure 3, V3 is retrieved for the anatomical sub-region r = ‘pancreas’. During evaluation, this instance is classified as a True Positive (TP) because the retrieved volume V3 contains ‘pancreas’, regardless of whether the matched slices contained ‘pancreas’.
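A short sketch of the region-based evaluation is given below. It assumes every query for a region is either a True Positive or a False Negative, so recall reduces to TP divided by the number of queries for that region. The function name and sample data are hypothetical.

```python
def region_recall(results: list[tuple[str, set[str]]]) -> dict[str, float]:
    """Per-region recall; each result is (query region r, labels of the retrieved volume)."""
    tp: dict[str, int] = {}
    total: dict[str, int] = {}
    for region, retrieved_labels in results:
        total[region] = total.get(region, 0) + 1
        # True Positive: the region is present anywhere in the retrieved volume.
        if region in retrieved_labels:
            tp[region] = tp.get(region, 0) + 1
    return {region: tp.get(region, 0) / n for region, n in total.items()}

# Hypothetical retrieval outcomes for three region-centric queries.
recall = region_recall([
    ("pancreas", {"liver", "pancreas"}),
    ("pancreas", {"liver"}),
    ("liver", {"liver", "kidney"}),
])
print(recall)
```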

2.4.3 Localized retrieval and evaluation

In this setting, the system is queried with an image sub-volume that is constrained to a specific anatomical sub-region (e.g., liver, pancreas, heart, …). For each anatomical region, we want to individually assess the capability of the system to retrieve an image volume containing the anatomical region and to localize the region of interest within the retrieved volume.

The query sub-volumes VQ,r for different anatomical regions r are generated as described in Section 2.4.2. Again, a similarity search is conducted based on the sub-volume VQ,r to retrieve the related volume VR,r with the most hits. The evaluation is again performed per anatomical region using recall, but the criterion is stricter than in the region-based evaluation from Section 2.4.2. To be considered a True Positive, at least one of the slices from VR,r that occurred in the similarity search must actually intersect the region r. In other words, the search system is required to localize r in the sense that at least one slice is identified where r is visible. For example, for r = ‘pancreas’, if a search retrieves a volume that does include the pancreas, but the specific slices hit in the similarity search do not intersect the organ, the result is marked as a False Negative (FN) in the evaluation, even though the pancreas is present elsewhere in the volume (see Figure 4). The capability of a search system to localize an anatomical subregion of interest within a retrieved volume is particularly useful for applications with user interaction, e.g., when the user marks a subregion in an image and queries the search system, which retrieves similar cases from a database and localizes the corresponding subregions therein.

Figure 4: Localized retrieval. Anatomical regions are considered individually. A sub-volume constrained to the anatomical region of interest r is generated and fed to the search system to retrieve a volume containing the same anatomical region. A case is only considered a True Positive (TP) if at least one of the hit slices in the retrieved volume contains the region r.
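The difference between the region-based and the localized criterion can be made concrete with a small sketch. It assumes the label sets of the hit slices and of the whole retrieved volume are known; the function names and data are hypothetical.

```python
def is_region_tp(region: str, volume_labels: set[str]) -> bool:
    """Region-based criterion: the region appears anywhere in the retrieved volume."""
    return region in volume_labels

def is_localized_tp(region: str, hit_slice_labels: list[set[str]]) -> bool:
    """Localized criterion: at least one hit slice of the retrieved volume shows the region."""
    return any(region in labels for labels in hit_slice_labels)

# Hypothetical retrieved volume that contains the pancreas,
# while none of the slices hit in the similarity search intersects it.
volume_labels = {"liver", "pancreas", "kidney"}
hit_slices = [{"liver"}, {"liver", "kidney"}]

region_based = is_region_tp("pancreas", volume_labels)
localized = is_localized_tp("pancreas", hit_slices)
print(region_based, localized)
```

In this toy case the same retrieval counts as a True Positive under the region-based criterion and as a False Negative under the localized one.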

Another measure to assess the capability of the system to localize a region can be defined as the ratio of the hit slices that actually contain the subregion r in the retrieved volume to the total number of slices hit in the retrieved volume. In detail, the localization-ratio (LR) is defined as:

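$$
\mathrm{LR} = \frac{\text{number of hit slices in the retrieved volume that contain } r}{\text{total number of hit slices in the retrieved volume}}
$$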

For example, suppose the query consists of 60 slices related to region r, and the hit-table for the top three volumes shows hit counts of [48, 21, 4]. If 12 of the 48 hit slices in the volume with the highest hit count actually contain region r, localization is successful and the localization-ratio is 12/48 = 0.25.
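The worked example can be reproduced with a few lines of Python. This sketch assumes the label set of each hit slice in the retrieved volume is available; the function name and the filler labels are hypothetical.

```python
def localization_ratio(region: str, hit_slice_labels: list[set[str]]) -> float:
    """Fraction of hit slices in the retrieved volume that actually contain `region`."""
    if not hit_slice_labels:
        raise ValueError("the retrieved volume has no hit slices")
    containing = sum(region in labels for labels in hit_slice_labels)
    return containing / len(hit_slice_labels)

# 48 hit slices in the top volume, 12 of which show the region of interest.
hits = [{"pancreas"}] * 12 + [{"liver"}] * 36
lr = localization_ratio("pancreas", hits)
print(lr)  # 0.25
```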


:::info Authors:

(1) Farnaz Khun Jush, Bayer AG, Berlin, Germany;

(2) Steffen Vogler, Bayer AG, Berlin, Germany;

(3) Tuan Truong, Bayer AG, Berlin, Germany;

(4) Matthias Lenga, Bayer AG, Berlin, Germany.

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::
