{"id":17391,"date":"2026-04-07T15:00:32","date_gmt":"2026-04-07T18:00:32","guid":{"rendered":"https:\/\/rtmedical.com.br\/voxtell-varian-eclipse-esapi-radiotherapy-research\/"},"modified":"2026-04-07T15:00:32","modified_gmt":"2026-04-07T18:00:32","slug":"voxtell-varian-eclipse-esapi-radiotherapy-research","status":"publish","type":"post","link":"https:\/\/rtmedical.com.br\/en\/voxtell-varian-eclipse-esapi-radiotherapy-research\/","title":{"rendered":"VoxTell and Varian Eclipse ESAPI: AI in Radiotherapy Research"},"content":{"rendered":"<p><strong>VoxTell<\/strong> is a 3D vision-language model that segments structures in volumetric medical images from free-text prompts. Trained on over 62,000 CT, MRI, and PET volumes covering more than a thousand anatomical and pathological classes, the model represents a concrete advance in automatic segmentation. Gustavo Gomes Formento, a researcher at RT Medical Systems, developed two open-source integrations that connect VoxTell to interactive web interfaces and to the <strong>Varian Eclipse ESAPI<\/strong>, creating research prototypes that bring academic models closer to the real radiotherapy workflow.<\/p>\n<p>This article details the model architecture, both public integrations \u2014 web and ESAPI \u2014 and the DICOM coordinate conversion pipeline that makes it all possible. 
All content refers exclusively to research and technical evaluation tools, <strong>never to clinical software<\/strong>.<\/p>\n<h2>What VoxTell Changes in 3D Segmentation<\/h2>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" class=\"alignright lazyload\" data-src=\"https:\/\/rtmedical.com.br\/wp-content\/uploads\/2026\/04\/voxtell-architecture-overview.png\" alt=\"VoxTell architecture diagram showing 3D image encoder, Qwen3 text encoder, and multi-scale fusion decoder\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1405px; --smush-placeholder-aspect-ratio: 1405\/442;\"><figcaption>VoxTell architecture based on the project&#8217;s public materials.<\/figcaption><\/figure>\n<p>Conventional segmentation models operate with fixed labels. If the model was not trained for &#8220;posterior fossa tumor&#8221;, it simply does not segment it. VoxTell replaces this paradigm with free-text prompts: the operator types the desired structure \u2014 from &#8220;liver&#8221; to &#8220;left kidney with cortical cyst&#8221; \u2014 and the model generates the corresponding volumetric mask.<\/p>\n<p>The architecture combines a 3D image encoder with <strong>Qwen3-Embedding-4B<\/strong> as a frozen text encoder. A prompt decoder transforms text queries and latent image representations into multi-scale text features. The image decoder fuses visual and textual information at multiple resolutions using MaskFormer-style query-image fusion with deep supervision. The result: zero-shot segmentation with state-of-the-art performance on familiar structures and reasonable generalization to never-seen classes.<\/p>\n<p>The original paper (arXiv:2511.11450) documents training on 158 public datasets covering brain, head and neck, thorax, abdomen, pelvis, and musculoskeletal system \u2014 including vascular structures, organ sub-structures, and lesions. 
A foundation that <a href=\"https:\/\/rtmedical.com.br\/ia-radiologia-workflow-integracao\/\">reflects the migration of AI from isolated algorithms to workflow integration<\/a>.<\/p>\n<h2>Web Interface: 3D Viewer, RTStruct, and Engineering for Limited GPU<\/h2>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" class=\"alignleft lazyload\" data-src=\"https:\/\/rtmedical.com.br\/wp-content\/uploads\/2026\/04\/voxtell-web-interface-scaled.png\" alt=\"VoxTell web interface with interactive NiiVue 3D visualization, volume upload, and text-prompt segmentation\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 2560px; --smush-placeholder-aspect-ratio: 2560\/1603;\"><figcaption>VoxTell web interface with interactive viewer and text-based segmentation workflow.<\/figcaption><\/figure>\n<p>Developed by Gustavo Gomes Formento (RT Medical Systems), the <strong>voxtell-web-plugin<\/strong> is a FastAPI + React\/TypeScript application that puts the model behind an accessible interface. The operator uploads a volume (.nii, .nii.gz or DICOM), types a prompt such as &#8220;liver&#8221; or &#8220;prostate tumor&#8221;, and receives the 3D mask overlaid on the NiiVue viewer in real time.<\/p>\n<p>Low VRAM engineering is the practical differentiator. The Qwen3-Embedding-4B text encoder runs in float16, reducing memory usage from ~15 GB to ~7.5 GB. The memory allocator uses <code>expandable_segments=True<\/code> to reduce fragmentation, and the sliding window operates with <code>perform_everything_on_device=False<\/code> for partial CPU offload. As a result, 12 GB GPUs can already run inference \u2014 hardware found in research workstations, not just clusters.<\/p>\n<p>The viewer supports accumulation of multiple segmentations (liver + spleen + kidneys in the same session), manual drawing for refinement, and export in <strong>NIfTI and RTStruct<\/strong>. 
RTStruct export is particularly relevant: it produces a DICOM-RT file that can be imported into planning systems for comparative evaluation \u2014 always in a research context.<\/p>\n<p><strong>Orientation note:<\/strong> images must be in RAS orientation for correct left\/right anatomical localization. Orientation mismatches produce mirrored or incorrect results. PyTorch 2.9.0 has an OOM bug in 3D convolutions; the recommended version is 2.8.0 or earlier.<\/p>\n<h2>Varian Eclipse ESAPI: How the Integration Works<\/h2>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" class=\"alignright lazyload\" data-src=\"https:\/\/rtmedical.com.br\/wp-content\/uploads\/2026\/04\/voxtell-concepts-promptable-segmentation.png\" alt=\"Conceptual diagram of VoxTell-ESAPI showing text-prompt segmentation scenarios in Varian Eclipse\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1200px; --smush-placeholder-aspect-ratio: 1200\/1200;\"><figcaption>Public visual of promptable segmentation scenarios in the Eclipse context.<\/figcaption><\/figure>\n<p>Also created by Gustavo (RT Medical Systems), the <strong>VoxTell-ESAPI<\/strong> adds two components to the ecosystem: a Python\/FastAPI server that receives CT data via HTTP and runs inference on GPU, and a C# ESAPI plugin that extracts CT from Varian Eclipse, sends it to the server, and re-imports the resulting contours as RT structures.<\/p>\n<p>The complete workflow works as follows:<\/p>\n<ol>\n<li>The operator opens a patient in Eclipse with CT and an existing structure set<\/li>\n<li>The plugin creates a session on the server, sending volume geometry (origin, row\/column\/slice direction, spacing)<\/li>\n<li>For each Z slice, voxels are extracted as <code>ushort[xSize, ySize]<\/code>, converted to int32, serialized in little-endian, compressed with gzip, and encoded in base64 \u2014 reducing payload 
by ~4\u00d7<\/li>\n<li>After all slices are sent, the server assembles the NIfTI volume with LPS\u2192RAS conversion<\/li>\n<li>The operator types prompts (e.g., &#8220;liver, left kidney, spleen&#8221;) and submits<\/li>\n<li>Inference runs asynchronously \u2014 Eclipse does not freeze<\/li>\n<li>The server extracts 2D contours from the masks and returns coordinates in LPS (patient space)<\/li>\n<li>The plugin imports via <code>structure.AddContourOnImagePlane(contour_points_lps, z_index)<\/code><\/li>\n<\/ol>\n<p>Existing structures are matched by name (exact, case-insensitive, or fuzzy). Structures not found are auto-created with DICOM type CONTROL. Names are sanitized to 16 characters (e.g., &#8220;left kidney&#8221; \u2192 &#8220;left_kidney&#8221;).<\/p>\n<p><strong>This plugin is intended exclusively for non-clinical environments: ECNC (External Calculation and Non-Clinical) and Varian TBOX (training box).<\/strong> It must never be run in a clinical environment.<\/p>\n<h2>DICOM Conversion Pipeline: LPS, RAS, and the Mathematics of Coordinates<\/h2>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" class=\"alignleft lazyload\" data-src=\"https:\/\/rtmedical.com.br\/wp-content\/uploads\/2026\/04\/ct-scanner-radiotherapy-workstation.jpeg\" alt=\"Workstation with CT scanner in a hospital radiotherapy environment\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1880px; --smush-placeholder-aspect-ratio: 1880\/1253;\"><figcaption>Photo: MART PRODUCTION \/ Pexels<\/figcaption><\/figure>\n<p>The coordinate conversion between DICOM (LPS) and NIfTI (RAS) is the most critical technical point of the entire integration. An error at this stage produces mirrored volumes, anteroposterior inverted contours, or structures on the wrong side of the patient. 
The pipeline implements the transformation rigorously.<\/p>\n<h3>DICOM Geometry \u2192 LPS Affine<\/h3>\n<p>Eclipse exposes the image geometry (origin, row direction, column direction, slice direction, spacing). The server builds the 4\u00d74 affine matrix that maps voxel indices to millimeter positions in the DICOM LPS (Left, Posterior, Superior) system:<\/p>\n<p>$$x_{LPS} = A_{LPS} \\begin{bmatrix} i \\\\ j \\\\ k \\\\ 1 \\end{bmatrix}$$<\/p>\n<p>Where the columns of $A_{LPS}$ are formed by:<\/p>\n<ul>\n<li>Column 0: <code>row_direction \u00d7 x_res<\/code> (column axis +X)<\/li>\n<li>Column 1: <code>col_direction \u00d7 y_res<\/code> (row axis +Y)<\/li>\n<li>Column 2: <code>slice_direction \u00d7 z_res<\/code> (slice axis +Z)<\/li>\n<li>Column 3: <code>origin<\/code> (position of voxel 0,0,0)<\/li>\n<\/ul>\n<h3>LPS \u2192 RAS Conversion<\/h3>\n<p>DICOM and NIfTI use opposite conventions on the first two axes:<\/p>\n<table>\n<thead>\n<tr>\n<th>System<\/th>\n<th>X<\/th>\n<th>Y<\/th>\n<th>Z<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DICOM\/Eclipse (LPS)<\/td>\n<td>Patient Left<\/td>\n<td>Patient Posterior<\/td>\n<td>Patient Superior<\/td>\n<\/tr>\n<tr>\n<td>NIfTI\/VoxTell (RAS)<\/td>\n<td>Patient Right<\/td>\n<td>Patient Anterior<\/td>\n<td>Patient Superior<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The transformation requires inverting the first two axes:<\/p>\n<p>$$A_{RAS} = \\operatorname{diag}(-1,-1,1,1) \\cdot A_{LPS}$$<\/p>\n<p>In the code, the volume is transposed from (Z,Y,X) to (X,Y,Z) for the NIfTI convention, and the X and Y axes of the affine are inverted. 
A naive copy produces a mirrored and anteroposterior-inverted volume \u2014 exactly the kind of error that only appears during rigorous clinical review, not in automated tests.<\/p>\n<h3>Return: RAS Masks \u2192 LPS Contours<\/h3>\n<p>After inference, the inverse path uses <code>find_contours<\/code> from scikit-image to extract 2D contour lines on each slice, and projects voxel indices back to LPS millimeters using the affine stored in the session:<\/p>\n<p>$$\\text{pts}_{LPS} = (\\text{vox\\_coords} \\cdot A_{LPS}^T)[:, :3]$$<\/p>\n<p>The points are sent to Eclipse, which applies them directly via <code>AddContourOnImagePlane()<\/code>.<\/p>\n<h3>Evaluation Metrics<\/h3>\n<p>To evaluate segmentation quality, two metrics are standard:<\/p>\n<p>The Dice coefficient measures the overlap between predicted segmentation $X$ and reference $Y$:<\/p>\n<p>$$DSC(X,Y) = \\frac{2|X \\cap Y|}{|X| + |Y|}$$<\/p>\n<p>The Hausdorff distance measures the worst pointwise divergence between surfaces:<\/p>\n<p>$$HD(X,Y) = \\max\\left\\{\\sup_{x \\in X}\\inf_{y \\in Y} d(x,y),\\; \\sup_{y \\in Y}\\inf_{x \\in X} d(x,y)\\right\\}$$<\/p>\n<h2>Research Plugins, SaMD Boundaries, and Why Regulatory Language Matters<\/h2>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" class=\"alignright lazyload\" data-src=\"https:\/\/rtmedical.com.br\/wp-content\/uploads\/2026\/04\/doctor-reviewing-medical-imaging.jpeg\" alt=\"Healthcare professional reviewing regulatory documentation on a computer\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1880px; --smush-placeholder-aspect-ratio: 1880\/1253;\"><figcaption>Photo: SHVETS production \/ Pexels<\/figcaption><\/figure>\n<p>The medical software market operates under strict regulation. 
Any software that influences diagnostic or therapeutic decisions may be classified as <strong>Software as a Medical Device (SaMD)<\/strong>, subject to standards and guidance such as IEC 62304, ISO 14971, and IMDRF documents, and to regulatory regimes such as those of the FDA and ANVISA or CE marking in the European Union.<\/p>\n<p>The plugins described in this article \u2014 web and ESAPI \u2014 are <strong>research, experimentation, prototyping, and technical evaluation tools<\/strong>. Specifically:<\/p>\n<ul>\n<li>The original VoxTell model is the work of the research group cited in the paper (Rokuss et al., 2025), not RT Medical Systems<\/li>\n<li>Gustavo Gomes Formento, a researcher at RT Medical Systems, is the author of the open-source integrations (web interface and ESAPI plugin) published around VoxTell<\/li>\n<li>The ESAPI plugin is intended exclusively for ECNC and Varian TBOX \u2014 non-clinical environments<\/li>\n<li>These plugins <strong>must never be used clinically<\/strong><\/li>\n<li>They have not been approved, cleared, validated, or authorized as medical software by any regulatory agency<\/li>\n<li>There is no formal endorsement from Varian, DKFZ, MIC-DKFZ, or the original paper authors<\/li>\n<\/ul>\n<p>Clinical use of any AI-assisted segmentation tool would require independent validation, a quality management system, risk analysis (ISO 14971), a cybersecurity process, and full regulatory evaluation. 
These are not formalities \u2014 they are the barriers that separate research prototypes from devices that influence patient treatment.<\/p>\n<p>For professionals working with <a href=\"https:\/\/rtmedical.com.br\/aplicacoes-dicom-desenvolvimento-software\/\">DICOM software development<\/a> or with <a href=\"https:\/\/rtmedical.com.br\/implementacao-dicom-faq-troubleshooting\/\">DICOM infrastructure implementation and troubleshooting<\/a>, understanding this boundary is essential before evaluating any AI tool.<\/p>\n<h2>Integration Engineering: What Radiotherapy Demands from Software<\/h2>\n<p>The technical value of these integrations does not lie in the model itself \u2014 segmentation models emerge every quarter. The value lies in demonstrating the engineering competencies that any radiotherapy software company must master:<\/p>\n<ul>\n<li><strong>DICOM interoperability:<\/strong> bidirectional format conversion (NIfTI \u2194 DICOM), affine handling and volume orientation, RTStruct export<\/li>\n<li><strong>TPS integration:<\/strong> communication via ESAPI, voxel serialization, contour import in patient coordinates<\/li>\n<li><strong>Resource optimization:<\/strong> inference on consumer GPU, CPU offload, payload compression<\/li>\n<li><strong>Asynchronous workflow:<\/strong> TTL sessions, polling without blocking UI, cancellation and cleanup<\/li>\n<li><strong>Governance:<\/strong> clear separation between research and clinical product, precise regulatory language<\/li>\n<\/ul>\n<p>Each of these is a real requirement in projects like <a href=\"https:\/\/rtmedical.com.br\/rtconnect\/\">RTConnect<\/a> and contour review pipelines \u2014 not theoretical exercises, but problems that arise in every integration with real equipment and planning systems. 
<a href=\"https:\/\/rtmedical.com.br\/a-importancia-da-padronizacao-das-estruturas-em-radioterapia-tg-263\/\">Structure standardization according to TG-263<\/a> is another direct convergence point.<\/p>\n<h2>Next Steps and Context for Teams<\/h2>\n<p>VoxTell&#8217;s public roadmap indicates that fine-tuning support has not yet been released. When available, it will open up the possibility of adapting the model to specific structures of interest \u2014 for example, head and neck OAR structures according to institutional protocols \u2014 again in a research context.<\/p>\n<p>If your team is evaluating AI-assisted contouring workflows, validation pipelines, or review and governance layers around segmentation, RT Medical Systems can help structure that conversation.<\/p>\n<p><em>All technical information in this article was extracted from public sources: the VoxTell paper (arXiv:2511.11450, Rokuss et al., 2025) and the GitHub repositories gomesgustavoo\/voxtell-web-plugin and gomesgustavoo\/VoxTell-ESAPI.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>VoxTell segments 3D medical images from free-text prompts. 
Open-source integrations by an RT Medical Systems researcher connect the model to Varian Eclipse ESAPI for radiotherapy research.<\/p>\n","protected":false},"author":1,"featured_media":17366,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"fifu_image_url":"","fifu_image_alt":"","footnotes":""},"categories":[102,99,230],"tags":[],"class_list":{"0":"post-17391","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-ai","8":"category-radiotherapy","9":"category-software-en"},"aioseo_notices":[],"rt_seo":{"title":"VoxTell and Varian Eclipse ESAPI: AI in Radiotherapy Research","description":"VoxTell segments 3D medical images using free-text prompts. Open-source integrations by Gustavo Gomes Formento (RT Medical Systems) connect VoxTell to Varian Eclipse ESAPI for radiotherapy research.","canonical":"","og_image":"https:\/\/rtmedical.com.br\/wp-content\/uploads\/2026\/04\/voxtell-web-interface-scaled.png","robots":"index,follow","schema_type":"Article","include_in_llms":true,"llms_label":"VoxTell ESAPI Radiotherapy Research","llms_summary":"Technical overview of VoxTell 3D vision-language segmentation model and its open source integrations with web viewer and Varian Eclipse ESAPI for radiotherapy research. Covers architecture, DICOM coordinate conversion pipeline (LPS\/RAS), and regulatory boundaries. 
Integrations authored by Gustavo Gomes Formento, researcher at RT Medical Systems.","faq_items":[],"video":[],"gtin":"","mpn":"","brand":"","aggregate_rating":[]},"_links":{"self":[{"href":"https:\/\/rtmedical.com.br\/en\/wp-json\/wp\/v2\/posts\/17391\/"}],"collection":[{"href":"https:\/\/rtmedical.com.br\/en\/wp-json\/wp\/v2\/posts\/"}],"about":[{"href":"https:\/\/rtmedical.com.br\/en\/wp-json\/wp\/v2\/types\/post\/"}],"author":[{"embeddable":true,"href":"https:\/\/rtmedical.com.br\/en\/wp-json\/wp\/v2\/users\/1\/"}],"replies":[{"embeddable":true,"href":"https:\/\/rtmedical.com.br\/en\/wp-json\/wp\/v2\/comments\/?post=17391"}],"version-history":[{"count":0,"href":"https:\/\/rtmedical.com.br\/en\/wp-json\/wp\/v2\/posts\/17391\/revisions\/"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rtmedical.com.br\/en\/wp-json\/wp\/v2\/media\/17366\/"}],"wp:attachment":[{"href":"https:\/\/rtmedical.com.br\/en\/wp-json\/wp\/v2\/media\/?parent=17391"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rtmedical.com.br\/en\/wp-json\/wp\/v2\/categories\/?post=17391"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rtmedical.com.br\/en\/wp-json\/wp\/v2\/tags\/?post=17391"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}