Daily Papers

by AK and the research community

Mar 4

AFRAgent: An Adaptive Feature Renormalization Based High Resolution Aware GUI agent

There is a growing demand for mobile user interface (UI) automation, driven by its broad applications across industries. With the advent of visual language models (VLMs), GUI automation has progressed from generating text-based instructions for humans to executing tasks autonomously, streamlining automation workflows. Recent approaches use VLMs for this problem because they can 1) process on-screen content directly, 2) remain independent of device-specific APIs by relying on human actions (e.g., clicks, typing), and 3) apply real-world contextual knowledge to understand tasks. However, these models often struggle to identify widgets accurately and to determine actions, because vision encoder features carry limited spatial information. Additionally, top-performing models are often large, requiring extensive training and incurring inference delays. In this work, we introduce AFRAgent, an InstructBLIP-based multimodal architecture that achieves superior performance in GUI automation while being less than one-fourth the size of its nearest competitor. To enhance image embeddings in the large language model (LLM) pipeline, we propose an adaptive feature renormalization (AFR) technique, a token-level affine transformation that effectively enriches low-resolution image embeddings and fuses in high-resolution details. We evaluate AFRAgent on the Meta-GUI and AITW benchmarks, establishing a new state-of-the-art baseline for smartphone automation.
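
The abstract characterizes AFR only as a token-level affine transformation. A minimal FiLM-style sketch of that idea follows, assuming each low-resolution image token $x_i$ is paired with one aligned high-resolution token $h_i$ and that scale and shift are linear projections of $h_i$: $y_i = (1 + \gamma_i) \odot x_i + \beta_i$ with $\gamma_i = W_\gamma h_i$ and $\beta_i = W_\beta h_i$. The token alignment, the $(1 + \gamma)$ parameterization, and every name below are hypothetical, not the authors' implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product (rows of W dotted with v)."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def adaptive_feature_renorm(
    low_res: List[Vector],
    high_res: List[Vector],
    W_gamma: Matrix,
    W_beta: Matrix,
) -> List[Vector]:
    """Hypothetical token-level affine modulation of low-res tokens.

    Assumes token i of `low_res` is aligned with token i of `high_res`.
    For each pair, a per-channel scale gamma_i and shift beta_i are
    predicted from the high-res token and applied as
    y_i = (1 + gamma_i) * x_i + beta_i.
    """
    if len(low_res) != len(high_res):
        raise ValueError("expected one aligned high-res token per low-res token")
    out: List[Vector] = []
    for x, h in zip(low_res, high_res):
        gamma = matvec(W_gamma, h)  # per-channel scale predicted from h
        beta = matvec(W_beta, h)    # per-channel shift predicted from h
        if len(gamma) != len(x) or len(beta) != len(x):
            raise ValueError("projection output size must match token width")
        out.append([(1.0 + g) * xi + b for xi, g, b in zip(x, gamma, beta)])
    return out
```

Parameterizing the scale as $1 + \gamma$ means zero-initialized projections leave the low-resolution embeddings unchanged, so high-resolution detail is blended in only as the projections learn to use it.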

  • 5 authors
·
Nov 30, 2025

Causality and Renormalization in Finite-Time-Path Out-of-Equilibrium $φ^3$ QFT

Our aim is to contribute to quantum field theory (QFT) formalisms useful for describing short-time phenomena, which are dominant especially in heavy ion collisions. We formulate out-of-equilibrium QFT within the finite-time-path (FTP) formalism and renormalization theory (RT). The potential conflict between FTP and RT is investigated in $g\phi^3$ QFT, using the retarded/advanced (R/A) basis of Green functions and dimensional renormalization (DR). For example, vertices immediately after (in time) divergent self-energy loops do not conserve energy, because the corresponding integrals diverge. We "repair" them, while keeping $d<4$, to obtain energy conservation at those vertices. Already in S-matrix theory, the renormalized, finite part of the Feynman self-energy $\Sigma_F(p_0)$ does not vanish when $|p_0| \to \infty$ and cannot be split into retarded and advanced parts. In the Glaser–Epstein approach, causality is repaired in the composite object $G_F(p_0)\Sigma_F(p_0)$. In the FTP approach, after the vertices are repaired, the corresponding composite objects are $G_R(p_0)\Sigma_R(p_0)$ and $\Sigma_A(p_0)G_A(p_0)$. In the limit $d \to 4$, one obtains a causal QFT. The tadpole contribution splits into diverging and finite parts. The diverging, constant component is eliminated by the S-matrix renormalization condition $\langle 0|\phi|0\rangle = 0$. The finite, oscillating, energy-nonconserving tadpole contributions vanish in the limit $t \to \infty$.
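
For orientation, here is one common normalization of the $g\phi^3$ Lagrangian (the combinatorial factor $1/3!$ is an assumption; the abstract does not fix it) together with the power counting that explains why the self-energy and tadpole are the divergent pieces discussed above. With $E$ external legs and $V$ vertices, the superficial degree of divergence in $d$ dimensions is:

$$
\mathcal{L} = \tfrac{1}{2}\,\partial_\mu\phi\,\partial^\mu\phi - \tfrac{1}{2}\,m^2\phi^2 - \tfrac{g}{3!}\,\phi^3,
\qquad [\phi] = \tfrac{d-2}{2}, \qquad [g] = 3 - \tfrac{d}{2},
$$

$$
D = d - \tfrac{d-2}{2}\,E - \left(3 - \tfrac{d}{2}\right)V \;\xrightarrow{\;d = 4\;}\; 4 - E - V .
$$

At $d = 4$ the one-loop tadpole ($E = V = 1$) has $D = 2$ and the one-loop self-energy ($E = V = 2$) has $D = 0$, while the one-loop vertex correction ($E = V = 3$) has $D = -2$ and is superficially convergent.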

  • 2 authors
·
Dec 31, 2019

Combining Electron-Phonon and Dynamical Mean-Field Theory Calculations of Correlated Materials: Transport in the Correlated Metal Sr$_2$RuO$_4$

Electron-electron (e-e) and electron-phonon (e-ph) interactions are challenging to describe in correlated materials, where their joint effects govern unconventional transport, phase transitions, and superconductivity. Here we combine first-principles e-ph calculations with dynamical mean-field theory (DMFT) as a step toward a unified description of e-e and e-ph interactions in correlated materials. We compute the e-ph self-energy using the DMFT electron Green's function and combine it with the e-e self-energy from DMFT to obtain a Green's function that includes both interactions. This approach captures the renormalization of the quasiparticle dispersion and spectral weight on an equal footing. Using our method, we study the e-ph and e-e contributions to the resistivity and spectral functions in the correlated metal Sr$_2$RuO$_4$. In this material, our results show that e-e interactions dominate transport and spectral broadening in the temperature range we study (50–310 K), while e-ph interactions are relatively weak and account for only ~10% of the experimental resistivity. We also compute effective scattering rates and find that e-e interactions produce scattering several times greater than the Planckian value $k_BT$, whereas e-ph interactions are associated with scattering rates lower than $k_BT$. Our work demonstrates a first-principles approach that combines electron dynamical correlations from DMFT with e-ph interactions in a consistent way, advancing quantitative studies of correlated materials.
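
A minimal sketch of the combination step, assuming a single band with dispersion $\varepsilon_k$ measured from the chemical potential and both self-energies entering the Dyson equation additively, $G(k,\omega) = [\omega - \varepsilon_k - \Sigma_{ee}(\omega) - \Sigma_{eph}(k,\omega)]^{-1}$. The toy self-energy values are hypothetical, and the scattering rate $\Gamma = -2\,\mathrm{Im}\,\Sigma(\omega = 0)$ compared against $k_BT$ is one common convention, not necessarily the paper's definition.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def green_function(omega: float, eps_k: float, sigma_ee: complex, sigma_eph: complex) -> complex:
    """Retarded G from the Dyson equation with e-e and e-ph self-energies added (energies in eV)."""
    return 1.0 / (omega - eps_k - sigma_ee - sigma_eph)


def spectral_function(omega: float, eps_k: float, sigma_ee: complex, sigma_eph: complex) -> float:
    """Spectral function A(k, omega) = -Im G(k, omega) / pi."""
    return -green_function(omega, eps_k, sigma_ee, sigma_eph).imag / math.pi


def planckian_ratio(im_sigma_at_zero: float, temperature_k: float) -> float:
    """Gamma / (k_B T), with Gamma = -2 Im Sigma(omega=0) (assumed convention; Im Sigma < 0)."""
    return -2.0 * im_sigma_at_zero / (K_B_EV_PER_K * temperature_k)
```

For example, `spectral_function(0.0, 0.0, -0.02j, -0.005j)` gives the peak height when a weak e-ph term broadens an e-e-dominated quasiparticle; adding the e-ph imaginary part lowers and widens the peak, and a `planckian_ratio` above 1 corresponds to scattering faster than the Planckian value.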

  • 5 authors
·
Apr 13, 2023

Precision holography for non-conformal branes

We set up precision holography for the non-conformal branes preserving 16 supersymmetries. The near-horizon limit of all such $p$-brane solutions with $p \leq 4$, including the case of fundamental string solutions, is conformal to $AdS_{p+2} \times S^{8-p}$ with a linear dilaton. We develop holographic renormalization for all these cases. In particular, we obtain the most general asymptotic solutions with appropriate Dirichlet boundary conditions, find the corresponding counterterms, and compute the holographic 1-point functions, all in complete generality and at the full non-linear level. The result for the stress-energy tensor properly defines the notion of mass for backgrounds with such asymptotics. The analysis is done both in the original formulation of the method and using a radial Hamiltonian analysis. The latter formulation exhibits most clearly the existence of an underlying generalized conformal structure. In the cases of D$p$-branes, the corresponding dual boundary theory, the maximally supersymmetric Yang-Mills theory SYM$_{p+1}$, indeed exhibits the generalized conformal structure found at strong coupling. We compute the holographic 2-point functions of the stress-energy tensor and gluon operator and show that they satisfy the expected Ward identities and the constraints of generalized conformal structure. The holographic results are also manifestly compatible with the M-theory uplift: the asymptotic solutions, counterterms, one- and two-point functions, etc., of the IIA F1 and D4 branes descend appropriately from those of the M2 and M5 branes, respectively. We present a few applications, including the computation of condensates in Witten's model of holographic YM$_4$ theory.
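
The generalized conformal structure rests on a simple dimensional count: in $p+1$ dimensions the Yang-Mills coupling carries mass dimension unless $p = 3$. A short worked check, with the action written up to an overall normalization constant (which does not affect the count):

$$
S = \frac{1}{g_{\mathrm{YM}}^2}\int d^{p+1}x\;\operatorname{tr} F_{\mu\nu}F^{\mu\nu},
\qquad [A_\mu] = 1,\quad [F_{\mu\nu}F^{\mu\nu}] = 4,\quad [d^{p+1}x] = -(p+1),
$$

$$
[S] = 0 \;\Longrightarrow\; [g_{\mathrm{YM}}^2] = 4 - (p+1) = 3 - p .
$$

One way to read the structure mentioned above: treating $g_{\mathrm{YM}}^2$ as a background field that scales with weight $3 - p$ restores a scaling symmetry for $p \neq 3$, while for $p = 3$ the coupling is dimensionless and SYM$_4$ is conformal.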

  • 3 authors
·
Jul 21, 2008

On the Road to Clarity: Exploring Explainable AI for World Models in a Driver Assistance System

In autonomous driving (AD), transparency and safety are paramount, as mistakes are costly. However, neural networks used in AD systems are generally considered black boxes. As a countermeasure, explainable AI (XAI) offers methods such as feature relevance estimation and dimensionality reduction. Coarse-graining techniques can also help reduce dimensionality and find interpretable global patterns. One specific coarse-graining method is the renormalization group (RG) from statistical physics, which has previously been applied to restricted Boltzmann machines (RBMs) to interpret unsupervised learning. We refine this technique by building a transparent backbone model for convolutional variational autoencoders (VAEs) that allows latent values to be mapped to input features and performs comparably to trained black-box VAEs. Moreover, we propose a custom feature map visualization technique to analyze the internal convolutional layers of the VAE and explain internal causes of poor reconstruction that may lead to dangerous traffic scenarios in AD applications. As a second key contribution, we propose explanation and evaluation techniques for the internal dynamics and feature relevance of prediction networks. We test a long short-term memory (LSTM) network in the computer vision domain to evaluate the predictability and, in future applications, potentially the safety of prediction models. We showcase our methods by analyzing a VAE-LSTM world model that predicts pedestrian perception in an urban traffic situation.
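
For readers unfamiliar with RG coarse graining, the classic real-space example is Kadanoff's block-spin step: each $b \times b$ block of $\pm 1$ spins is replaced by a single spin chosen by majority rule. The sketch below shows only that textbook step on a toy lattice; it is not the authors' VAE backbone, and the function name is hypothetical.

```python
from typing import List

Grid = List[List[int]]


def block_spin(grid: Grid, b: int = 3) -> Grid:
    """One real-space RG step: majority rule over non-overlapping b x b blocks of +/-1 spins."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % b or n_cols % b:
        raise ValueError("grid dimensions must be multiples of the block size b")
    coarse: Grid = []
    for i in range(0, n_rows, b):
        row = []
        for j in range(0, n_cols, b):
            # Net magnetization of the current block.
            total = sum(grid[i + di][j + dj] for di in range(b) for dj in range(b))
            # Majority rule; with odd b * b a tie (total == 0) cannot occur.
            # For even b, ties are broken toward -1 here (an arbitrary choice).
            row.append(1 if total > 0 else -1)
        coarse.append(row)
    return coarse
```

Applying `block_spin` repeatedly yields successively coarser descriptions, each with $b^2$ times fewer degrees of freedom, which is the sense in which RG acts as a dimensionality reduction that keeps global patterns.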

  • 6 authors
·
Apr 26, 2024

SETOL: A Semi-Empirical Theory of (Deep) Learning

We present a Semi-Empirical Theory of Learning (SETOL) that explains the remarkable performance of state-of-the-art (SOTA) neural networks (NNs). We provide a formal explanation of the origin of the fundamental quantities in the phenomenological theory of Heavy-Tailed Self-Regularization (HTSR): the heavy-tailed power-law layer quality metrics $\alpha$ and $\hat{\alpha}$. In prior work, these metrics have been shown to predict trends in the test accuracies of pretrained SOTA NN models, importantly without needing access to either testing or training data. SETOL uses techniques from statistical mechanics as well as advanced methods from random matrix theory and quantum chemistry. The derivation suggests new mathematical preconditions for ideal learning, including a new metric, ERG, which is equivalent to applying a single step of the Wilson Exact Renormalization Group. We test the assumptions and predictions of SETOL on a simple 3-layer multilayer perceptron (MLP), demonstrating excellent agreement with the key theoretical assumptions. For SOTA NN models, we show how to estimate the individual layer qualities of a trained NN simply by computing the empirical spectral density (ESD) of the layer weight matrices and plugging this ESD into the SETOL formulas. Notably, we examine the performance of the HTSR $\alpha$ and the SETOL ERG layer quality metrics and find that they align remarkably well, both on our MLP and on SOTA NNs.
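
In HTSR practice, $\alpha$ is the exponent of a power law $\rho(\lambda) \propto \lambda^{-\alpha}$ fitted to the tail of the ESD, i.e. the eigenvalues of $X = W^\top W / N$ for a layer weight matrix $W$ (one common normalization; the eigenvalues can be obtained with, e.g., `numpy.linalg.eigvalsh`). The sketch below assumes the standard continuous power-law maximum-likelihood estimator with a user-chosen tail cutoff $\lambda_{\min}$ (practical fitting tools typically also select $\lambda_{\min}$ automatically) and the weighted variant $\hat{\alpha} = \alpha \log_{10} \lambda_{\max}$ used in prior HTSR work. It does not implement the SETOL ERG metric, and the function names are hypothetical.

```python
import math
from typing import Sequence


def powerlaw_alpha_mle(eigenvalues: Sequence[float], lambda_min: float) -> float:
    """MLE of alpha for rho(lambda) ~ lambda^-alpha restricted to lambda >= lambda_min."""
    if lambda_min <= 0:
        raise ValueError("lambda_min must be positive")
    tail = [lam for lam in eigenvalues if lam >= lambda_min]
    # Continuous power-law MLE: alpha = 1 + n / sum(ln(lambda_i / lambda_min)).
    log_sum = sum(math.log(lam / lambda_min) for lam in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("tail is empty or degenerate; choose a smaller lambda_min")
    return 1.0 + len(tail) / log_sum


def alpha_hat(alpha: float, lambda_max: float) -> float:
    """Weighted layer metric alpha_hat = alpha * log10(lambda_max)."""
    return alpha * math.log10(lambda_max)
```

Because the estimator depends only on the ESD, it needs the trained weights but no training or test data, which is the property the abstract highlights.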

  • 2 authors
·
Jul 23, 2025

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

Large language models (LLMs) have shown remarkable progress in coding and math problem-solving, but evaluation on advanced research-level problems in the hard sciences remains scarce. To fill this gap, we present CMT-Benchmark, a dataset of 50 problems covering condensed matter theory (CMT) at the level of an expert researcher. Topics span analytical and computational approaches in quantum many-body physics and classical statistical mechanics. The dataset was designed and verified by a panel of expert researchers from around the world. We built the dataset through a collaborative environment that challenges the panel to write and refine problems they would want a research assistant to solve, covering Hartree-Fock, exact diagonalization, quantum/variational Monte Carlo, density matrix renormalization group (DMRG), quantum/classical statistical mechanics, and model building. We evaluate LLMs by programmatically checking solutions against expert-supplied ground truth. We developed machine grading, including symbolic handling of non-commuting operators via normal ordering, and these graders generalize across tasks. Our evaluations show that frontier models struggle with all of the problems in the dataset, highlighting a gap in the physical reasoning skills of current LLMs. Notably, experts identified strategies for creating increasingly difficult problems by interacting with the LLMs and exploiting common failure modes. The best model, GPT-5, solves 30% of the problems; the average across 17 models (GPT, Gemini, Claude, DeepSeek, Llama) is 11.4 ± 2.1%. Moreover, 18 problems are solved by none of the 17 models, and 26 by at most one. These unsolved problems span Quantum Monte Carlo, Variational Monte Carlo, and DMRG. Answers sometimes violate fundamental symmetries or have unphysical scaling dimensions. We believe this benchmark will guide development toward capable AI research assistants and tutors.
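
To illustrate what grading "via normal ordering" can mean, here is a minimal sketch for a single bosonic mode with $[a, a^\dagger] = 1$: operator words over `a` and `A` (standing for $a^\dagger$) are rewritten with $a a^\dagger = a^\dagger a + 1$ until every $a^\dagger$ stands to the left of every $a$, and two expressions count as equal when their normal-ordered forms coincide. The string encoding and function names are hypothetical; the benchmark's actual grader is not described at this level of detail, and fermionic operators would additionally need anticommutation signs.

```python
from collections import defaultdict
from typing import Dict


def normal_order(word: str) -> Dict[str, int]:
    """Normal-order a product of a ('a') and a-dagger ('A') for one bosonic mode.

    Returns a map from normal-ordered words (all 'A' before all 'a') to integer
    coefficients, using the commutation rule a A = A a + 1.
    """
    result: Dict[str, int] = defaultdict(int)
    stack = [(word, 1)]
    while stack:
        w, coeff = stack.pop()
        k = w.find("aA")  # first out-of-order pair
        if k == -1:
            result[w] += coeff  # already normal ordered
            continue
        stack.append((w[:k] + "Aa" + w[k + 2:], coeff))  # swapped term A a
        stack.append((w[:k] + w[k + 2:], coeff))         # commutator term 1
    return {w: c for w, c in result.items() if c != 0}


def normal_order_sum(terms: Dict[str, int]) -> Dict[str, int]:
    """Normal-order a linear combination of words and collect like terms."""
    total: Dict[str, int] = defaultdict(int)
    for w, c in terms.items():
        for nw, nc in normal_order(w).items():
            total[nw] += c * nc
    return {w: c for w, c in total.items() if c != 0}
```

For example, the commutator `{"aA": 1, "Aa": -1}` normal-orders to `{"": 1}`, i.e. the identity, so an answer written as $a a^\dagger - a^\dagger a$ would be graded equal to $1$.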

  • 19 authors
·
Oct 6, 2025