A recurring challenge across science and engineering: you need to align a computationally expensive black-box simulator (PDEs and the like) with data in order to infer hidden parameters such as material coefficients or boundary conditions. In many such cases you have no access to gradients or adjoints.

If you only want point estimates, Bayesian optimisation (BO) is an option. But if you care about the full posterior distribution, Monte Carlo and MCMC quickly become infeasible. You could fall back on Laplace approximations, but for most PDE-based inverse problems the posteriors are horrible: multimodal, non-identifiable, with tangled geometries reflecting sensitivity scales and invariances. ABC is another option, but it typically requires huge numbers of evaluations and has a tendency to inflate posteriors.

So the homework question was: just as BO uses Gaussian process surrogates and acquisition strategies to explore costly functions, can we design sampling strategies the same way, to approximate a posterior under a fixed compute budget?

With the brilliant Takuo Matsubara, Simon Cotter, and Konstantinos Zygalakis, we introduce Bandit Importance Sampling (BIS):
• A new class of importance sampling that designs samples directly via multi-armed bandits.
• Combines space-filling sequences (Halton, QMC) with GP surrogates to adaptively focus evaluations where they matter most.
• Comes with theoretical guarantees and works well on multimodal, heavy-tailed, and real-world Bayesian inference problems.

Takeaway: BIS works well, and it can cut evaluations by orders of magnitude. For problems with ~10–20 parameters, it is a very viable option.

Preprint here: https://lnkd.in/egrZX_NJ
Next steps: packaging this up for the community.
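For intuition only, here is a minimal Python sketch of how those ingredients can fit together: quasi-uniform Halton candidates, a GP surrogate of the log-posterior, a UCB-style bandit rule for spending a fixed evaluation budget, and a surrogate-assisted self-normalised importance-sampling estimate at the end. The toy log-posterior, bounding box, kernel, acquisition rule, and final estimator are all illustrative assumptions, not the BIS algorithm from the preprint.

```python
# Illustrative sketch only: Halton candidates + GP surrogate + bandit-style acquisition
# + importance weights. Not the BIS algorithm; see the preprint for the real method.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel


def log_post(x):
    """Stand-in for an expensive simulator-based log-posterior (bimodal toy)."""
    return np.logaddexp(-0.5 * np.sum((x - 1.5) ** 2, -1) / 0.1,
                        -0.5 * np.sum((x + 1.5) ** 2, -1) / 0.1)


d, n_cand, budget, beta = 2, 2048, 60, 2.0
rng = np.random.default_rng(0)

# Space-filling candidate set on an assumed bounding box [-4, 4]^d.
cand = qmc.scale(qmc.Halton(d=d, scramble=True, seed=0).random(n_cand),
                 [-4.0] * d, [4.0] * d)

# Warm start with a handful of evaluations.
idx = list(rng.choice(n_cand, size=8, replace=False))
y = [log_post(cand[i]) for i in idx]

gp = GaussianProcessRegressor(ConstantKernel() * RBF(length_scale=1.0),
                              normalize_y=True)
for _ in range(budget - len(idx)):
    gp.fit(cand[idx], np.array(y))
    mu, sd = gp.predict(cand, return_std=True)
    acq = mu + beta * sd              # UCB-style score on the log-posterior
    acq[idx] = -np.inf                # never re-evaluate a candidate
    j = int(np.argmax(acq))
    idx.append(j)
    y.append(log_post(cand[j]))       # the only expensive calls

# Surrogate-assisted self-normalised importance weights over the candidate set:
# exact log-density where evaluated, GP mean elsewhere (candidates are quasi-uniform).
gp.fit(cand[idx], np.array(y))
f = gp.predict(cand)
f[idx] = y
w = np.exp(f - f.max())
w /= w.sum()
post_mean = w @ cand                  # e.g. a posterior-mean estimate
print("estimated posterior mean:", post_mean)
```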
Innovative Sampling Concepts
Explore top LinkedIn content from expert professionals.
Summary
Innovative sampling concepts are advanced methods for selecting data samples that improve accuracy, efficiency, or quality in fields like machine learning, scientific simulations, and statistical analysis. These modern approaches go beyond basic random sampling, using smarter algorithms and adaptive strategies to tackle complex problems where traditional techniques fall short.
- Explore adaptive sampling: Use algorithms that adjust sampling focus based on data patterns or computational feedback, which can help capture hidden or rare features more reliably.
- Combine smart algorithms: Integrate techniques like bandits, diffusion models, or importance weights to prioritize sampling in areas that matter most for your project’s goals (a short importance-weighting sketch follows this list).
- Balance speed and accuracy: Choose methods that minimize computational effort while still providing representative samples, making them practical for large-scale tasks or challenging environments.
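As a minimal, generic illustration of the importance-weighting idea mentioned above (toy target and proposal chosen purely for illustration, nothing project-specific):

```python
# Self-normalised importance sampling: draw from an easy proposal, weight by
# target/proposal, and let the weights concentrate the estimate where it matters.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
target = stats.norm(loc=3.0, scale=0.5)       # density we care about
proposal = stats.norm(loc=0.0, scale=3.0)     # easy to sample, covers the target

x = proposal.rvs(size=5000, random_state=rng)
logw = target.logpdf(x) - proposal.logpdf(x)  # importance weights (log scale)
w = np.exp(logw - logw.max())
w /= w.sum()                                  # self-normalise

est_mean = np.sum(w * x)                      # weighted estimate of E_target[X]
ess = 1.0 / np.sum(w ** 2)                    # effective sample size diagnostic
print(f"estimated mean ~ {est_mean:.3f}, ESS ~ {ess:.0f} of {x.size}")
```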
-
Sampling from the Boltzmann density better than Molecular Dynamics (MD)? It is possible with PITA 🫓 Progressive Inference Time Annealing! A spotlight at the #ICML 2025 GenBio workshop!

PITA learns from "hot," easy-to-explore molecular states 🔥 and then cleverly "cools" them down ❄️ using a novel diffusion process to efficiently find stable, low-energy configurations. It's the first diffusion-based sampler that successfully scales to peptides like Alanine Dipeptide and Tripeptide in their 3D coordinate representation.

With a brilliant team: Tara Akhound-Sadegh, Jungyoon Lee, Joey Bose, Valentin De Bortoli, Arnaud Doucet, Michael Bronstein, Dominique Beaini, Siamak Ravanbakhsh, Alexander Tong

So, how does it work? The core challenge is that a molecule's energy landscape at a low temperature is complex, with many "valleys" (metastable states) separated by high "mountains." Traditional simulation methods can easily get stuck, failing to find all the important structures.

PITA's first move is to sidestep this problem. We "heat up" the system, which flattens the energy landscape. This allows us to easily collect diverse samples using standard methods like Molecular Dynamics or MCMC. PITA starts by learning a diffusion model for this high-temperature data.

Then the math happens! We use a novel technique we call "inference-time annealing" to simulate the trained model at a lower temperature, generating a new set of samples there. We repeat this process, progressively "cooling" until we reach the target temperature.

The results speak for themselves. Compared to standard Molecular Dynamics (MD) with the same computational budget, PITA is far better at exploring the full space of possibilities. You can see here how PITA finds all the correct molecular structures while MD completely misses one.

This breakthrough has the potential to accelerate research in many areas, from designing new drugs to discovering novel materials. By making molecular sampling more efficient, we can tackle bigger and more complex scientific questions than ever before.

Check out our work!👇
Paper: https://lnkd.in/epwVKxQT
Code (coming soon): https://lnkd.in/ePYHA9Nd
#AI #Science #ML #ComputationalChemistry
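To make the "sample hot, then progressively cool" loop concrete, here is a toy numpy sketch on a 1-D double-well Boltzmann density. To stay short and runnable it replaces PITA's learned diffusion model and inference-time annealing with plain importance reweighting, resampling, and short Metropolis refreshes at each temperature, so it illustrates only the outer annealing loop, not the method itself.

```python
# Toy "heat up, then progressively cool" loop on a 1-D double-well Boltzmann density.
# The learned diffusion model of PITA is replaced here by reweighting + resampling
# + short Metropolis refreshes; this is a sketch of the annealing schedule only.
import numpy as np

rng = np.random.default_rng(0)
U = lambda x: (x ** 2 - 1.0) ** 2 / 0.05      # double well with two deep minima


def metropolis(x, T, n_steps=50, step=0.3):
    """A few random-walk Metropolis refresh steps targeting exp(-U/T)."""
    for _ in range(n_steps):
        prop = x + step * rng.standard_normal(x.shape)
        log_alpha = np.minimum(0.0, (U(x) - U(prop)) / T)
        accept = rng.random(x.shape) < np.exp(log_alpha)
        x = np.where(accept, prop, x)
    return x


# 1) "Heat up": easy exploration at a high temperature.
temps = np.geomspace(10.0, 0.5, num=8)        # progressive cooling schedule
x = rng.uniform(-3, 3, size=5000)
x = metropolis(x, temps[0], n_steps=500)

# 2) Progressively cool: reweight hot samples toward the colder density,
#    resample, then refresh with short MCMC at the new temperature.
for T_hot, T_cold in zip(temps[:-1], temps[1:]):
    logw = -U(x) * (1.0 / T_cold - 1.0 / T_hot)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    x = x[rng.choice(x.size, size=x.size, p=w)]   # resample by importance weight
    x = metropolis(x, T_cold)

# Both wells should remain populated at the target temperature (~0.5 each).
print("fraction of samples in the right-hand well:", np.mean(x > 0))
```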
-
🚀 This innovative Soft Mining method for Neural Field training, based on importance sampling, selectively weights the loss of each pixel rather than taking an all-or-nothing approach. By implementing Langevin Monte-Carlo sampling, the technique delivers improved convergence and training quality, a significant advance in the efficiency and accuracy of Neural Field training.

Abstract: "We present an approach to accelerate Neural Field training by efficiently selecting sampling locations. While Neural Fields have recently become popular, it is often trained by uniformly sampling the training domain, or through handcrafted heuristics. We show that improved convergence and final training quality can be achieved by a soft mining technique based on importance sampling: rather than either considering or ignoring a pixel completely, we weigh the corresponding loss by a scalar. To implement our idea we use Langevin Monte-Carlo sampling. We show that by doing so, regions with higher error are being selected more frequently, leading to more than 2x improvement in convergence speed."
- The University of British Columbia, Google Research, Google DeepMind, Simon Fraser University, University of Toronto

Project Page: https://lnkd.in/eJbQ899K
arXiv: https://lnkd.in/e_mY5b4m
GitHub: https://lnkd.in/eyjz28F2
License: Apache 2.0 https://lnkd.in/e6r5GcYq

For more like this ⤵ 👉 Follow Orbis Tabula
#neuralnetworks #machinelearning #optimization
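A rough sketch of the sampling idea, under simplifying assumptions: a fixed synthetic error map stands in for the evolving per-pixel training loss, unadjusted Langevin steps push sample coordinates toward high-error regions, and each sample's loss contribution gets an importance weight rather than being kept or dropped outright. This is not the authors' implementation (see the GitHub link above for that).

```python
# Soft-mining-style sampling sketch: Langevin Monte-Carlo moves pixel coordinates
# toward high-error regions; importance weights compensate for the biased sampling.
# The error map, step sizes, and weighting below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
H = W = 128
yy, xx = np.mgrid[0:H, 0:W]
# Hypothetical error map: a blob of high reconstruction error near one corner.
err = 0.05 + np.exp(-(((yy - 30) ** 2 + (xx - 90) ** 2) / (2 * 12.0 ** 2)))
log_err = np.log(err)
gy, gx = np.gradient(log_err)                 # finite-difference grad of log-error

n_samples, n_steps, step, temp = 4096, 300, 2.0, 1.0
pos = rng.uniform(0, [H - 1, W - 1], size=(n_samples, 2))  # (row, col) coordinates

for _ in range(n_steps):
    r = pos[:, 0].astype(int)
    c = pos[:, 1].astype(int)
    grad = np.stack([gy[r, c], gx[r, c]], axis=1)
    # Unadjusted Langevin update: drift up the log-error plus Gaussian noise.
    pos += 0.5 * step * grad + np.sqrt(step * temp) * rng.standard_normal(pos.shape)
    pos[:, 0] = np.clip(pos[:, 0], 0, H - 1)
    pos[:, 1] = np.clip(pos[:, 1], 0, W - 1)

# Samples now concentrate where error is high; an importance weight proportional
# to 1/err (uniform density over sampling density) keeps the overall loss unbiased.
r, c = pos[:, 0].astype(int), pos[:, 1].astype(int)
loss_weight = 1.0 / err[r, c]
loss_weight /= loss_weight.mean()
print("mean error at sampled pixels:", err[r, c].mean(), "vs. uniform:", err.mean())
```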
-
WALNUTS is the most promising Bayesian sampling algorithm to emerge in the last 5 years. I'm always interested in new samplers, but rarely as excited as this.

This is especially interesting because, instead of emerging out of pure theory, it starts from a heavily road-tested algorithm (NUTS) used for many applications (including the discovery of gravitational waves) and seeks to fix one particular weakness. This weakness is shared by all samplers out there, and it affects the ubiquitous class of multilevel models. NUTS has accumulated many diagnostic methods that help us stay safe and detect when things go wrong; most other samplers do not have that safety component. So, the combination of features could be a winner.

WALNUTS is working its way into Stan and PyMC. Until then, you can take a look at the C++ here: https://lnkd.in/ezjcgbn4 , or you can read the preprint here: https://lnkd.in/e9SC67WQ if you prefer 39 pages of Hard Maths.

I would advise any software developers in the Bayesian space to take a serious look at WALNUTS for their own products. We can't continue into the 21st century with most Bayesian analysis in research still stumbling along with random-walk Metropolis-Hastings and Gibbs samplers, yet there are still multilevel models where NUTS doesn't offer a perfect solution either.
-
Let us discuss a very famous interview problem that appears in a LOT of technical interviews, whether quant, coding, or machine learning: 𝗛𝗼𝘄 𝘄𝗶𝗹𝗹 𝘆𝗼𝘂 𝘀𝗲𝗹𝗲𝗰𝘁 𝗮 𝘀𝘂𝗯𝘀𝗲𝘁 𝗳𝗿𝗼𝗺 𝗮 𝗹𝗮𝗿𝗴𝗲𝗿 𝘀𝗲𝘁 𝗿𝗮𝗻𝗱𝗼𝗺𝗹𝘆?

𝗪𝗵𝗮𝘁 𝗶𝘀 𝗥𝗮𝗻𝗱𝗼𝗺 𝗦𝗮𝗺𝗽𝗹𝗶𝗻𝗴?
Random sampling involves selecting a subset of elements from a larger set, where each possible subset has an equal probability of being chosen. This technique is crucial for real-world applications like A/B testing, statistical analysis, and feature rollouts, where we need unbiased, representative samples.

𝗧𝗵𝗲 𝗔𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺𝗶𝗰 𝗔𝗽𝗽𝗿𝗼𝗮𝗰𝗵
The most efficient way to generate a random subset of size k from an array of n elements is an in-place algorithm that builds the solution incrementally. For each position i from 0 to k-1:
• Generate a random index j between i and n-1 (inclusive)
• Swap the elements at positions i and j

This elegant algorithm, known as the 𝗙𝗶𝘀𝗵𝗲𝗿-𝗬𝗮𝘁𝗲𝘀 𝘀𝗵𝘂𝗳𝗳𝗹𝗲 (or Knuth shuffle) when applied to the entire array, has several key properties:
• Time complexity: O(k), where k is the sample size
• Space complexity: O(1) additional space (in-place)
• Uniformity: every possible subset of size k has exactly the same probability of being selected

The mathematical principle behind this algorithm ensures unbiased selection through a clever observation: if we have a random subset of size i, we can construct a random subset of size i+1 by randomly selecting one more element from the remaining n-i elements.

For large datasets where we can't store all elements in memory, we can use Reservoir Sampling, which maintains a "reservoir" of k items and processes the stream in a single pass.

𝗦𝗶𝗺𝗶𝗹𝗮𝗿 𝗣𝗿𝗼𝗯𝗹𝗲𝗺𝘀 𝘁𝗼 𝗠𝗮𝘀𝘁𝗲𝗿
Fisher-Yates Shuffle Implementation
• Shuffle an Array - https://lnkd.in/dxhkVug5
• Shuffle the Array - https://lnkd.in/ddKuJVEi
Reservoir Sampling Applications
• Random Pick Index - https://lnkd.in/di_rjP9K
• Linked List Random Node - https://lnkd.in/d75M3uED
• Random Pick with Blacklist - https://lnkd.in/dCydC25f
Advanced Random Selection
• Random Flip Matrix - https://lnkd.in/dyFxXAGh
• Weighted Random Selection - generate a random size-k subset from a probability-weighted set

Why this matters: random sampling algorithms are fundamental for machine learning (creating training/test splits), distributed systems (load balancing), and feature deployment (staged rollouts). The ability to efficiently generate unbiased random samples is a skill that distinguishes great engineers from good ones.

Remember that the key insight in these problems is understanding probability distributions and how to maintain uniformity while minimizing both time and space complexity.
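Both algorithms described above fit in a few lines of Python; the function names below are illustrative.

```python
# Partial Fisher-Yates shuffle for a uniform random k-subset, and reservoir sampling
# for a stream whose length is unknown up front.
import random


def random_subset(arr, k):
    """Uniform random k-subset via a partial Fisher-Yates shuffle: O(k) time, in place."""
    n = len(arr)
    for i in range(k):
        j = random.randrange(i, n)       # random index in [i, n-1]
        arr[i], arr[j] = arr[j], arr[i]
    return arr[:k]


def reservoir_sample(stream, k):
    """Uniform random k-subset of a stream in a single pass, O(k) extra space."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randrange(0, i + 1)   # keep item with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir


print(random_subset(list(range(10)), 3))
print(reservoir_sample(range(1_000_000), 3))
```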
-
Fieldwork consistently provides invaluable insights, practicality, and a grounded understanding of reality. My goal has always been to develop approaches tailored to local needs, considering affordability and technical capacity. With this mindset, I pioneered a novel method combining distance sampling, 3D technology, and manual sampling. This innovative approach enables the assessment of municipal solid waste collection efficiency, waste quantity on the street, and waste composition at a neighborhood scale, independently of municipal waste management authorities. It is particularly beneficial for regions with limited resources, helping to identify areas needing improved waste management and to assess the effectiveness of existing infrastructure.

Over nearly four months of fieldwork, finding researchers or assistants willing to handle street waste proved challenging. Undeterred, I personally roamed multiple neighborhoods, handling and measuring waste composition and quantity firsthand, with the support of a friend of the security guard at my hotel. This hands-on experience underscored the importance of perseverance and collaboration in overcoming obstacles and achieving meaningful results.

https://lnkd.in/eBZv-UQE