Aquileo | DeLLMa Decision Making under Uncertainty with LLMs

AI has moved beyond text generation LLMs are now being used for decision-making. Models like ChatGPT, Claude and Gemini focused on conversation and creativity. But LLMs can analyze data, understand context and choose optimal actions, enabling intelligent automation in domains like finance, gaming and business strategy.

DeLLMa stands for Decision-Making under Uncertainty with Large Language Models. Here, uncertainty refers to situations where situations where outcomes are unpredictable or data is incomplete.

It allows an LLM to:

Evaluate multiple possible actions,
Analyze uncertain outcomes,
And choose the action that maximizes the expected utility (i.e., the best decision considering both risk and reward)

Core Idea

DeLLMa introducing a structured reasoning framework, inspired by classical decision theory.

Decision problem represent using :

P = ( G, A, C)

where:

G (Goal): The user’s objective (e.g., maximize revenue, minimize loss)
A (Actions): The possible options or choices (e.g., apples, avocados, pears)
C (Context): The relevant background information (e.g., weather data, historical yields or market trends)

LLMs in Decision-Making

LLMs analyze structured or unstructured data to extract key insights and understand context.
They apply logical reasoning, detect patterns and compare outcomes to predict results.
LLMs create multiple solution paths and simulate “what-if” scenarios for better evaluation.
The model selects the most optimal decision based on learned objectives and data insights.
Integrated LLMs can autonomously use tools or APIs to execute decisions in real time.
They assist humans by ranking options, summarizing insights and explaining reasoning clearly.
Advanced setups enable LLMs to assess their own decisions and improve future performance.

How it Works

DeLLMa uses a multi-step reasoning pipeline that closely follows how humans think through uncertain choices.

Identify Uncertainty: The model starts by identifying all unknown factors (e.g., weather conditions, market changes or user behavior) that could influence the decision.
Forecast Probabilities: It then estimates the likelihood of different outcomes using contextual information.
Elicit Utility Function: It defines a utility function to align AI decisions with user preferences, such as safety or profit.
Maximize Expected Utility: Finally, DeLLMa combines the probabilities and utilities to calculate the expected utility for each action and selects the one with the highest expected benefit.

Step-By-Step Implementation

Step1: Import Required Libraries

import json: Handles reading and writing JSON data.
import os Interacts with the operating system .
import random: Generates random numbers or selections.
import argparse: Parses command-line arguments.
import dataclass, field: Simplifies data class creation.
import Dict, List, Tuple, Optional: Adds type hints for clarity.
import numpy as np: Numerical operations.

Python

import json
import os
import random
import argparse
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Optional
import numpy as np

Step2 : Create a Mock LLM

class MockLLM: Creates a mock language model for testing.
generate(): Generates a fake response based on the given prompt.

Python

class MockLLM:
    belief_phrases = ["very likely", "likely", "somewhat likely", "somewhat unlikely", "unlikely", "very unlikely"]

    @staticmethod
    def generate(prompt: str, system: Optional[str] = None, temperature: float = 0.0) -> str:
     
        if "produce a belief distribution" in prompt.lower() or "belief distribution" in prompt.lower():
            random.seed(len(prompt))
            state_lines = [line.strip().lstrip("- ").split(":")[0] for line in prompt.splitlines() if line.strip().startswith("- ")]
            result = {}
            for s in state_lines:
                options = [f"{s}_high", f"{s}_medium", f"{s}_low"]
                choices = random.sample(MockLLM.belief_phrases, 3)
                result[s] = {options[i]: choices[i] for i in range(3)}
            return json.dumps(result)
        elif "state-action pair" in prompt.lower() or "state-action pairs" in prompt.lower():
            lines = [l.strip() for l in prompt.splitlines() if l.strip().startswith("Action") or l.strip().startswith("- State-Action Pair")]
            scores = []
            idx = 0
            for l in lines:
                if "State-Action Pair" in l:
                    score = 0
                    if "avocado" in l:
                        score += 5
                    if "apple" in l:
                        score += 2
                    if "drought" in l:
                        score -= 1
                    scores.append((idx + 1, score))
                    idx += 1
            
            scores_sorted = sorted(scores, key=lambda x: -x[1])
            
            rank = [s[0] for s in scores_sorted]
            response = {"decision": f"State-Action Pair {rank[0]}", "rank": rank, "explanation": "Mocked ranking based on fruit preference heuristics."}
            return json.dumps(response)
        else:
           
            actions = [l.strip() for l in prompt.splitlines() if l.strip().startswith("Action")]
     
            chosen = None
            for a in actions:
                if "avocado" in a:
                    chosen = a
                    break
            if chosen is None and actions:
                chosen = random.choice(actions)
            response = {"decision": chosen or "Action 1", "explanation": "Mock decision: prefer avocado if available."}
            return json.dumps(response)

Step 3: Define Configuration Data Classes

ActionConfig: Defines action settings like mode (action_enum_mode) and budget limit.
StateConfig: Stores state settings such as mode, state list and top-k selection limit.
PreferenceConfig: Holds sampling settings mode, sample size, minibatch size and overlap.

Python

@dataclass
class ActionConfig:
    action_enum_mode: str = "base"
    budget: int = 10

@dataclass
class StateConfig:
    state_enum_mode: str = "sequential"
    states: Dict[str, str] = field(default_factory=dict)
    topk: int = 3

@dataclass
class PreferenceConfig:
    pref_enum_mode: str = "base"
    sample_size: int = 16
    minibatch_size: int = 8
    overlap_pct: float = 0.25

Step 4: Implement the DeLLMaAgent Class

This class handles all reasoning, sampling and decision-making logic

1. Initialization

DeLLMaAgent: AI agent that simulates decision-making for farm planning.
belief2score: Converts belief phrases into numeric scores for evaluation.
prepare_context(): Creates a text prompt describing crop options with yield and price info.

Python

class DeLLMaAgent:
    belief2score = {
        "very likely": 6, "likely": 5, "somewhat likely": 4,
        "somewhat unlikely": 3, "unlikely": 2, "very unlikely": 1
    }

    def __init__(self, choices: List[str], year: str = "2021",
                 action_config: ActionConfig = ActionConfig(),
                 state_config: StateConfig = StateConfig(),
                 preference_config: PreferenceConfig = PreferenceConfig(),
                 llm = MockLLM):
        self.choices = sorted(set(choices))
        self.year = year
        self.action_config = action_config
        self.state_config = state_config
        self.preference_config = preference_config
        self.llm = llm
        self.unit = "acres"
        self.product = "fruit"
        self.stats = {c: {"yield": random.uniform(10, 30), "price": random.uniform(1.0, 3.0)} for c in self.choices}

    def prepare_context(self) -> str:
        ctx = f"You are an agricultural expert helping a farmer choose what to plant next year.\n"
        for c in self.choices:
            ctx += f"- {c}: yield={self.stats[c]['yield']:.2f}, price=${self.stats[c]['price']:.2f}\n"
        return ctx

2. Prepare Actions and States

prepare_actions(): Creates planting actions for each fruit based on the available budget.
self.actions: Stores each fruit with its assigned budget as an action plan.
self.action_strs: Generates readable action texts.
prepare_state_prompt(): Builds a text listing state variables for decision-making.

Python

    def prepare_actions(self):
        budget = self.action_config.budget
        self.actions = [[(c, budget)] for c in self.choices]
        self.action_strs = [f"Action {i+1}. {c}: {budget} acres" for i, (c, _) in enumerate(self.actions)]
        return self.action_strs, "\n".join(self.action_strs)

    def prepare_state_prompt(self):
        sb = self.state_config.states or FRUIT_STATES[self.year]["agnostic"]
        lines = ["State variables to consider:"]
        for k, desc in sb.items():
            lines.append(f"- {k}: {desc}")
        return "\n".join(lines)

3. Generate Belief Distributions

Creates belief distributions for each state using the mock LLM.
Parses the model’s JSON output into state-wise belief mappings and converts phrases to numeric scores.
Normalizes scores into probabilities, stores them in self.belief_dist and returns the prompt with raw response.

Python

    def generate_belief_distribution(self):
        prompt = self.prepare_context() + "\n" + self.prepare_state_prompt()
        raw = self.llm.generate(prompt)
        parsed = json.loads(raw)
        self.belief_dist = {}
        for state, mapping in parsed.items():
            vals, phrases = list(mapping.keys()), list(mapping.values())
            scores = [self.belief2score[p] for p in phrases]
            probs = [s / sum(scores) for s in scores]
            self.belief_dist[state] = (vals, probs)
        return prompt, raw

4. Ranking State-Action Pairs

sample_state_action_pairs(): Randomly combines sampled belief states with possible actions to form state-action pairs.
run_preference_elicitation(): Sends these pairs to the mock LLM, gets a ranked decision and parses the JSON output.

Python

    def sample_state_action_pairs(self):
        pairs = []
        for i in range(self.preference_config.sample_size):
            s = [f"{k}: {random.choice(v[0])}" for k, v in self.belief_dist.items()]
            a = random.choice(self.action_strs)
            pairs.append(f"- State-Action Pair {i+1}. State: {', '.join(s)}; {a}")
        return pairs

    def run_preference_elicitation(self):
        pairs = self.sample_state_action_pairs()
        prompt = "\n".join(pairs)
        raw = self.llm.generate(prompt)
        return pairs, prompt, json.loads(raw)

5. Compute Expected Utilities and Best Action

compute_utility(): Calculates the expected profit
compute_best_action(): Computes average utilities for all actions and selects the one with the highest expected value across samples.

Python

    def compute_utility(self, action, state_sample):
        fruit = action.split(". ")[1].split(":")[0].strip()
        yield_per_acre = self.stats[fruit]["yield"]
        price = self.stats[fruit]["price"]
        modifier = 1.0
        for s in state_sample:
            if "drought" in s: modifier *= 0.7
            if "pest" in s: modifier *= 0.6
        return yield_per_acre * price * self.action_config.budget * modifier

    def compute_best_action(self, samples=200):
        utilities = {a: np.mean([self.compute_utility(a, []) for _ in range(samples)]) for a in self.action_strs}
        best = max(utilities.items(), key=lambda x: x[1])
        return best[0], utilities

Step 5: Main Script

Initializes the agent: Sets up crop data, budget and configuration details.
Generates and evaluates: Creates belief distributions, actions and rankings using the mock LLM, then computes revenue and prints the best planting action.

Python

def main():
    choices = FRUITS["2021"]
    budget = 10
    print(f"Available Crops: {', '.join(choices)}")
    print(f"Budget: {budget} acres\n")
    agent = DeLLMaAgent(
        choices=choices,
        year="2021",
        action_config=ActionConfig(budget=budget),
        state_config=StateConfig(states=FRUIT_STATES["2021"]["agnostic"]),
        preference_config=PreferenceConfig(),
        llm=MockLLM,
    )
    print("Generating belief distribution about external factors...")
    _, belief_raw = agent.generate_belief_distribution()
    print("Belief states prepared.\n")
    actions, _ = agent.prepare_actions()
    print("Possible actions:")
    for a in actions:
        print(f" - {a}")
    print()
    print("Evaluating state-action pairs using preference elicitation...")
    _, _, pref_response = agent.run_preference_elicitation()
    print(f"Top preferred pair (mocked LLM output): {pref_response['decision']}\n")
    print("Computing expected utility for each crop (sampling-based)...")
    best_action, utilities = agent.compute_best_action(samples_per_action=100)
    for a, u in utilities.items():
        print(f" - {a}: Expected Revenue = ${u:,.2f}")
    print("\nRecommended Action:")
    print(f"{best_action}")
    print(f"Estimated Profit: ${utilities[best_action]:,.2f}")

if __name__ == "__main__":
    main()

Output:

You can download the complete code file from here.

LLM Reasoning Techniques

Chain-of-Thought (CoT): Prompts step-by-step thinking great for logic, but weak with uncertainty or probability.
Self-Consistency: Generates multiple reasoning paths and selects the most common outcome effective for puzzles but weak with real-world complexity.
Direct Prompting: Gives the LLM a decision problem without structure, often causing inconsistent or inaccurate results.
DeLLMa: It adds a decision-theoretic layer to LLMs, helping them reason more logically and transparently using utility, probability and context-based evaluation

Applications

Helps organizations choose optimal strategies by balancing profit, risk and long-term goals.
Assists farmers in selecting the most profitable crops under uncertain weather and market conditions.
Supports doctors in choosing safe and effective treatment plans based on patient data.
Enables robots and self-driving vehicles to make safe real-time decisions under uncertainty.
Enhances AI chatbots’ decision-making when handling ambiguous or incomplete user queries.
Optimizes logistics, inventory and routing under fluctuating supply and demand.

Limitations

LLMs are not perfect at estimating real-world probabilities.
Utility elicitation (understanding user preferences) can be subjective.
The model’s reasoning quality depends heavily on prompt clarity and context richness.

Difference between DeLLMa and Human-in-the-Loop (HITL) Decision Making

Aspect	DeLLMa	HITL Decision Making
Definition	An LLM framework that autonomously makes goal-driven decisions under uncertainty	A human-in-the-loop approach where human guide and refine AI decisions.
Core Approach	Fully automated reasoning using probabilistic models and expected utility.	Human offers feedback and intervention throughout decision-making.
Decision Maker	LLM-based AI chooses the optimal action.	Final decisions remain human-centered
Human Involvement	Minimal mainly during prompt or goal setup.	Humans continuously provide feedback, evaluation and ethical oversight.
Uncertainty Handling	Mathematically managed via probability and utility estimation.	Guided by human intuition, experience and contextual judgment.

DeLLMa Decision Making under Uncertainty with LLMs

Core Idea

LLMs in Decision-Making

How it Works

Step-By-Step Implementation

Step1: Import Required Libraries

Step2 : Create a Mock LLM

Step 3: Define Configuration Data Classes

Step 4: Implement the DeLLMaAgent Class

Step 5: Main Script

LLM Reasoning Techniques

Applications

Limitations

Difference between DeLLMa and Human-in-the-Loop (HITL) Decision Making

Explore