User Task Analysis Techniques


  • Matt Wood
    CTIO, PwC
    75,439 followers

    New! We’ve published a new set of automated evaluations and benchmarks for RAG - a critical component of Gen AI used by most successful customers today. Sweet.

    Retrieval-Augmented Generation lets you take general-purpose foundation models - like those from Anthropic, Meta, and Mistral - and “ground” their responses in specific target areas or domains using information which the models haven’t seen before (maybe confidential, private info, new or real-time data, etc). This lets gen AI apps generate responses which are targeted to that domain with better accuracy, context, reasoning, and depth of knowledge than the model provides off the shelf.

    In this new paper, we describe a way to evaluate task-specific RAG approaches such that they can be benchmarked and compared against real-world uses, automatically. It’s an entirely novel approach, and one we think will help customers tune and improve their AI apps much more quickly and efficiently. Driving up accuracy, while driving down the time it takes to build a reliable, coherent system.

    🔎 The evaluation is tailored to a particular knowledge domain or subject area. For example, the paper describes tasks related to DevOps troubleshooting, scientific research (ArXiv abstracts), technical Q&A (StackExchange), and financial reporting (SEC filings).
    📝 Each task is defined by a specific corpus of documents relevant to that domain. The evaluation questions are generated from and grounded in this corpus.
    📊 The evaluation assesses the RAG system's ability to perform specific functions within that domain, such as answering questions, solving problems, or providing relevant information based on the given corpus.
    🌎 The tasks are designed to mirror real-world scenarios and questions that might be encountered when using a RAG system in practical applications within that domain.
    🔬 Unlike general language model benchmarks, these task-specific evaluations focus on the RAG system's performance in retrieving and applying information from the given corpus to answer domain-specific questions.
    ✍️ The approach allows for creating evaluations for any task that can be defined by a corpus of relevant documents, making it adaptable to a wide range of specific use cases and industries.

    Really interesting work from the Amazon science team, and a new totem of evaluation for customers choosing and tuning their RAG systems. Very cool. Paper linked below.
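
    As a rough illustration of the idea of corpus-grounded, automated RAG evaluation, here is a minimal sketch in Python; the prompts, function names, and judge logic are assumptions for illustration, not the paper's or Amazon's actual code.

    ```python
    # Hypothetical sketch: generate exam questions from a domain corpus, then score a RAG system.
    # All function names and prompts are illustrative assumptions, not the paper's actual code.
    import json

    def generate_questions(llm, corpus_chunks, n_per_chunk=2):
        """Ask an LLM to write grounded question/answer pairs from each corpus chunk."""
        exam = []
        for chunk in corpus_chunks:
            prompt = (
                "Write {n} question/answer pairs that can be answered ONLY from this text.\n"
                'Return a JSON list of objects with "question" and "answer" keys.\n\nTEXT:\n{text}'
            ).format(n=n_per_chunk, text=chunk)
            exam.extend(json.loads(llm(prompt)))
        return exam

    def evaluate_rag(rag_system, judge_llm, exam):
        """Score a RAG system's answers against the corpus-grounded reference answers."""
        correct = 0
        for item in exam:
            candidate = rag_system(item["question"])
            verdict = judge_llm(
                f"Reference answer: {item['answer']}\nCandidate answer: {candidate}\n"
                "Does the candidate convey the same facts? Answer yes or no."
            )
            correct += verdict.strip().lower().startswith("yes")
        return correct / len(exam)
    ```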

  • Vitaly Friedman
    216,991 followers

    ✅ How To Run Task Analysis In UX (https://lnkd.in/e_s_TG3a), a practical step-by-step guide on how to study user goals, map users’ workflows, understand top tasks and then use them to inform and shape design decisions. Neatly put together by Thomas Stokes.

    🚫 Good UX isn’t just high completion rates for top tasks.
    🤔 Better: high accuracy, low time on task, high completion rates.
    ✅ Task analysis breaks down user tasks to understand user goals.
    ✅ Tasks are goal-oriented user actions (start → end point → success).
    ✅ Usually presented as a tree (hierarchical task-analysis diagram, HTA).
    ✅ First, collect data: users, what they try to do and how they do it.
    ✅ Refine your task list with stakeholders, then get users to vote.
    ✅ Translate each top task into goals, starting point and end point.
    ✅ Break down: user’s goal → sub-goals; sub-goal → single steps.
    ✅ For non-linear/circular steps: mark alternate paths as branches.
    ✅ Scrutinize every single step for errors, efficiency, opportunities.
    ✅ Attach design improvements as sticky notes to each step.
    🚫 Don’t lose track in small tasks: come back to the big picture.

    Personally, I've been relying on top task analysis for years now, kindly introduced by Gerry McGovern. Of all the techniques to capture the essence of user experience, it’s a reliable way to do so. Bring it together with task completion rates and task completion times, and you have a reliable metric to track your UX performance over time.

    Once you identify 10–12 representative tasks and get them approved by stakeholders, you can track how well the product is performing over time. Refine the task wording and recruit the right participants. Then give these tasks to 15–18 actual users and track success rates, time on task and accuracy of input. That gives you an objective measure of success for your design efforts. And you can repeat it every 4–8 months, depending on the velocity of the team. It’s remarkably easy to establish and run, but also has high visibility and impact — especially if it tracks the heart of what the product is about.

    Useful resources:
    Task Analysis: Support Users in Achieving Their Goals (attached image), by Maria Rosala https://lnkd.in/ePmARap3
    What Really Matters: Focusing on Top Tasks, by Gerry McGovern https://lnkd.in/eWBXpCQp
    How To Make Sense Of Any Mess (free book), by Abby Covert https://lnkd.in/enxMMhMe
    How We Did It: Task Analysis (Case Study), by Jacob Filipp https://lnkd.in/edKYU6xE
    How To Optimize UX and Improve Task Efficiency, by Ella Webber https://lnkd.in/eKdKNtsR
    How to Conduct a Top Task Analysis, by Jeff Sauro https://lnkd.in/eqWp_RNG

    [continues in the comments below ↓]
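
    For teams that want to track those numbers over time, here is a minimal sketch of turning raw top-task study results into completion rate, time on task, and input accuracy; the record fields and sample data are illustrative assumptions, not part of the guide.

    ```python
    # Hypothetical sketch: aggregate top-task study results into trackable UX metrics.
    # Field names and the sample records are illustrative assumptions.
    from statistics import mean

    # One record per participant per task from a usability session.
    results = [
        {"task": "Find invoice",  "completed": True,  "seconds": 74,  "input_errors": 0},
        {"task": "Find invoice",  "completed": False, "seconds": 190, "input_errors": 2},
        {"task": "Update email",  "completed": True,  "seconds": 41,  "input_errors": 0},
    ]

    def task_metrics(records):
        """Group records by task and compute completion rate, time on task, and input accuracy."""
        by_task = {}
        for r in records:
            by_task.setdefault(r["task"], []).append(r)
        report = {}
        for task, rs in by_task.items():
            report[task] = {
                "completion_rate": mean(r["completed"] for r in rs),
                "avg_time_on_task_s": mean(r["seconds"] for r in rs),
                "avg_input_errors": mean(r["input_errors"] for r in rs),
            }
        return report

    print(task_metrics(results))
    ```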

  • Sarah Roberts, M.Ed.
    Instructional Designer and Developer | Learning and Development Specialist | Lifelong Learner
    4,324 followers

    One of the first Instructional Design projects I worked on still sticks with me. The client handed me eight separate slide decks and said: “Can you turn all of this into one training?” 😑

    I remember opening the files: hundreds of slides, all important in their own way, but something didn’t sit right. So, I asked a question that’s become my go-to ever since: “What’s the actual moment in the job where someone gets stuck, messes up, or hesitates, and needs this?”

    That question changed everything. Instead of cramming it all into a mega-course, we:
    ✅ Cut 70% of the content
    ✅ Turned the rest into two scenario-based simulations
    ✅ Built a one-page job aid that’s still in use today

    And the best part? People applied it, and didn't complain about taking it. The motivation to complete the course increased. The behavior changed. The feedback improved.

    I learned early on that we’re not here to cover content. We’re here to solve real work problems. So now, every time I hear, “Can you just turn this into a course?” I slow it down and ask: “What do people need to do, and where do they get stuck?”

    If you’ve been in that spot, I’d love to hear how you handled it 👇. It can be a tough discussion, and pushing back respectfully can be daunting sometimes.

    #InstructionalDesign #LearningAndDevelopment #LXD #CorporateTraining #RealWorldLearning #JobRelevance #EarlyCareerLessons

  • Andrew Ng
    Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of LandingAI
    2,314,361 followers

    Parallel agents are emerging as an important new direction for scaling up AI. AI capabilities have scaled with more training data, training-time compute, and test-time compute. Having multiple agents run in parallel is growing as a technique to further scale and improve performance.

    We know from work at Baidu by my former team, and later OpenAI, that AI models’ performance scales predictably with the amount of data and training computation. Performance rises further with test-time compute such as in agentic workflows and in reasoning models that think, reflect, and iterate on an answer. But these methods take longer to produce output. Agents working in parallel offer another path to improve results, without making users wait.

    Reasoning models generate tokens sequentially and can take a long time to run. Similarly, most agentic workflows are initially implemented in a sequential way. But as LLM prices per token continue to fall — thus making these techniques practical — and product teams want to deliver results to users faster, more and more agentic workflows are being parallelized. Some examples:

    - Many research agents now fetch multiple web pages and examine their texts in parallel to try to synthesize deeply thoughtful research reports more quickly.
    - Some agentic coding frameworks allow users to orchestrate many agents working simultaneously on different parts of a code base. Our short course on Claude Code shows how to do this using git worktrees.
    - A rapidly growing design pattern for agentic workflows is to have a compute-heavy agent work for minutes or longer to accomplish a task, while another agent monitors the first and gives brief updates to the user to keep them informed. From here, it’s a short hop to parallel agents that work in the background while the UI agent keeps users informed and perhaps also routes asynchronous user feedback to the other agents.

    It is difficult for a human manager to take a complex task (like building a complex software application) and break it down into smaller tasks for human engineers to work on in parallel; scaling to huge numbers of engineers is especially challenging. Similarly, it is also challenging to decompose tasks for parallel agents to carry out. But the falling cost of LLM inference makes it worthwhile to use a lot more tokens, and using them in parallel allows this to be done without significantly increasing the user’s waiting time.

    I am also encouraged by the growing body of research on parallel agents. For example, I enjoyed reading “CodeMonkeys: Scaling Test-Time Compute for Software Engineering” by Ryan Ehrlich and others, which shows how parallel code generation helps you to explore the solution space. The mixture-of-agents architecture by Junlin Wang is a surprisingly simple way to organize parallel agents.

    [Truncated for length. Full text, with links: https://lnkd.in/gQ98HMci ]
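
    A minimal sketch of the first example above (research agents reading pages in parallel) using Python's asyncio; fetch_page and llm below are stand-in stubs, not any specific framework's API.

    ```python
    # Hypothetical sketch: run several page-reading "agents" concurrently, then synthesize.
    # fetch_page() and llm() are assumed stand-ins, not any specific framework's API.
    import asyncio

    async def fetch_page(url: str) -> str:
        # Stand-in for a real async HTTP fetch (e.g. via aiohttp); returns placeholder text here.
        await asyncio.sleep(0.1)
        return f"contents of {url}"

    async def llm(prompt: str) -> str:
        # Stand-in for a real model call; echoes the prompt head so the sketch runs end to end.
        await asyncio.sleep(0.1)
        return f"[model output for: {prompt[:40]}...]"

    async def read_and_summarize(url: str) -> str:
        page_text = await fetch_page(url)
        return await llm(f"Summarize the key findings in:\n{page_text}")

    async def research(urls: list[str]) -> str:
        # Each URL gets its own coroutine; asyncio.gather runs them in parallel.
        summaries = await asyncio.gather(*(read_and_summarize(u) for u in urls))
        return await llm("Write a research report from these notes:\n" + "\n\n".join(summaries))

    print(asyncio.run(research(["https://example.com/a", "https://example.com/b"])))
    ```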

  • Kuldeep Singh Sidhu
    Senior Data Scientist @ Walmart | BITS Pilani
    13,160 followers

    When Should You Actually Use GraphRAG? New Research Reveals the Truth

    A groundbreaking study from Xiamen University and The Hong Kong Polytechnic University just dropped some serious insights about Graph Retrieval-Augmented Generation (GraphRAG) vs traditional RAG systems.

    The Reality Check: While GraphRAG has been hyped as the next evolution in AI, this research found it frequently underperforms vanilla RAG on many real-world tasks. The question isn't whether GraphRAG is good - it's *when* it's actually worth the complexity.

    Under the Hood: How GraphRAG Actually Works
    Unlike traditional RAG, which relies on semantic similarity for chunk retrieval, GraphRAG structures knowledge as interconnected graphs where nodes represent entities/concepts and edges define logical relationships. During query processing, it doesn't just retrieve similar content - it traverses the graph to uncover multi-hop reasoning chains, thematic evolution patterns, and indirect dependencies that traditional RAG misses.

    The Technical Deep Dive: The researchers built GraphRAG-Bench, a comprehensive evaluation framework that goes beyond surface-level metrics. It evaluates the entire pipeline through three critical stages:
    - Graph Construction Quality: Measuring node count, edge density, average degree, and clustering coefficients to assess how well the system organizes domain knowledge
    - Retrieval Performance: Tracking both context relevance and evidence recall to understand retrieval completeness vs precision trade-offs
    - Generation Accuracy: Evaluating factual consistency, faithfulness to retrieved context, and evidence coverage

    The Surprising Findings:
    - Basic RAG dominates simple fact retrieval - GraphRAG's graph processing can actually introduce noise for straightforward queries
    - GraphRAG shines in complex reasoning - multi-hop queries, contextual summarization, and creative generation tasks show clear GraphRAG advantages
    - Token overhead is real - some GraphRAG implementations inflate prompts to 40,000+ tokens, creating efficiency challenges

    The Bottom Line: GraphRAG isn't a silver bullet. It excels when you need to synthesize interconnected concepts across hierarchical knowledge structures, but traditional RAG remains superior for simple, direct information retrieval. The researchers tested this across two distinct domains - structured medical guidelines and unstructured literary texts - proving these patterns hold across different knowledge densities.

    Key Takeaway for Practitioners: Choose your RAG architecture based on task complexity, not hype. Simple queries? Stick with traditional RAG. Complex domain reasoning requiring multi-hop synthesis? That's where GraphRAG earns its computational cost.

    This research provides the evidence-based guidance our industry desperately needed for making informed RAG implementation decisions.
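
    To make the architectural difference concrete, here is a toy sketch of the two retrieval styles; the data structures are illustrative assumptions and are not GraphRAG-Bench or any real library's API.

    ```python
    # Hypothetical sketch contrasting vanilla RAG retrieval with graph-traversal retrieval.
    # Toy data structures for illustration only; not GraphRAG-Bench or any real library's API.

    def vanilla_retrieve(query_vec, chunk_index, top_k=3):
        """Vanilla RAG: rank chunks by similarity of their embeddings to the query embedding."""
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        return sorted(chunk_index, key=lambda c: dot(query_vec, c["embedding"]), reverse=True)[:top_k]

    def graph_retrieve(seed_entities, graph, max_hops=2):
        """GraphRAG-style: expand outward from seed entities along edges (multi-hop traversal)."""
        frontier, visited = set(seed_entities), set(seed_entities)
        for _ in range(max_hops):
            frontier = {nbr for node in frontier for nbr in graph.get(node, [])} - visited
            visited |= frontier
        return visited  # entities whose attached text would be pulled into the prompt

    # Toy knowledge graph: "aspirin" -> "COX enzymes" -> "prostaglandins"
    graph = {"aspirin": ["COX enzymes"], "COX enzymes": ["prostaglandins"]}
    print(graph_retrieve({"aspirin"}, graph))  # reaches 'prostaglandins' via a 2-hop chain
    ```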

  • Jatinder Verma
    18,338 followers

    Interview Conversation
    Role: RTE
    Topic: Leveraging Jira Align

    👨💼 Interviewer: "As an RTE, how do you use Jira Align to manage dependencies across teams in an Agile Release Train?"
    🧑 Candidate: "Jira Align helps track tasks and dependencies between teams."
    👨💼 Interviewer: "Imagine Team A is blocked because Team B’s feature isn’t ready, and this delay could impact the PI objectives. How would you use Jira Align to resolve and track such dependencies?"
    🧑 Candidate: "I’d ask the teams to resolve it in their sync-up meetings."

    What a skilled RTE should have answered:
    ----------------------------------------------
    💡 Jira Align is a powerful tool for visualizing and proactively managing dependencies across teams and ARTs. Here’s how I’d approach the situation:

    ✍ 1. Proactive Identification: During PI Planning, I’d ensure teams clearly log dependencies in Jira Align’s Dependency Map. This allows us to identify blockers early and assess their impact on delivery timelines.
    ✍ 2. Continuous Tracking: I’d regularly review the Program Board in Jira Align to monitor the progress of dependencies. For example, if Team A relies on Team B’s feature, Jira Align enables both teams to align their schedules and track progress through automated updates.
    ✍ 3. Issue Resolution: In case of a delay, I’d leverage Jira Align to trigger an escalation. The tool’s centralized data makes it easy to identify priority dependencies, communicate risks to stakeholders, and propose adjustments to mitigate the impact on PI objectives.

    ✍ Example in Action: In a previous ART, a critical dependency delay between two teams risked derailing a feature release. By using Jira Align’s Portfolio Room, we aligned stakeholders, reprioritized deliverables, and reallocated capacity to keep the train on track.
    ✍ Impact: Jira Align ensures transparency, alignment, and faster conflict resolution, ultimately enabling ARTs to deliver value predictably.

    ✨ Key Takeaway: Managing dependencies is about more than meetings—it's about leveraging tools like Jira Align to proactively track, manage, and resolve risks. Transparency is the backbone of seamless execution.

    Join the community for deeper insights: Link in the comment below

    #SAFe #ReleaseTrainEngineer #JiraAlign #AgileTransformation #DependencyManagement
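
    For readers who like to see the idea behind a dependency map in code, here is a tiny hypothetical sketch of flagging cross-team dependencies for escalation; plain Python for illustration only, and explicitly not Jira Align's data model or API.

    ```python
    # Hypothetical sketch: model cross-team dependencies and flag ones that threaten a PI objective.
    # Plain Python for illustration only; this is NOT Jira Align's data model or API.
    from datetime import date

    dependencies = [
        {"needed_by": "Team A", "provided_by": "Team B", "item": "Auth API",
         "due": date(2025, 3, 14), "status": "at risk", "pi_objective": "Self-service onboarding"},
        {"needed_by": "Team C", "provided_by": "Team A", "item": "Billing events",
         "due": date(2025, 4, 2), "status": "on track", "pi_objective": "Usage-based pricing"},
    ]

    def escalation_candidates(deps, today=date(2025, 3, 10), horizon_days=7):
        """Return dependencies that are at risk or due within the escalation horizon."""
        return [d for d in deps
                if d["status"] != "on track" or (d["due"] - today).days <= horizon_days]

    for dep in escalation_candidates(dependencies):
        print(f"Escalate: {dep['provided_by']} -> {dep['needed_by']} ({dep['item']}), "
              f"PI objective at risk: {dep['pi_objective']}")
    ```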

  • Karen Kim
    CEO @ Human Managed, the I.DE.A. platform.
    5,613 followers

    User Feedback Loops: the missing piece in AI success?

    AI is only as good as the data it learns from -- but what happens after deployment? Many businesses focus on building AI products but miss a critical step: ensuring their outputs continue to improve with real-world use. Without a structured feedback loop, AI risks stagnating, delivering outdated insights, or losing relevance quickly.

    Instead of treating AI as a one-and-done solution, companies need workflows that continuously refine and adapt based on actual usage. That means capturing how users interact with AI outputs, where it succeeds, and where it fails.

    At Human Managed, we’ve embedded real-time feedback loops into our products, allowing customers to rate and review AI-generated intelligence. Users can flag insights as:
    🔘 Irrelevant
    🔘 Inaccurate
    🔘 Not Useful
    🔘 Others

    Every input is fed back into our system to fine-tune recommendations, improve accuracy, and enhance relevance over time. This is more than a quality check -- it’s a competitive advantage.

    - For CEOs & Product Leaders: AI-powered services that evolve with user behavior create stickier, high-retention experiences.
    - For Data Leaders: Dynamic feedback loops ensure AI systems stay aligned with shifting business realities.
    - For Cybersecurity & Compliance Teams: User validation enhances AI-driven threat detection, reducing false positives and improving response accuracy.

    An AI model that never learns from its users is already outdated. The best AI isn’t just trained -- it continuously evolves.
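
    A minimal sketch of what such a feedback-capture step could look like; the flag labels mirror the post, while everything else (names, fields, storage) is an illustrative assumption rather than Human Managed's implementation.

    ```python
    # Hypothetical sketch: capture structured user feedback on AI outputs for later fine-tuning.
    # Illustrative only; not Human Managed's implementation.
    from dataclasses import dataclass, asdict
    from datetime import datetime, timezone
    import json

    FLAGS = {"irrelevant", "inaccurate", "not_useful", "other"}

    @dataclass
    class Feedback:
        output_id: str      # which AI-generated insight the user is rating
        flag: str           # one of FLAGS
        comment: str = ""   # optional free-text detail
        ts: str = ""

    def record_feedback(store: list, output_id: str, flag: str, comment: str = "") -> Feedback:
        if flag not in FLAGS:
            raise ValueError(f"unknown flag: {flag}")
        fb = Feedback(output_id, flag, comment, datetime.now(timezone.utc).isoformat())
        store.append(fb)   # in practice: a queue, database, or training-data pipeline
        return fb

    store: list = []
    record_feedback(store, "insight-42", "inaccurate", "wrong asset owner listed")
    print(json.dumps([asdict(f) for f in store], indent=2))
    ```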

  • Aishwarya Srinivasan
    597,465 followers

    If you’re building anything with LLMs, your system architecture matters more than your prompts.

    Most people stop at “call the model, get the output.” But LLM-native systems need workflows, blueprints that define how multiple LLM calls interact, how routing, evaluation, memory, tools, or chaining come into play.

    Here’s a breakdown of 6 core LLM workflows I see in production:

    🧠 LLM Augmentation
    Classic RAG + tools setup. The model augments its own capabilities using:
    → Retrieval (e.g., from vector DBs)
    → Tool use (e.g., calculators, APIs)
    → Memory (short-term or long-term context)

    🔗 Prompt Chaining Workflow
    Sequential reasoning across steps. Each output is validated (pass/fail) → passed to the next model. Great for multi-stage tasks like reasoning, summarizing, translating, and evaluating.

    🛣 LLM Routing Workflow
    Input routed to different models (or prompts) based on the type of task. Example: classification → Q&A → summarization all handled by different call paths.

    📊 LLM Parallelization Workflow (Aggregator)
    Run multiple models/tasks in parallel → aggregate the outputs. Useful for ensembling or sourcing multiple perspectives.

    🎼 LLM Parallelization Workflow (Synthesizer)
    A more orchestrated version with a control layer. Think: multi-agent systems with a conductor + synthesizer to harmonize responses.

    🧪 Evaluator–Optimizer Workflow
    The most underrated architecture. One LLM generates. Another evaluates (pass/fail + feedback). This loop continues until quality thresholds are met.

    If you’re an AI engineer, don’t just build for single-shot inference. Design workflows that scale, self-correct, and adapt.

    📌 Save this visual for your next project architecture review.

    〰️〰️〰️
    Follow me (Aishwarya Srinivasan) for more AI insight and subscribe to my Substack to find more in-depth blogs and weekly updates in AI: https://lnkd.in/dpBNr6Jg
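
    As a small illustration of one of these patterns, here is a minimal sketch of the routing workflow; the route labels and the llm callable are assumptions for illustration, not any particular framework's API.

    ```python
    # Hypothetical sketch of an LLM routing workflow: classify the request,
    # then send it down a dedicated prompt/model path. Route names are illustrative.

    ROUTES = {
        "qa":        "Answer the user's question concisely:\n{text}",
        "summarize": "Summarize the following text in 3 bullet points:\n{text}",
        "classify":  "Label the sentiment of this text as positive, negative, or neutral:\n{text}",
    }

    def route_request(llm, text: str) -> str:
        # Step 1: a cheap classification call decides which path handles the input.
        label = llm(
            "Reply with exactly one word - qa, summarize, or classify - naming the task this input needs:\n" + text
        ).strip().lower()
        if label not in ROUTES:
            label = "qa"  # fall back to a default path
        # Step 2: the chosen path runs its own prompt (it could also use a different model).
        return llm(ROUTES[label].format(text=text))
    ```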

  • Tern Poh Lim
    Agentic AI Strategist & Innovator | NUS-Peking MBAs Valedictorian | NUS Master of Computing (AI) | Government Scholar
    4,961 followers

    🚀 𝐖𝐡𝐲 𝐬𝐞𝐭𝐭𝐥𝐞 𝐟𝐨𝐫 𝐚 𝐬𝐢𝐧𝐠𝐥𝐞 𝐩𝐫𝐨𝐦𝐩𝐭 𝐰𝐡𝐞𝐧 𝐲𝐨𝐮 𝐜𝐚𝐧 𝐜𝐨𝐧𝐭𝐢𝐧𝐮𝐨𝐮𝐬𝐥𝐲 𝐢𝐦𝐩𝐫𝐨𝐯𝐞 𝐭𝐡𝐫𝐨𝐮𝐠𝐡 𝐢𝐭𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐫𝐞𝐟𝐢𝐧𝐞𝐦𝐞𝐧𝐭? 🚀

    Traditional prompting involves providing a single prompt and receiving a direct response. This one-step interaction limits the ability to refine and improve the output of GenAI. Imagine trying to write a good essay without being able to revise any part once it's written. GenAI generates text from left to right, and once a word is generated, it cannot go back and change it.

    𝐄𝐧𝐭𝐞𝐫 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐖𝐨𝐫𝐤𝐟𝐥𝐨𝐰𝐬: 𝐚 𝐦𝐨𝐫𝐞 𝐢𝐭𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐚𝐧𝐝 𝐦𝐮𝐥𝐭𝐢-𝐬𝐭𝐞𝐩 𝐚𝐩𝐩𝐫𝐨𝐚𝐜𝐡 𝐰𝐡𝐞𝐫𝐞 𝐀𝐈 𝐚𝐠𝐞𝐧𝐭𝐬 𝐩𝐞𝐫𝐟𝐨𝐫𝐦 𝐭𝐚𝐬𝐤𝐬, 𝐫𝐞𝐯𝐢𝐞𝐰 𝐭𝐡𝐞𝐢𝐫 𝐨𝐮𝐭𝐩𝐮𝐭𝐬, 𝐚𝐧𝐝 𝐫𝐞𝐟𝐢𝐧𝐞 𝐭𝐡𝐞𝐦 𝐢𝐭𝐞𝐫𝐚𝐭𝐢𝐯𝐞𝐥𝐲. 𝐓𝐡𝐢𝐬 𝐩𝐫𝐨𝐜𝐞𝐬𝐬 𝐢𝐬 𝐚𝐤𝐢𝐧 𝐭𝐨 𝐝𝐫𝐚𝐟𝐭𝐢𝐧𝐠 𝐚𝐧𝐝 𝐫𝐞𝐯𝐢𝐬𝐢𝐧𝐠 𝐚𝐧 𝐞𝐬𝐬𝐚𝐲, 𝐚𝐥𝐥𝐨𝐰𝐢𝐧𝐠 𝐟𝐨𝐫 𝐜𝐨𝐧𝐭𝐢𝐧𝐮𝐨𝐮𝐬 𝐢𝐦𝐩𝐫𝐨𝐯𝐞𝐦𝐞𝐧𝐭 𝐚𝐧𝐝 𝐡𝐢𝐠𝐡𝐞𝐫-𝐪𝐮𝐚𝐥𝐢𝐭𝐲 𝐫𝐞𝐬𝐮𝐥𝐭𝐬.

    Here’s a breakdown of three key AI agent frameworks:
    • 𝐑𝐞𝐚𝐬𝐨𝐧 + 𝐀𝐜𝐭 in 2022: Integrates reasoning with actions and observations, enabling AI to adapt dynamically based on external information. This iterative process enhances problem-solving abilities through continuous feedback loops.
    • 𝐒𝐞𝐥𝐟-𝐑𝐞𝐟𝐢𝐧𝐞 in 2023: Introduces an iterative feedback and refinement process where a GenAI generates an initial output, provides feedback, and refines iteratively. This versatile and efficient method improves performance across domains without additional training or supervised data.
    • 𝐀𝐥𝐩𝐡𝐚𝐂𝐨𝐝𝐢𝐮𝐦'𝐬 𝐅𝐥𝐨𝐰 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠 in 2024: Focuses on improving code generation by repeatedly testing and refining the code to ensure it meets requirements.

    The principles of Flow Engineering could also apply to Question Answering tasks:
    • 𝐈𝐧𝐢𝐭𝐢𝐚𝐥 𝐀𝐧𝐬𝐰𝐞𝐫 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧: Generate an initial answer using an LLM.
    • 𝐒𝐞𝐥𝐟-𝐑𝐞𝐟𝐥𝐞𝐜𝐭𝐢𝐨𝐧: Reflect on the generated answer to identify areas for improvement. This could involve checking for accuracy, relevance, and completeness.
    • 𝐓𝐞𝐬𝐭𝐢𝐧𝐠 𝐚𝐧𝐝 𝐅𝐞𝐞𝐝𝐛𝐚𝐜𝐤: Compare the generated answer against a set of reference answers or key points extracted from the context.
    • 𝐑𝐞𝐟𝐢𝐧𝐞𝐦𝐞𝐧𝐭: Based on the feedback, iteratively refine the answer. This could involve rephrasing sentences, adding missing information, or correcting inaccuracies.

    The progression highlights a trend toward more refined, efficient, and specialized AI workflows. Each stage builds on the previous one, incorporating iterative refinement and feedback mechanisms to enhance the quality and accuracy of outputs.

    𝐖𝐡𝐚𝐭 𝐆𝐞𝐧𝐀𝐈 𝐚𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬 𝐚𝐫𝐞 𝐲𝐨𝐮 𝐜𝐮𝐫𝐫𝐞𝐧𝐭𝐥𝐲 𝐮𝐬𝐢𝐧𝐠? 𝐇𝐚𝐯𝐞 𝐲𝐨𝐮 𝐬𝐞𝐞𝐧 𝐭𝐡𝐞 𝐢𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐨𝐟 𝐚𝐠𝐞𝐧𝐭𝐢𝐜 𝐰𝐨𝐫𝐤𝐟𝐥𝐨𝐰𝐬 𝐢𝐧 𝐭𝐡𝐞𝐬𝐞 𝐭𝐨𝐨𝐥𝐬?

    #AgenticWorkflow #AIAgent #GenerativeAI
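
    A minimal sketch of that question-answering flow following the four steps listed above; the llm callable and the key-point coverage check are illustrative stand-ins, not AlphaCodium's code.

    ```python
    # Hypothetical sketch of a Flow-Engineering-style QA loop:
    # generate, self-check against key points from the context, then refine.
    # The llm callable and the coverage check are illustrative stand-ins, not AlphaCodium's code.

    def answer_with_refinement(llm, question: str, context: str, max_rounds: int = 3) -> str:
        # Key points extracted from the context serve as the "reference" for testing.
        key_points = llm(f"List the key facts from this context needed to answer '{question}':\n{context}")
        # Initial answer generation.
        answer = llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
        for _ in range(max_rounds):
            # Testing and feedback: compare the answer against the extracted key points.
            feedback = llm(
                "Check whether the answer covers every key point and contains no inaccuracies. "
                f"Reply 'OK' or list what is missing or wrong.\nKey points:\n{key_points}\nAnswer:\n{answer}"
            )
            if feedback.strip().upper().startswith("OK"):
                break
            # Refinement: revise the answer based on the feedback.
            answer = llm(f"Revise the answer to fix these issues:\n{feedback}\n\n"
                         f"Question: {question}\nContext:\n{context}\nPrevious answer:\n{answer}")
        return answer
    ```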

  • Alex Wang
    Learn AI Together - I share my learning journey into AI & Data Science here, 90% buzzword-free. Follow me and let's grow together!
    1,109,184 followers

    Today’s surprise came from a Swedish AI startup - a “live-code” approach in the enterprise automation space.

    Most current agent frameworks (LangChain, AutoGPT, etc.) run step by step:
    → interpret prompt → call tool → wait → return → next step.
    Reliable, but slow. Similar to a human clicking through a task list. Hard to scale well, especially in structured enterprise workflows.

    Incredible takes a different route, something they describe as a “𝐥𝐢𝐯𝐞-𝐜𝐨𝐝𝐞” execution model. It separates the system into two layers:
    1- The LLM handles planning, turning your intent into structured logic
    2- Then that logic is packaged and executed in parallel across your connected apps — not step by step, but all at once

    Think of it like turning a prompt into a full task graph or script, then running it instantly in a controlled, structured runtime. In other words: 𝐋𝐋𝐌 = 𝐩𝐥𝐚𝐧𝐧𝐞𝐫, 𝐫𝐮𝐧𝐭𝐢𝐦𝐞 = 𝐞𝐱𝐞𝐜𝐮𝐭𝐨𝐫

    The advantages are clear when it comes to scale, speed, and reliability:
    ◾ Capable of running large numbers of operations across tools in parallel — not one API call at a time
    ◾ Clean separation between intent and execution
    ◾ Designed for structured, context-heavy workflows (in their own examples: processing 2,000+ CRM records or handling multi-gigabyte datasets)

    📍 Explore / try it for free: 🔗 https://lnkd.in/gaUDDDvz

    You’ll see similar execution patterns in dev tools like Replit or Copilot, and in some internal infra. But this kind of parallel, execution-first architecture hasn’t been applied much to business automation at scale. Worth checking out!

    #generativeai #automation #productivity #swedishtech
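
    To visualize the planner/executor split, here is a minimal hypothetical sketch of the pattern in plain Python and asyncio; the plan format and tool names are assumptions, and this is not Incredible's implementation.

    ```python
    # Hypothetical sketch of the planner/executor split: an LLM-produced plan becomes a
    # task list that a runtime executes in parallel. Not Incredible's implementation.
    import asyncio, json

    # Pretend the planner LLM returned this task list of independent operations.
    plan = json.loads("""
    [
      {"tool": "crm_update", "args": {"record": "acme",   "field": "status", "value": "contacted"}},
      {"tool": "crm_update", "args": {"record": "globex", "field": "status", "value": "contacted"}},
      {"tool": "send_email", "args": {"to": "ops@example.com", "template": "summary"}}
    ]
    """)

    async def run_tool(tool: str, args: dict) -> str:
        # Stand-in for real connectors; each call could hit a different app's API.
        await asyncio.sleep(0.1)
        return f"{tool}({args}) done"

    async def execute(plan: list) -> list[str]:
        # The runtime fans the whole plan out at once instead of stepping through it.
        return await asyncio.gather(*(run_tool(t["tool"], t["args"]) for t in plan))

    print(asyncio.run(execute(plan)))
    ```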
