Organizing Digital Files Efficiently

Explore top LinkedIn content from expert professionals.

  • View profile for Vitaly Friedman
    Vitaly Friedman is an Influencer
    216,991 followers

    💎 Accessibility For Designers Checklist (PDF: https://lnkd.in/e9Z2G2kF), a practical set of cards on WCAG accessibility guidelines, covering accessible color, typography, animations, media, layout, and development, to kick off accessibility conversations early on. Kindly put together by Geri Reid.

    WCAG for Designers Checklist, by Geri Reid
    Article: https://lnkd.in/ef8-Yy9E
    PDF: https://lnkd.in/e9Z2G2kF
    WCAG 2.2 Guidelines: https://lnkd.in/eYmzrNh7

    Accessibility isn't about compliance. It's not about ticking off checkboxes. And it's not about plugging in accessibility overlays or AI engines either. It's about *designing* with a wide range of people in mind, from the very start, independent of their skills and preferences.

    In my experience, the most impactful way to embed accessibility in your work is to bring a handful of people with different needs into the design process and usability testing early on. Make these test sessions accessible to the entire team, and show the real impact of design and code on real people using a real product.

    Teams usually don't get time to work on features that don't have a clear business case. But no manager wants to be seen publicly ignoring prospective customers. Visualize accessibility for everyone on the team, and make an argument about potential reach and potential income.

    Don't ask for big commitments: embed accessibility in your work by default. Account for accessibility needs in your estimates. Create accessibility tickets and flag accessibility issues. Don't mistake smiling and nodding for support; establish timelines, roles, specifics, and objectives.

    And most importantly: measure the impact of your work by repeatedly conducting accessibility testing with real people. Build a strong before/after case to show the change the team has enabled and contributed to, and celebrate small and big accessibility wins. It might not sound like much, but it can start changing the culture faster than you think.

    Useful resources:
    Giving A Damn About Accessibility, by Sheri Byrne-Haber (disabled): https://lnkd.in/eCeFutuJ
    Accessibility For Designers: Where Do I Start?, by Stéphanie Walter: https://lnkd.in/ecG5qASY
    Web Accessibility In Plain Language (Free Book), by Charlie Triplett: https://lnkd.in/e2AMAwyt
    Building Accessibility Research Practices, by Maya Alvarado: https://lnkd.in/eq_3zSPJ
    How To Build A Strong Case For Accessibility:
    ↳ https://lnkd.in/ehGivAdY, by 🦞 Todd Libby
    ↳ https://lnkd.in/eC4jehMX, by Yichan Wang

    #ux #accessibility
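    As a small, concrete companion to the "accessible color" item above, here is a sketch of the WCAG 2.x contrast-ratio check in Python (the hex colors are placeholders; 4.5:1 is the WCAG AA threshold for normal-size text):

    ```python
    def relative_luminance(hex_color: str) -> float:
        """Relative luminance per WCAG 2.x, from an sRGB hex color like '#336699'."""
        r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) / 255 for i in (0, 2, 4))

        def linearize(c: float) -> float:
            # sRGB channels are linearized before weighting
            return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

        return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)

    def contrast_ratio(fg: str, bg: str) -> float:
        """Contrast ratio (1:1 to 21:1) between foreground and background colors."""
        l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
        return (l1 + 0.05) / (l2 + 0.05)

    ratio = contrast_ratio("#767676", "#ffffff")  # placeholder colors
    print(f"{ratio:.2f}:1 -> {'passes' if ratio >= 4.5 else 'fails'} WCAG AA for normal text")
    ```

    A check like this is the kind of thing that can run in design tooling or CI, so color decisions get flagged long before an audit.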

  • View profile for Brij kishore Pandey
    Brij kishore Pandey is an Influencer

    AI Architect | Strategist | Generative AI | Agentic AI

    691,592 followers

    Version control with Git has become an essential skill for developers. But understanding Git fundamentals can seem daunting at first. In this post, I'll provide a quick overview of some core Git concepts and commands.

    Key concepts:
    - Repository: where your project files and commit history are stored
    - Commit: a snapshot of changes, like a version checkpoint
    - Branch: a timeline of commits that lets you work on parallel versions
    - Merge: combines changes from separate branches
    - Pull request: propose and review changes before merging branches

    Key commands:
    - git init: initialize a new repo
    - git status: view changed files not staged for commit
    - git add: stage files for commit
    - git commit: commit the staged snapshot
    - git branch: list, create, or delete branches
    - git checkout: switch between branches
    - git merge: join two development histories (branches)
    - git push / git pull: send commits to or receive commits from a remote repo

    With these basic concepts and commands, you can use Git to track changes, work in branches, and collaborate with others. There is, of course, much more to discover about Git workflows, advanced commands, and integrations, but these fundamentals give you the base to start using Git version control productively. Have I overlooked anything? Please share your thoughts; your insights are priceless to me.
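    A minimal sketch of how those commands fit together in one sitting (the repository, file, and branch names are hypothetical):

    ```
    # start a new repository and make the first commit
    git init demo-project
    cd demo-project
    echo "hello" > README.md
    git status                  # shows README.md as untracked
    git add README.md           # stage the file
    git commit -m "Add README"

    # work on a parallel version, then merge it back
    git checkout -b feature-docs
    echo "usage notes" >> README.md
    git add README.md
    git commit -m "Document usage"
    git checkout main           # your default branch may be named master instead
    git merge feature-docs

    # sync with a remote (assumes a remote named origin is already configured)
    git push origin main
    git pull origin main
    ```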

  • View profile for Luis Wiedmann

    CS Student @ TUM

    4,754 followers

    We’re open-sourcing our pipeline for deduplicating large-scale image datasets. It can deduplicate 10k images against 1M indexed test images in ~60 seconds on a single GPU.

    The problem: when building training datasets, it's critical to ensure that no test images from evaluation benchmarks leak in. Manually checking millions of images isn't feasible, so we built an automated solution that's fast enough to be used as a final step before publishing datasets.

    Step 1 – Test Set Indexing: We indexed all test image datasets from lmms-lab (used by the lmms-eval benchmark) using SSCD, a model that creates descriptor embeddings specifically for copy detection. This results in approximately 700,000 embeddings across 66 datasets, which gives us a comprehensive reference set to check for duplicates.

    Step 2 – Fast Deduplication: For each new dataset we add to our training collection, we run the deduplication pipeline that:
    > Embeds the new dataset using SSCD
    > Computes the cosine similarity between each image and our test set embeddings
    > Returns duplicate indices + similarity scores
    All in ~60 seconds for 10k images!

    Performance: The current implementation is fast out of the box. For even better performance at scale, libraries like FAISS can accelerate the similarity search, which becomes the main bottleneck as datasets grow.

    Bonus – Clustering & Labeling: We also included a clustering pipeline using UMAP + DBSCAN, followed by semantic labeling via a VLM. This reveals the conceptual structure of each dataset. The visualization in this post was generated this way!

    The pipeline is plug-and-play with any Hugging Face dataset. Check out the repo in the comments for the full implementation.
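    A minimal sketch of the cosine-similarity step described above, assuming the SSCD embeddings have already been computed (the array names and the 0.5 threshold are illustrative, not values from the repo):

    ```python
    import numpy as np

    def find_duplicates(new_embeddings: np.ndarray,
                        test_embeddings: np.ndarray,
                        threshold: float = 0.5):
        """Return (new_idx, test_idx, score) for new images too similar to any test image."""
        # L2-normalize so the dot product equals cosine similarity
        new_norm = new_embeddings / np.linalg.norm(new_embeddings, axis=1, keepdims=True)
        test_norm = test_embeddings / np.linalg.norm(test_embeddings, axis=1, keepdims=True)

        # (num_new, num_test) similarity matrix; chunk this for very large sets
        sims = new_norm @ test_norm.T

        best_match = sims.argmax(axis=1)   # closest indexed test image per new image
        best_score = sims.max(axis=1)
        dup_rows = np.where(best_score >= threshold)[0]
        return [(int(i), int(best_match[i]), float(best_score[i])) for i in dup_rows]
    ```

    As the post notes, once the reference set grows, an approximate-nearest-neighbor index (e.g., FAISS over the normalized embeddings) can replace the dense matrix multiply, since the similarity search is the main bottleneck.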

  • View profile for Milan Jovanović
    Milan Jovanović is an Influencer

    Practical .NET and Software Architecture Tips | Microsoft MVP

    262,306 followers

    Did you hear about screaming architecture?

    Your architecture should communicate what problems it solves. Moreover, it should focus on the use cases.

    When you look at the folder structure and source files, do they scream:
    - Health Care System or Apartment Booking System?
    - Or do they scream ASP.NET Core?

    The symptoms of the latter approach are folder names like this:

    ❌ Technical
    📂 Controllers
    📂 Entities
    📂 Exceptions
    📂 Services

    The problem is they're based on technical concerns. Here's a better example:

    ✅ Use case driven
    📂 Apartments
    📂 Bookings
    📂 Payments
    📂 Disputes

    This is an example of screaming architecture. The folder structure expresses the intent of your system. The benefits of grouping around use cases are:
    - Easier navigation
    - Improved cohesion
    - Simplified maintenance
    - High coupling within a single feature
    - Low coupling between unrelated features

    Here's a practical implementation of screaming architecture: https://lnkd.in/eh_Vej3n

    ---
    Subscribe to my weekly newsletter to accelerate your .NET skills: https://bit.ly/3Wc95rM
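    As a rough illustration (the file and folder names below are hypothetical, not taken from the linked implementation), a single use-case folder in this style might expand to:

    📂 Bookings
       📂 ReserveBooking
          ReserveBookingCommand.cs
          ReserveBookingHandler.cs
          ReserveBookingEndpoint.cs
       📂 CancelBooking
          CancelBookingCommand.cs
          CancelBookingHandler.cs
          CancelBookingEndpoint.cs

    Everything a use case needs lives in one folder, which is what makes the structure scream "bookings" rather than "controllers" and "services".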

  • View profile for Pooja Jain
    Pooja Jain is an Influencer

    Storyteller | Lead Data Engineer @ Wavicle | LinkedIn Top Voice 2025, 2024 | Globant | LinkedIn Learning Instructor | 2x GCP & AWS Certified | LICAP’2022

    181,840 followers

    How can Data Engineers leverage the open-source AI stack to build innovative solutions?

    Storage and Vector Operations:
    -> PostgreSQL with pgvector enables storing and querying embeddings directly in your database, perfect for semantic search applications.
    -> Combine this with FAISS for high-performance similarity search when dealing with millions of vectors.
    -> For example, you can build a document retrieval system that finds relevant technical documentation based on semantic similarity.

    Data Pipeline Orchestration:
    -> Netflix's Metaflow shines for ML workflows, allowing you to build reproducible, versioned data pipelines.
    -> You can create pipelines that preprocess data, generate embeddings, and update your vector store automatically.
    -> Useful for maintaining up-to-date knowledge bases that feed into RAG applications.

    Embedding Generation at Scale:
    -> Tools like Nomic and JinaAI help generate embeddings efficiently.
    -> You can build batch processing systems that convert large document repositories into vector representations, essential for building enterprise search systems or content recommendation engines.

    Model Deployment Infrastructure:
    -> FastAPI combined with LangChain provides a robust framework for deploying AI endpoints.
    -> You can build APIs that handle both traditional data operations and AI inference, making it easier to integrate AI capabilities into existing data platforms.

    Retrieval and Augmentation:
    -> Weaviate and Milvus excel at vector storage and retrieval at scale.
    -> They can be used to build systems that combine structured data from your data warehouse with unstructured data through vector similarity, enabling hybrid search solutions that leverage both traditional SQL and vector similarity.

    Here are some real-world applications that can be explored:

    ➡️ Document intelligence systems that automatically categorize and route internal documents
    Ref:
    - Building Document Understanding Systems with LangChain: https://lnkd.in/gFgfSbwr
    - Learn Vector Embeddings with Weaviate's Documentation: https://lnkd.in/g96ym4BJ
    - pgvector Tutorial for Document Search: https://lnkd.in/gue4gzcs

    ➡️ Customer support systems that leverage historical ticket data for automated response generation
    Ref:
    - RAG (Retrieval Augmented Generation) with LlamaIndex: https://lnkd.in/gAM6_2fv

    ➡️ Product recommendation engines that combine traditional collaborative filtering with semantic similarity
    Ref:
    - FAISS for Similarity Search: https://lnkd.in/gTuCgyBE
    - AWS Personalize: https://lnkd.in/ggNar5xU

    ➡️ Data quality monitoring systems that use embeddings to detect anomalies in data patterns
    Ref:
    - Great Expectations: https://lnkd.in/g7JjGjBu
    - Azure ML Data Drift: https://lnkd.in/geYTXBXd

    Inspired by: ByteByteGo

    #dataengineering #artificialintelligence #innovation #ML #cloud
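    A minimal sketch of the pgvector pattern above, assuming a PostgreSQL instance with the pgvector extension and a hypothetical docs table holding 384-dimensional embeddings (the table, columns, connection string, and embedding model are illustrative assumptions):

    ```python
    import psycopg2
    from pgvector.psycopg2 import register_vector
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")        # any 384-dim embedding model
    conn = psycopg2.connect("dbname=knowledge user=etl")   # hypothetical connection
    register_vector(conn)                                   # lets us pass numpy vectors as params

    def search_docs(query: str, top_k: int = 5):
        """Return the documents most semantically similar to the query."""
        query_vec = model.encode(query)
        with conn.cursor() as cur:
            # <=> is pgvector's cosine-distance operator; smaller means more similar
            cur.execute(
                """
                SELECT id, title, embedding <=> %s AS distance
                FROM docs
                ORDER BY distance
                LIMIT %s
                """,
                (query_vec, top_k),
            )
            return cur.fetchall()
    ```

    The same query shape works for the document-retrieval example in the post; swapping the ORDER BY clause for a FAISS lookup is the usual move once the table grows into the millions of vectors.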

  • View profile for Deepak Bhardwaj

    Agentic AI Champion | 40K+ Readers | Simplifying GenAI, Agentic AI and MLOps Through Clear, Actionable Insights

    45,100 followers

    Can You Trust Your Data the Way You Trust Your Best Team Member?

    Do you know the feeling when you walk into a meeting and rely on that colleague who always has the correct information? You trust them to steer the conversation, to answer tough questions, and to keep everyone on track. What if data could be the same way: reliable, trustworthy, always there when you need it?

    In business, we often talk about data being "the new oil," but let's be honest: without proper management, it's more like a messy garage full of random bits and pieces. It's easy to forget how essential data trust is until something goes wrong: decisions are based on faulty numbers, reports are incomplete, and suddenly you're stuck cleaning up a mess.

    So, how do we ensure data is as trustworthy as that colleague you rely on? It starts with building a solid foundation through these nine pillars:

    ➤ Master Data Management (MDM): Consider MDM the colleague who always keeps the big picture in check, ensuring everything aligns and everyone is on the same page.
    ➤ Reference Data Management (RDM): Have you ever been in a meeting where everyone uses a different term for the same thing? RDM removes the confusion by standardising key data categories across your business.
    ➤ Metadata Management: Metadata is like the notes and context we keep on a project. It tracks how, when, and why decisions were made, so you can always refer to them later.
    ➤ Data Catalog: Imagine a digital filing cabinet that's not only organised but searchable, easy to navigate, and quick to find exactly what you need.
    ➤ Data Lineage: This is your project's timeline, tracking each step of the data's journey so you always know where it has been and where it is going.
    ➤ Data Versioning: Data evolves just as project plans do. Versioning keeps track of every change so you can revisit previous versions or understand shifts when needed.
    ➤ Data Provenance: Provenance is the backstory: understanding where your data originated helps you assess its trustworthiness and quality.
    ➤ Data Lifecycle Management: Data doesn't last forever, just like projects have deadlines. Lifecycle management ensures your data is used and protected appropriately throughout its life.
    ➤ Data Profiling: Consider profiling a health check for your data, spotting potential errors or inconsistencies before they affect business decisions.

    When we get these pillars right, data goes from being just a tool to being a trusted ally, one you can count on to help make decisions, drive strategies, and ultimately support growth.

    So, which pillar would you focus on to make your data more trustworthy?

    Cheers!
    Deepak Bhardwaj
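    As a small illustration of the data profiling pillar, a quick health check with pandas might look like this (the input file and the 50% "mostly empty" cutoff are hypothetical, not part of the post):

    ```python
    import pandas as pd

    def profile(df: pd.DataFrame) -> pd.DataFrame:
        """Summarize per-column completeness, uniqueness, and data types."""
        summary = pd.DataFrame({
            "dtype": df.dtypes.astype(str),
            "null_pct": (df.isna().mean() * 100).round(1),
            "n_unique": df.nunique(),
        })
        # simple red flags: mostly-empty columns and exact duplicate rows
        summary["mostly_empty"] = summary["null_pct"] > 50
        print(f"duplicate rows: {df.duplicated().sum()}")
        return summary

    orders = pd.read_csv("orders.csv")   # hypothetical dataset
    print(profile(orders))
    ```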

  • View profile for Muazma Zahid

    Data and AI Leader | Advisor | Speaker

    17,632 followers

    Hello, LinkedIn community! In the fourth Friday post of our weekly series, let's dive into 𝐇𝐲𝐛𝐫𝐢𝐝 𝐒𝐞𝐚𝐫𝐜𝐡.

    Hybrid search is an approach that combines multiple search techniques to improve the efficiency and effectiveness of search algorithms, particularly for complex, high-dimensional data. It integrates various methods, overcoming the limitations of individual techniques and adapting to diverse data distributions and problem domains.

    𝐂𝐨𝐧𝐜𝐞𝐩𝐭: Hybrid search combines full-text and vector queries executed against a search index containing both plain text content and generated embeddings.

    𝐇𝐨𝐰 𝐈𝐭 𝐖𝐨𝐫𝐤𝐬:
    - Vector fields with embeddings coexist alongside textual and numerical fields in the search index. Most relational databases are already great at full-text and numerical filtering and search.
    - Hybrid queries take advantage of existing functionality (filtering, faceting, sorting, etc.) in a single search request.
    - Results from the full-text and vector search queries are merged using Reciprocal Rank Fusion (RRF) to provide a unified result set.

    𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐄𝐱𝐚𝐦𝐩𝐥𝐞:
    𝐒𝐜𝐞𝐧𝐚𝐫𝐢𝐨: Imagine building a large digital library search system.
    𝐀𝐩𝐩𝐫𝐨𝐚𝐜𝐡:
    - Apply filters like book genre, author, etc. to narrow your search.
    - Use BM25 (lexical search) to quickly fetch documents matching the search keywords.
    - Combine it with semantic search using vector embeddings to find contextually related books.
    - Apply RRF to fuse the ranked lists into the best possible result set.

    𝐖𝐡𝐲 𝐮𝐬𝐞 𝐇𝐲𝐛𝐫𝐢𝐝 𝐒𝐞𝐚𝐫𝐜𝐡? It improves search quality and accuracy by leveraging both lexical and vector search features.

    #HybridSearch #VectorSearch #AI #SemanticSearch #RRF #BM25 #learnwithmz

    P.S. The image is generated via DALL·E 3 using Azure AI Studio.
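    A minimal sketch of Reciprocal Rank Fusion over the two ranked lists from the digital-library example (the document IDs are illustrative; k=60 is the constant commonly used in the RRF literature):

    ```python
    def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
        """Fuse several ranked result lists by scoring each doc with sum of 1/(k + rank)."""
        scores: dict[str, float] = {}
        for results in ranked_lists:
            for rank, doc_id in enumerate(results, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

    # hypothetical results for one query against the library index
    bm25_hits = ["book_42", "book_7", "book_19"]      # lexical (BM25) ranking
    vector_hits = ["book_7", "book_88", "book_42"]    # semantic (embedding) ranking
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
    ```

    Documents that appear near the top of both lists (book_7 and book_42 here) float to the top of the fused result, which is exactly the behaviour hybrid search relies on.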

  • View profile for Sahar Mor

    I help researchers and builders make sense of AI | ex-Stripe | aitidbits.ai | Angel Investor

    40,913 followers

    LlamaIndex just unveiled a new approach involving AI agents for reliable document processing, from invoices to insurance claims and contract reviews.

    LlamaIndex's new architecture, Agentic Document Workflows (ADW), goes beyond basic retrieval and extraction to orchestrate end-to-end document processing and decision-making. Imagine a contract review workflow: you don't just parse terms, you identify potential risks, cross-reference regulations, and recommend compliance actions. This level of coordination requires an agentic framework that maintains context, applies business rules, and interacts with multiple system components.

    Here's how ADW works at a high level:
    (1) Document parsing and structuring – using robust tools like LlamaParse to extract relevant fields from contracts, invoices, or medical records.
    (2) Stateful agents – coordinating each step of the process, maintaining context across multiple documents, and applying logic to generate actionable outputs.
    (3) Retrieval and reference – tapping into knowledge bases via LlamaCloud to cross-check policies, regulations, or best practices in real time.
    (4) Actionable recommendations – delivering insights that help professionals make informed decisions rather than just handing over raw text.

    ADW provides a path to building truly "intelligent" document systems that augment rather than replace human expertise. From legal contract reviews to patient case summaries, invoice processing, and insurance claims management, ADW supports human decision-making with context-rich workflows rather than one-off extractions.

    Ready-to-use notebooks: https://lnkd.in/gQbHTTWC
    More open-source tools for AI agent developers in my recent blog post: https://lnkd.in/gCySSuS3
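    To make the "stateful agent" idea concrete, here is a rough, framework-agnostic sketch of the four steps; it does not use LlamaIndex's actual API, and parse, lookup_policy, and llm stand in for whatever components a real workflow would plug in:

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class WorkflowState:
        """Context carried across every step of one document workflow."""
        doc_id: str
        extracted_fields: dict = field(default_factory=dict)
        flagged_risks: list = field(default_factory=list)
        recommendations: list = field(default_factory=list)

    def review_contract(raw_text: str, doc_id: str, parse, lookup_policy, llm) -> WorkflowState:
        state = WorkflowState(doc_id=doc_id)
        # (1) parse and structure the document (e.g., with a parser such as LlamaParse)
        state.extracted_fields = parse(raw_text)
        # (3) retrieve reference material for each clause from a knowledge base
        for clause in state.extracted_fields.get("clauses", []):
            policy = lookup_policy(clause)
            # (2) apply business rules while keeping context in the shared state object
            if policy and policy.get("severity") == "high":
                state.flagged_risks.append({"clause": clause, "policy": policy})
        # (4) turn findings into actionable recommendations for a human reviewer
        state.recommendations = [
            llm(f"Suggest a compliance action for: {risk['clause']}")
            for risk in state.flagged_risks
        ]
        return state
    ```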

  • The simplest but most impactful optimisation you can do in your frontend app is enabling compression.

    Story first: our bundle was 3.5 MB, and users kept saying the site felt slow. Turned on gzip on the server → transfer dropped to 1.4 MB. No code changes, just a config. Users instantly felt the site lighter.

    Why this works: JS, CSS, HTML, JSON, and SVG are text-heavy, and text compresses well.
    • Without compression → the full 3.5 MB travels.
    • With gzip/Brotli → repeated patterns shrink → the browser auto-decompresses.
    • Same content, 60% fewer bytes → faster FCP, LCP, TTI.

    What to compress:
    ✅ HTML, CSS, JS, JSON, SVG, XML
    ❌ Images, videos, PDFs, fonts (already compressed)

    How to check: Chrome DevTools → Network → click a JS/CSS file → look for Content-Encoding. If it's blank, you're shipping raw bytes.

    Extra tip: Brotli compresses 15–20% smaller than gzip. Serve .br to modern browsers, with gzip as a fallback.

    ⚡ Go ahead and check if your app already has this enabled. If not, enable it today and feel the difference yourself.
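    The same check can be scripted outside DevTools; a small sketch in Python (the URL is a placeholder):

    ```python
    import requests

    def check_compression(url: str) -> None:
        """Report whether an asset is served with gzip/Brotli compression."""
        # stream=True: we only need the response headers, so don't download or decode the body
        with requests.get(url, headers={"Accept-Encoding": "gzip, br"}, stream=True) as resp:
            encoding = resp.headers.get("Content-Encoding", "none")
            length = resp.headers.get("Content-Length", "unknown")
            print(f"{url}\n  Content-Encoding: {encoding}\n  Content-Length: {length}")

    check_compression("https://example.com/static/app.js")  # placeholder URL
    ```

    If Content-Encoding comes back empty, the server is shipping raw bytes and compression still needs to be switched on.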

  • View profile for Shivam Shrivastava

    SWE-ML @ Google | Microsoft | IIT KGP • Kaggle & Codeforces Expert

    206,929 followers

    Ever wondered how a search engine like 𝗚𝗼𝗼𝗴𝗹𝗲 or 𝗕𝗶𝗻𝗴 finds results in milliseconds? It's one of the most misunderstood system design problems, and it's more relevant than ever for interviews and real-world roles. Let's break it down simply.

    𝗧𝗵𝗲 𝗣𝗿𝗼𝗯𝗹𝗲𝗺
    You're given a million documents, each ~10KB. Someone types a few keywords, and your system needs to return all matching documents instantly. How do you design this?

    𝗧𝗵𝗲 𝗖𝗼𝗿𝗲 𝗜𝗱𝗲𝗮: 𝗜𝗻𝘃𝗲𝗿𝘁𝗲𝗱 𝗜𝗻𝗱𝗲𝘅
    Instead of scanning every document, we pre-build a structure that works like the index at the back of a book. For each word, we store a sorted list of locations, i.e., which documents contain the word, and where. When a user searches for multiple words, we just find the intersection of these lists. And since the lists are sorted, we can intersect them efficiently.

    But that's just the start. Real speed needs real optimization. Let's dive deeper:

    1. Delta Compression
    Store the difference between consecutive document IDs instead of the full IDs. Why? Smaller data → better cache usage → faster lookups.

    2. Caching Frequent Queries
    User queries follow a skewed pattern: a few are extremely common. Cache them and you'll save compute for the majority of traffic.

    3. Frequency-Based Indexing
    Not all documents are equal. Keep high-quality/top-ranked documents in memory and the rest on disk. Most queries will hit RAM only, keeping latency low.

    4. Smart Intersection Order
    Always intersect the smallest sets first. If you search "INDIA GDP 2009", it's faster to start with "GDP" and "2009" than with "INDIA".

    5. Multilevel Indexing
    Want better accuracy? Break documents into paragraphs or sentences and index them too. That way, matches are not just found, they're found in context.

    Why this matters: this isn't just about search engines. It's about designing systems that handle scale, latency, and optimization, which is exactly the thinking top tech companies test for. Mastering this gives you an edge in interviews and real-world backend design.
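    A minimal sketch of the core idea: build an inverted index, then intersect posting lists smallest-first (the toy documents and query are illustrative):

    ```python
    from collections import defaultdict

    def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
        """Map each word to a sorted list of the document IDs that contain it."""
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for word in text.lower().split():
                index[word].add(doc_id)
        return {word: sorted(ids) for word, ids in index.items()}

    def search(index: dict[str, list[int]], query: str) -> list[int]:
        """Return IDs of documents containing every query word."""
        postings = [index.get(word, []) for word in query.lower().split()]
        if not postings or any(not p for p in postings):
            return []
        # smart intersection order: start with the smallest posting list
        postings.sort(key=len)
        result = set(postings[0])
        for plist in postings[1:]:
            result &= set(plist)
        return sorted(result)

    docs = {1: "india gdp 2009 report", 2: "gdp growth 2009", 3: "india census data"}
    index = build_index(docs)
    print(search(index, "india gdp 2009"))  # -> [1]
    ```

    Delta compression would then store the gaps between consecutive IDs in each posting list (e.g., [105, 107, 112] becomes [105, 2, 5]) so the lists stay small enough to sit in cache.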
