A tool for analyzing unstructured text datasets
Confidential
9 months
AI
2024
AWS, OpenAI API, Python, spaCy, HuggingFace Transformers, MongoDB Atlas, FastAPI, ReactJS
TextInsights Analyzer is a data intelligence and analytics solution that aims to transform massive volumes of unstructured textual data into actionable business intelligence through advanced natural language processing. This platform serves as a critical bridge between raw information and strategic decision-making, allowing organizations to uncover hidden patterns and trends within their text-based assets.
TextInsights Analyzer aims to empower businesses to derive value from their growing repositories of unstructured content. However, several challenges stood in the way of achieving this goal:
– Computational inefficiency in real-time analysis: Processing high-volume text streams through heavy transformer models often leads to significant latency, making real-time insight extraction difficult.
– Semantic inconsistency across domains: General-purpose models frequently struggle with industry-specific jargon or nuanced context, leading to inaccurate entity recognition and sentiment scores.
– High operational costs for API reliance: Over-reliance on external Large Language Model (LLM) APIs for every basic text task can result in unsustainable variable costs as data volume scales.
To overcome these challenges, Hola Tech built the solution around three key components:
– Implement a hybrid processing pipeline: Utilize spaCy for lightning-fast initial text preprocessing, such as tokenization and part-of-speech tagging, while delegating complex semantic tasks to HuggingFace Transformers. This multi-staged approach, built with Python, ensures high-speed throughput for basic structure while maintaining state-of-the-art accuracy for deep contextual analysis.
– Engineer an intelligent inference layer: Develop a high-performance FastAPI backend that acts as a traffic controller. The system uses local Python logic to handle 80% of routine categorization and only routes highly complex or ambiguous “long-tail” queries to the OpenAI API, significantly reducing operational costs while ensuring maximum insight depth where it matters most.
– Optimize high-performance storage: Deploy MongoDB Atlas as the central repository for processed insights. By leveraging MongoDB’s flexible document schema, the system can store varied metadata, such as sentiment scores, extracted entities, and vector embeddings, in a single, queryable record, enabling the ReactJS frontend to display complex data visualizations without expensive join operations.
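The routing idea behind the inference layer and the single-record storage model can be illustrated with a minimal Python sketch. It is not the project's actual code: the keyword rules, the `CONFIDENCE_THRESHOLD` value, the `route` and `build_record` helpers, and the record field names are all hypothetical stand-ins. In particular, the keyword matcher only takes the place of the local spaCy and HuggingFace models, and the LLM call is passed in as a plain function so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical cut-off: local results scoring below this are sent to the LLM.
CONFIDENCE_THRESHOLD = 0.6

# Hypothetical keyword rules standing in for the local NLP models.
KEYWORD_RULES: Dict[str, Tuple[str, ...]] = {
    "billing": ("invoice", "refund", "payment"),
    "legal": ("contract", "clause", "liability"),
}


@dataclass
class Classification:
    label: str
    confidence: float  # score from the local pass
    source: str        # "local" or "llm"


def classify_locally(text: str) -> Classification:
    """Cheap local pass: score each label by the share of its keywords found."""
    lowered = text.lower()
    best_label, best_score = "unknown", 0.0
    for label, keywords in KEYWORD_RULES.items():
        hits = sum(1 for kw in keywords if kw in lowered)
        score = hits / len(keywords)
        if score > best_score:
            best_label, best_score = label, score
    return Classification(best_label, best_score, "local")


def route(text: str, llm_fallback: Callable[[str], str]) -> Classification:
    """Answer locally when confident; otherwise defer to the external model."""
    local = classify_locally(text)
    if local.confidence >= CONFIDENCE_THRESHOLD:
        return local
    # Ambiguous "long-tail" input: pay for the external call only here.
    return Classification(llm_fallback(text), local.confidence, "llm")


def build_record(doc_id: str, text: str, result: Classification) -> dict:
    """Assemble one document-style record holding all metadata for a text."""
    return {
        "_id": doc_id,
        "text": text,
        "category": result.label,
        "category_source": result.source,
        "local_confidence": result.confidence,
        # Placeholders for fields the real pipeline would fill in.
        "entities": [],
        "sentiment": None,
        "embedding": None,
    }
```

In a real deployment, `llm_fallback` would wrap a call to the external LLM API and the dictionary returned by `build_record` would be written to the document store. Both steps are left out here to keep the sketch self-contained.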
– Reduced text processing costs by 60% by implementing a hybrid model that prioritizes local HuggingFace inference over external APIs
– Increased entity recognition accuracy by 35% across domain-specific datasets through custom spaCy pipeline extensions
– Processed over 5 million documents per month with a sub-second average latency per analysis request
The implemented system successfully met TextInsights Analyzer’s requirements for performance, security, and functionality. The platform has significantly lowered the barrier to entry for text-based discovery, attracting a diverse range of users from market researchers to legal analysts. TextInsights Analyzer has empowered these users to navigate their data landscapes quickly and efficiently, with millions of distinct insights already extracted and visualized. This demonstrates the platform’s success in providing a scalable, accessible, and highly accurate solution for the modern data-driven economy.
Ficar is a digital solution that aims to become the first and most reliable smart and comprehensive assistant for car users in Vietnam, offering an open, optimized, and user-friendly ecosystem.
A Warehouse Management System (WMS) and an Order Management System (OMS) are interconnected solutions designed to optimize and streamline critical aspects of a supply chain.
Verify Trusted is a service that aims to consolidate and showcase a business’s online reputation by combining reviews from various platforms into a single, optimized profile, thereby enhancing trust and online visibility.
PRS is an advanced software solution that aims to streamline communication and automate interactions through a sophisticated prompt and response system.