$34.00 Fixed
AI/NLP Consultant (Multimodal Data Pipeline Supervision)
Position Overview:
We are developing an advanced AI chatbot that leverages authoritative texts and research papers to provide accurate, source-backed responses. We are seeking an experienced consultant to supervise and engineer a pipeline that processes PDFs (text + images), reduces redundancy, and structures data for fine-tuning large language models. This role requires expertise in NLP, computer vision, and workflow automation, with a focus on ensuring data accuracy and compliance.
Key Responsibilities:
PDF & Image Processing:
Extract text and images from PDFs using tools like PyMuPDF, pdfplumber, and OCR solutions (e.g., Tesseract, Google Vision).
Convert diagrams and tables into text via captioning (using models such as GPT-4V, LLaVA) and structured markdown.
Preserve contextual relationships between images and adjacent text through layout analysis (e.g., using LayoutParser).
Data Structuring & Deduplication:
Clean and segment text data (using spaCy, regex) and integrate it with image-derived content.
Implement deduplication strategies for text (e.g., embedding clustering) and images (e.g., perceptual hashing, CLIP).
Maintain comprehensive source metadata (e.g., document, page, section) for full traceability.
LLM Integration:
Generate synthetic question-answer pairs using state-of-the-art models (e.g., GPT-4, Llama-3) from processed data.
Fine-tune LLMs (via platforms like Hugging Face or OpenAI) and/or build retrieval-augmented generation (RAG) systems (using tools like FAISS, Pinecone).
Ensure that responses include clear citations (e.g., “[Source: Document X, p. 45 + Fig 3.2]”).
Compliance & Validation:
Collaborate with domain experts to validate OCR outputs and image captions for accuracy.
Ensure compliance with applicable data handling and privacy standards.
Required Qualifications:
Technical Skills:
Proficiency in Python and familiarity with libraries for NLP (spaCy, transformers), computer vision (OpenCV, PyTorch), and PDF processing (PyMuPDF, Camelot).
Experience with OCR technologies (Tesseract, AWS Textract) and vision-language models (GPT-4V, CLIP).
Knowledge of LLM fine-tuning (using Hugging Face) and experience with vector databases (FAISS, Pinecone).
Data Engineering:
Ability to design scalable pipelines for multimodal data (text + images).
Expertise in deduplication strategies and robust metadata management.
Education:
Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field.
Nice-to-Have:
Experience with layout analysis tools (LayoutParser, PubLayNet).
Familiarity with domain-specific ontologies and standards.
Prior experience in consulting on AI projects and contributing to open-source NLP/CV projects.
What We Offer:
Impact: Play a key role in building a tool that enhances access to evidence-based knowledge.
Cutting-Edge Tech: Engage with state-of-the-art models (e.g., GPT-4V, Llama-3) and multimodal RAG systems.
Collaboration: Work alongside experts to ensure data accuracy and system reliability.
Flexibility: Enjoy a remote-friendly role with competitive compensation.
How to Apply:
Please submit your resume, GitHub profile, and a brief cover letter that includes:
Your experience with PDF/image processing and LLMs.
An example of a project where you handled multimodal data (text + images).
Your approach to ensuring accuracy and compliance in data-driven systems.
The project should be completed in less than 3 months. The final deliverable should be a fully functional data pipeline that processes PDFs and images, ready for LLM fine-tuning. The consultant should expect frequent collaboration with our internal team throughout the project for feedback and guidance. Provide documentation that includes setup instructions and basic usage examples.
- Proposal: 0
- 93 days
Tapan Ahluwalia
,
Member since
Jun 19, 2024
Total Job