System Design: Skills Intelligence GraphRAG
Combining Knowledge Graphs with Skills Ontology for Intelligent Talent Systems
Interview context: This is a specialized system design combining GraphRAG (graph-based retrieval augmented generation) with HR/talent domain knowledge. It demonstrates how to build intelligent systems that understand relationships between skills, professions, and career paths.
Table of Contents
- Problem Understanding
- Skills Ontology Overview
- System Architecture
- Knowledge Graph Design
- GraphRAG Integration
- Use Cases & Query Patterns
- Data Pipeline
- Implementation Details
- Scalability & Performance
- Advanced Features
1. Problem Understanding
What Are We Building?
A Skills Intelligence GraphRAG System that combines:
- Skills Ontology (Textkernel-style): Structured knowledge about skills, professions, and their relationships
- GraphRAG: Graph-based retrieval for contextual, relationship-aware answers
- LLM Integration: Natural language interface for complex talent queries
Target Use Cases
┌─────────────────────────────────────────────────────────────────────┐
│ PRIMARY USE CASES │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 1. TALENT MATCHING │
│ "Find candidates with Python skills who could transition │
│ to Machine Learning Engineer roles" │
│ │
│ 2. SKILLS GAP ANALYSIS │
│ "What skills does John need to become a Data Scientist?" │
│ │
│ 3. CAREER PATH RECOMMENDATIONS │
│ "What are potential career paths for a Frontend Developer?" │
│ │
│ 4. LEARNING RECOMMENDATIONS │
│ "What courses should I take to move into Cloud Architecture?" │
│ │
│ 5. JOB DESCRIPTION GENERATION │
│ "Generate required skills for a Senior DevOps Engineer" │
│ │
│ 6. WORKFORCE PLANNING │
│ "What skills will our team need in 2 years given AI trends?" │
│ │
└─────────────────────────────────────────────────────────────────────┘
Why GraphRAG for Skills?
Interviewer might ask: “Why not just use vector search?”
| Approach | Pros | Cons |
|---|---|---|
| Vector Search | Semantic similarity, handles synonyms | Misses explicit relationships |
| Keyword Search | Fast, exact matches | No understanding of context |
| GraphRAG | Relationship-aware, traverses connections | More complex to build |
GraphRAG advantage: When asked “Python developer skills”, it can traverse:
- Python → [related_to] → Data Analysis, ML, Web Dev
- Python Developer → [requires] → Python, Git, SQL
- Python → [prerequisite_for] → TensorFlow, Django
This relationship awareness enables career path analysis, skills gap detection, and intelligent recommendations that pure vector search cannot provide.
2. Skills Ontology Overview
Textkernel Skills Intelligence Structure
Based on the Textkernel Ontology API:
┌─────────────────────────────────────────────────────────────────────┐
│ SKILLS ONTOLOGY ENTITIES │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ SKILLS PROFESSIONS │
│ ├── Skill ID ├── Profession ID │
│ ├── Name (multi-language) ├── Name (multi-language) │
│ ├── Category ├── Industry/Domain │
│ ├── Type (Hard/Soft/Tool) ├── Seniority Levels │
│ └── Certifications └── Required Skills │
│ │
│ RELATIONSHIPS │
│ ├── Profession → Skills (has_skill) │
│ ├── Skills → Professions (enables_profession) │
│ ├── Skill → Skill (similar_to, prerequisite_of, related_to) │
│ └── Skill → Certification (validates) │
│ │
└─────────────────────────────────────────────────────────────────────┘
API Endpoints (Textkernel Reference)
| Endpoint | Purpose |
|---|---|
/professions/suggest_skills |
Get skills for a profession |
/professions/compare_skills |
Compare skills between professions |
/skills/suggest_professions |
Get professions for skills |
/skills/compare_to_profession |
Compare skill set to profession |
/skills/suggest_skills |
Get similar/related skills |
/skills/similarity_score |
Calculate skill set similarity |
3. System Architecture
High-Level Architecture
flowchart TB
Query["User Query (Natural Lang)"]
subgraph QueryProcessor["QUERY PROCESSOR"]
IC["Intent Classifier"]
EE["Entity Extractor"]
QP["Query Planner"]
end
Query --> QueryProcessor
QueryProcessor --> SG["Skills Graph (Neo4j)"]
QueryProcessor --> VS["Vector Store (Embeddings)"]
QueryProcessor --> DS["Document Store (Raw Data)"]
subgraph ContextAssembler["CONTEXT ASSEMBLER"]
GC["Graph Context"]
VC["Vector Context"]
HR["Hybrid Ranker"]
end
SG --> ContextAssembler
VS --> ContextAssembler
DS --> ContextAssembler
subgraph LLMGenerator["LLM RESPONSE GENERATOR"]
PB["Prompt Builder"]
LLM["LLM (GPT-4)"]
RF["Response Formatter"]
end
ContextAssembler --> LLMGenerator
LLMGenerator --> Response["Response<br/>(Structured + Explanation)"]
Component Overview
| Component | Purpose | Technology |
|---|---|---|
| Query Processor | Parse & understand user queries | LLM + NER |
| Skills Graph | Store skills ontology | Neo4j / Amazon Neptune |
| Vector Store | Semantic similarity search | Pinecone / Weaviate |
| Document Store | Raw job descriptions, resumes | Elasticsearch / S3 |
| Context Assembler | Combine graph + vector results | Custom logic |
| LLM Generator | Generate natural language responses | GPT-4 / Claude |
4. Knowledge Graph Design
Graph Schema
┌─────────────────────────────────────────────────────────────────────┐
│ SKILLS KNOWLEDGE GRAPH │
└─────────────────────────────────────────────────────────────────────┘
NODE TYPES:
═══════════
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ SKILL │ │ PROFESSION │ │ CATEGORY │
├─────────────────┤ ├─────────────────┤ ├─────────────────┤
│ id: string │ │ id: string │ │ id: string │
│ name: string │ │ name: string │ │ name: string │
│ type: enum │ │ industry: string│ │ parent_id: str │
│ description: str│ │ seniority: enum │ │ level: int │
│ popularity: int │ │ avg_salary: int │ └─────────────────┘
│ growth_rate: flt│ │ demand_trend: st│
└─────────────────┘ └─────────────────┘
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ CERTIFICATION │ │ INDUSTRY │ │ LEARNING │
├─────────────────┤ ├─────────────────┤ │ RESOURCE │
│ id: string │ │ id: string │ ├─────────────────┤
│ name: string │ │ name: string │ │ id: string │
│ provider: string│ │ growth: float │ │ title: string │
│ validity: int │ │ size: int │ │ type: enum │
│ level: enum │ └─────────────────┘ │ duration: int │
└─────────────────┘ │ provider: string│
└─────────────────┘
RELATIONSHIP TYPES:
══════════════════
SKILL ──[SIMILAR_TO {score: 0.0-1.0}]──> SKILL
SKILL ──[PREREQUISITE_FOR]──> SKILL
SKILL ──[RELATED_TO {type: "complementary"|"alternative"}]──> SKILL
SKILL ──[BELONGS_TO]──> CATEGORY
SKILL ──[VALIDATED_BY]──> CERTIFICATION
SKILL ──[TAUGHT_BY]──> LEARNING_RESOURCE
PROFESSION ──[REQUIRES {importance: "must"|"nice"}]──> SKILL
PROFESSION ──[TRANSITIONS_TO {difficulty: 1-5}]──> PROFESSION
PROFESSION ──[BELONGS_TO]──> INDUSTRY
PROFESSION ──[SENIOR_VERSION_OF]──> PROFESSION
CATEGORY ──[PARENT_OF]──> CATEGORY
Example Graph Data
flowchart TB
Programming["CATEGORY: Programming"]
Programming -->|PARENT_OF| Backend["Backend Development"]
Programming -->|PARENT_OF| Frontend["Frontend Development"]
Programming -->|PARENT_OF| DataSci["Data Science"]
Backend --> Python["Python (Skill)"]
Frontend --> JavaScript["JavaScript (Skill)"]
DataSci --> Python2["Python (Skill)"]
Python --> Django["Django (Skill)"]
Python --> FastAPI["FastAPI (Skill)"]
JavaScript --> React["React (Skill)"]
Python2 --> Pandas["Pandas (Skill)"]
Django -->|VALIDATED_BY| DjangoCert["Django Certification"]
Neo4j Schema (Cypher)
// Create constraints
CREATE CONSTRAINT skill_id IF NOT EXISTS FOR (s:Skill) REQUIRE s.id IS UNIQUE;
CREATE CONSTRAINT profession_id IF NOT EXISTS FOR (p:Profession) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT category_id IF NOT EXISTS FOR (c:Category) REQUIRE c.id IS UNIQUE;
// Create indexes for search
CREATE INDEX skill_name IF NOT EXISTS FOR (s:Skill) ON (s.name);
CREATE INDEX profession_name IF NOT EXISTS FOR (p:Profession) ON (p.name);
CREATE FULLTEXT INDEX skill_search IF NOT EXISTS FOR (s:Skill) ON EACH [s.name, s.description];
// Example: Create skills
CREATE (python:Skill {
id: 'skill_python',
name: 'Python',
type: 'hard_skill',
description: 'General-purpose programming language',
popularity: 95,
growth_rate: 0.15
})
CREATE (ml:Skill {
id: 'skill_ml',
name: 'Machine Learning',
type: 'hard_skill',
description: 'Building systems that learn from data',
popularity: 88,
growth_rate: 0.25
})
// Example: Create relationships
MATCH (python:Skill {id: 'skill_python'})
MATCH (ml:Skill {id: 'skill_ml'})
CREATE (python)-[:PREREQUISITE_FOR]->(ml)
CREATE (python)-[:SIMILAR_TO {score: 0.7}]->(ml)
// Example: Profession with required skills
CREATE (ds:Profession {
id: 'prof_data_scientist',
name: 'Data Scientist',
industry: 'Technology',
seniority: 'mid',
avg_salary: 120000
})
MATCH (ds:Profession {id: 'prof_data_scientist'})
MATCH (python:Skill {id: 'skill_python'})
MATCH (ml:Skill {id: 'skill_ml'})
CREATE (ds)-[:REQUIRES {importance: 'must', weight: 0.9}]->(python)
CREATE (ds)-[:REQUIRES {importance: 'must', weight: 0.95}]->(ml)
5. GraphRAG Integration
Query Flow
┌─────────────────────────────────────────────────────────────────────┐
│ GRAPHRAG QUERY PIPELINE │
└─────────────────────────────────────────────────────────────────────┘
User: "What skills should a Python developer learn to become a
Machine Learning Engineer?"
STEP 1: INTENT CLASSIFICATION
─────────────────────────────
Intent: SKILLS_GAP_ANALYSIS
Entities:
- Source Role: "Python Developer"
- Target Role: "Machine Learning Engineer"
STEP 2: GRAPH QUERY GENERATION
─────────────────────────────
Query Plan:
1. Find skills for "Python Developer"
2. Find skills for "Machine Learning Engineer"
3. Calculate skill gap (difference)
4. Find learning paths for gap skills
Cypher Query:
```cypher
// Get source profession skills
MATCH (source:Profession {name: 'Python Developer'})-[:REQUIRES]->(s1:Skill)
WITH collect(s1) as sourceSkills
// Get target profession skills
MATCH (target:Profession {name: 'ML Engineer'})-[:REQUIRES]->(s2:Skill)
WITH sourceSkills, collect(s2) as targetSkills
// Find gap skills (in target but not in source)
WITH [s IN targetSkills WHERE NOT s IN sourceSkills] as gapSkills
UNWIND gapSkills as gap
// Find learning resources and prerequisites
OPTIONAL MATCH (gap)-[:TAUGHT_BY]->(resource:LearningResource)
OPTIONAL MATCH (prereq:Skill)-[:PREREQUISITE_FOR]->(gap)
RETURN gap, collect(DISTINCT resource) as resources,
collect(DISTINCT prereq) as prerequisites
STEP 3: CONTEXT ASSEMBLY ───────────────────────────── Graph Context:
- Gap Skills: [TensorFlow, PyTorch, Deep Learning, MLOps]
- Prerequisites: [Linear Algebra, Statistics, Python Advanced]
- Learning Resources: [Coursera ML Course, Fast.ai, …]
Vector Context (from job descriptions):
- “ML Engineers typically need 2+ years of deep learning…”
- “Key responsibilities include model deployment…”
STEP 4: LLM RESPONSE GENERATION ───────────────────────────── Prompt: “”” Based on the skills knowledge graph and job market data:
Current Role: Python Developer Target Role: Machine Learning Engineer
Skills Gap Analysis: {graph_context}
Additional Context: {vector_context}
Generate a personalized learning roadmap with:
- Priority skills to learn
- Recommended learning order
- Estimated time for each skill
- Specific resources “””
STEP 5: STRUCTURED RESPONSE ───────────────────────────── { “summary”: “To transition from Python Developer to ML Engineer…”, “skills_gap”: […], “learning_roadmap”: […], “estimated_duration”: “6-12 months”, “confidence”: 0.89 }
### Query Processing Flow
> **Interview context**: The core of GraphRAG is the query processing pipeline. Understand each step.
```mermaid
flowchart TB
Query["User Query:<br/>'What skills do I need to become a Machine Learning Engineer?'"]
Query --> Step1["Step 1: INTENT CLASSIFICATION<br/>Input: Natural language query<br/>Output: SKILLS_GAP | CAREER_PATH | RECOMMENDATIONS | ...<br/>Method: LLM prompt or fine-tuned classifier"]
Step1 --> Step2["Step 2: ENTITY EXTRACTION<br/>Input: Query text<br/>Output: {skills: [...], professions: [...]}<br/>Method: NER or LLM extraction"]
Step2 --> Step3["Step 3: GRAPH QUERY GENERATION<br/>Based on intent, generate Cypher query<br/>SKILLS_GAP → Compare profession skill sets<br/>CAREER_PATH → Traverse TRANSITIONS_TO relationships"]
Step3 --> Step4["Step 4: CONTEXT ASSEMBLY<br/>Combine: Graph + Vector + User context<br/>Format into prompt for LLM"]
Step4 --> Step5["Step 5: LLM RESPONSE GENERATION<br/>LLM receives: Query + Intent + Context<br/>LLM produces: Structured answer"]
Key Graph Queries
| Intent | What It Does | Graph Pattern |
|---|---|---|
| Skills Gap | Compare skills between roles | (CurrentRole)-[:REQUIRES]->() vs (TargetRole)-[:REQUIRES]->() |
| Career Path | Find progression routes | (Start)-[:TRANSITIONS_TO*1..4]->(End) |
| Recommendations | Suggest related skills | (UserSkill)-[:SIMILAR_TO|RELATED_TO]->(Recommended) |
| Profession Skills | List required skills | (Profession)-[:REQUIRES]->(Skill) |
Intent Classification
The system classifies queries into intents to choose the right graph traversal:
| Intent | Example Query | Graph Operation |
|---|---|---|
SKILLS_GAP |
“What skills do I need for X?” | Compare skill sets |
CAREER_PATH |
“How do I become X?” | Find path between roles |
SKILL_RECOMMENDATIONS |
“What should I learn next?” | Find related skills |
PROFESSION_SKILLS |
“What skills does X need?” | Get required skills |
SKILL_COMPARISON |
“How do X and Y compare?” | Similarity calculation |
MARKET_TRENDS |
“What skills are in demand?” | Aggregate trend data |
LEARNING_PATH |
“How do I learn X?” | Find prerequisites |
6. Use Cases & Query Patterns
Use Case 1: Skills Gap Analysis
USER: "I'm a Python developer. What skills do I need to become a
Machine Learning Engineer?"
GRAPH TRAVERSAL:
┌─────────────────┐ ┌─────────────────┐
│Python Developer │────>│ ML Engineer │
└───────┬─────────┘ └────────┬────────┘
│ REQUIRES │ REQUIRES
▼ ▼
┌─────────┐ ┌──────────────┐
│ Python │ │ TensorFlow │ ← GAP
│ Git │ │ PyTorch │ ← GAP
│ SQL │ │ Deep Learning│ ← GAP
│ REST API│ │ MLOps │ ← GAP
└─────────┘ │ Python │ ← HAVE
│ Statistics │ ← GAP
└──────────────┘
RESPONSE:
{
"skills_gap": [
{"skill": "TensorFlow", "priority": "high", "learning_time": "2 months"},
{"skill": "Deep Learning", "priority": "high", "learning_time": "3 months"},
{"skill": "PyTorch", "priority": "medium", "learning_time": "2 months"},
{"skill": "Statistics", "priority": "high", "learning_time": "1 month"}
],
"recommended_order": ["Statistics", "Deep Learning", "TensorFlow", "PyTorch"],
"total_time": "6-9 months"
}
Use Case 2: Career Path Discovery
USER: "What are career paths for a Frontend Developer?"
GRAPH TRAVERSAL:
┌─────────────────────┐
│ Frontend Developer │
└──────────┬──────────┘
│ TRANSITIONS_TO
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Senior Frontend │ │ Full Stack Dev │ │ UX Engineer │
│ Developer │ │ │ │ │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Frontend Arch │ │ Tech Lead │ │ Design Systems │
└────────┬────────┘ └────────┬────────┘ │ Lead │
│ │ └─────────────────┘
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Engineering │ │ CTO │
│ Manager │ │ │
└─────────────────┘ └─────────────────┘
RESPONSE:
{
"career_paths": [
{
"path": ["Frontend Dev", "Senior Frontend", "Frontend Architect"],
"focus": "Technical depth",
"difficulty": "medium"
},
{
"path": ["Frontend Dev", "Full Stack", "Tech Lead", "CTO"],
"focus": "Leadership",
"difficulty": "hard"
},
{
"path": ["Frontend Dev", "UX Engineer", "Design Systems Lead"],
"focus": "Design + Code",
"difficulty": "medium"
}
]
}
Use Case 3: Skill Similarity & Recommendations
USER: "I know Python and SQL. What related skills should I learn?"
GRAPH TRAVERSAL:
┌────────────────────────────────────┐
│ User's Skills │
│ [Python, SQL] │
└──────────────┬─────────────────────┘
│ SIMILAR_TO / RELATED_TO
┌───────────────────┼───────────────────────┐
▼ ▼ ▼
┌────────┐ ┌──────────┐ ┌──────────┐
│ R │ │ Pandas │ │PostgreSQL│
│(0.65) │ │ (0.85) │ │ (0.90) │
└────────┘ └──────────┘ └──────────┘
│ │ │
▼ ▼ ▼
┌────────┐ ┌──────────┐ ┌──────────┐
│Statistics│ │ NumPy │ │ NoSQL │
│ (0.70) │ │ (0.82) │ │ (0.75) │
└─────────┘ └──────────┘ └──────────┘
RESPONSE:
{
"recommendations": [
{"skill": "Pandas", "similarity": 0.85, "reason": "Essential for Python data work"},
{"skill": "PostgreSQL", "similarity": 0.90, "reason": "Advanced SQL database"},
{"skill": "NumPy", "similarity": 0.82, "reason": "Foundation for data science"},
{"skill": "Statistics", "similarity": 0.70, "reason": "Complements data skills"}
]
}
Use Case 4: Job Description Generation
USER: "Generate required skills for a Senior DevOps Engineer position"
GRAPH QUERY:
MATCH (p:Profession {name: 'Senior DevOps Engineer'})-[r:REQUIRES]->(s:Skill)
RETURN s.name, r.importance, r.weight
ORDER BY r.weight DESC
RESPONSE:
{
"required_skills": {
"must_have": [
"Kubernetes", "Docker", "AWS/GCP/Azure",
"CI/CD", "Linux", "Terraform"
],
"nice_to_have": [
"Python/Go", "Prometheus", "GitOps",
"Security", "Networking"
]
},
"generated_description": "We're looking for a Senior DevOps Engineer..."
}
7. Data Pipeline
Data Ingestion Architecture
flowchart TB
subgraph Sources["DATA SOURCES"]
TK["Textkernel Skills API"]
JB["Job Boards (Indeed, etc)"]
RS["Resumes (User Data)"]
CR["Courses (Coursera)"]
end
subgraph Collectors["DATA COLLECTORS"]
API["API Client"]
Scraper["Scraper"]
Parser["File Parser"]
end
TK --> API
JB --> Scraper
RS --> Parser
CR --> API
subgraph Processing["DATA PROCESSING"]
EE["Entity Extraction (LLM)"]
RE["Relation Extraction (LLM)"]
DV["Data Validation"]
end
Collectors --> Processing
subgraph GraphConstruction["GRAPH CONSTRUCTION"]
ER["Entity Resolver (Dedup/Match)"]
RM["Relation Merger"]
GW["Graph Writer (Neo4j)"]
end
Processing --> GraphConstruction
GraphConstruction --> KG["Skills Knowledge Graph"]
Entity Extraction Pipeline
Interview context: The entity extraction pipeline converts unstructured data (job descriptions, resumes) into structured graph data. This is where LLMs add significant value.
Key extraction steps:
- Job Description Processing
- Extract job title, required skills, responsibilities
- Classify skill importance (“must-have” vs “nice-to-have”)
- Identify certification requirements and experience levels
- Resume Processing
- Extract skills with proficiency levels
- Map experience to skills used
- Identify certifications and education
- Entity Resolution
- Match extracted skills to canonical taxonomy (Textkernel)
- Handle synonyms: “JS” → “JavaScript”, “ML” → “Machine Learning”
- Merge duplicates with confidence scoring
- Similarity Computation
- Co-occurrence analysis: skills appearing together in job postings
- Jaccard similarity for relationship strength
- Threshold filtering (similarity > 0.3) to avoid noise
Interviewer might ask: “Why use LLMs for extraction instead of NER models?”
LLMs handle the nuance of skill descriptions better. “3+ years Python experience” vs “Python preferred” encode different requirements. NER would need extensive training data to capture these distinctions.
8. Scalability & Performance
Interview context: At scale, the main bottlenecks are graph queries, LLM calls, and entity extraction. Each requires different optimization strategies.
Caching Strategy
Multi-level caching is essential for production GraphRAG systems:
| Cache Level | What to Cache | TTL | Why |
|---|---|---|---|
| L1 (In-memory) | Hot queries | Session | Instant access for repeated queries |
| L2 (Redis) | Graph query results | 1 hour | Graph data changes slowly |
| L3 (Redis) | LLM responses | 24 hours | Expensive to regenerate |
| L4 (Persistent) | Entity extractions | 7 days | Job descriptions don’t change |
Cache key strategy: Hash the query + context to generate deterministic keys. Normalize queries before hashing to improve cache hit rates.
Interviewer might ask: “What about cache invalidation?”
For skills data, time-based expiration usually suffices—the graph updates gradually. For user-specific queries (like skills gap analysis), we use user_id + timestamp in the cache key to ensure fresh results after profile updates.
Query Optimization
Key optimization strategies:
- Indexing: Composite indexes on frequently queried relationship properties
- Clustering: Pre-compute skill clusters using community detection (Louvain algorithm)
- Caching hot paths: Career path queries that traverse multiple hops
Interviewer might ask: “How do you optimize graph queries at scale?”
Three levels: (1) Proper indexing on node/relationship properties, (2) Pre-computed aggregations for expensive traversals, (3) Query result caching for repeated patterns. The goal is sub-100ms response times even for complex multi-hop queries.
9. Advanced Features
9.1 Trend Analysis
Growth rate computation identifies rising and declining skills:
- Compare skill mentions in recent vs older job postings (30-day vs 90-day windows)
- Classify as “rising” (>20% growth), “declining” (<-20%), or “stable”
- Store growth_rate and trend as node properties for fast filtering
Use cases:
- Surface “Skills to Watch” in career planning
- Alert L&D teams about emerging skill gaps
- Inform curriculum development priorities
9.2 Personalized Recommendations
Recommendation scoring formula:
score = (goal_relevance × 0.4) + (growth_rate × 0.3) + (popularity × 0.3)
| Factor | Weight | Why |
|---|---|---|
| Goal relevance | 40% | Skills that lead to user’s career goals |
| Growth rate | 30% | Skills with increasing market demand |
| Popularity | 30% | Skills with established job market presence |
Interviewer might ask: “How do you personalize recommendations?”
We build a user context from their profile (current skills, experience, goals, interests) and use it to filter and rank the graph traversal results. The LLM then explains WHY each recommendation matters for this specific user.
9.3 Natural Language to Cypher (NL2Cypher)
The challenge: Convert questions like “What skills should I learn for machine learning?” into graph queries.
Approach:
- Provide graph schema to LLM as context
- Use few-shot examples of NL → Cypher conversions
- Validate generated Cypher syntax before execution
- Fall back to semantic search if Cypher generation fails
Schema provided to LLM:
- Node types: Skill, Profession, Category, Certification, LearningResource
- Relationships: REQUIRES, SIMILAR_TO, PREREQUISITE_FOR, BELONGS_TO, VALIDATED_BY, TAUGHT_BY, TRANSITIONS_TO
Interviewer might ask: “What if the LLM generates invalid Cypher?”
We validate syntax before execution and have fallback strategies: (1) retry with rephrased prompt, (2) fall back to predefined query templates, (3) use semantic search as last resort. Always return something useful rather than failing.
Summary
┌─────────────────────────────────────────────────────────────────────┐
│ SKILLS GRAPHRAG SYSTEM SUMMARY │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ CORE COMPONENTS │
│ ├── Skills Knowledge Graph (Neo4j) │
│ ├── Entity Extraction Pipeline (LLM-based) │
│ ├── GraphRAG Query Engine │
│ ├── Response Generator (GPT-4) │
│ └── Caching Layer (Redis) │
│ │
│ KEY FEATURES │
│ ├── Natural language queries about skills │
│ ├── Skills gap analysis │
│ ├── Career path recommendations │
│ ├── Personalized skill recommendations │
│ ├── Trend analysis and forecasting │
│ └── Integration with Textkernel ontology │
│ │
│ DATA SOURCES │
│ ├── Textkernel Skills Intelligence API │
│ ├── Job descriptions (scraped/API) │
│ ├── Resumes/CVs │
│ └── Learning platforms (Coursera, etc.) │
│ │
│ USE CASES │
│ ├── HR/Recruiting: Candidate matching │
│ ├── L&D: Learning recommendations │
│ ├── Workforce Planning: Skill forecasting │
│ └── Career Development: Path planning │
│ │
└─────────────────────────────────────────────────────────────────────┘
Interview Tips
Key Points to Emphasize
1. WHY GRAPH FOR SKILLS?
"Skills have relationships - prerequisites, similarity, career paths.
A graph captures these explicitly while vector search only finds
semantic similarity."
2. GRAPHRAG ADVANTAGES
"We combine graph traversal (explicit relationships) with vector
search (semantic similarity) for best of both worlds."
3. DOMAIN-SPECIFIC MODELING
"Skills ontology is different from general knowledge graphs.
It has specific relationships: REQUIRES, TRANSITIONS_TO,
PREREQUISITE_FOR, SIMILAR_TO."
Common Follow-up Questions
| Question | Key Points |
|---|---|
| “Why not just vector search?” | Misses explicit relationships, can’t traverse career paths |
| “How do you keep the graph updated?” | Data pipeline from job boards, manual curation, trend analysis |
| “How do you handle synonyms?” | Multiple labels per skill, similarity relationships, LLM normalization |
| “Scalability concerns?” | Graph database scaling (Neo4j clusters), caching hot paths |
| “How accurate are recommendations?” | Feedback loops, A/B testing, domain expert validation |
Trade-offs to Discuss
| Decision | Option A | Option B |
|---|---|---|
| Graph DB | Neo4j (mature, Cypher) | Amazon Neptune (managed, SPARQL) |
| Embeddings | Pre-computed (fast) | On-demand (flexible) |
| Entity extraction | LLM (flexible) | NER model (faster) |
| Response generation | LLM (natural) | Template-based (predictable) |