System Design: Skills Intelligence GraphRAG

Combining Knowledge Graphs with Skills Ontology for Intelligent Talent Systems

Interview context: This is a specialized system design combining GraphRAG (graph-based retrieval augmented generation) with HR/talent domain knowledge. It demonstrates how to build intelligent systems that understand relationships between skills, professions, and career paths.

1. Problem Understanding
2. Skills Ontology Overview
3. System Architecture
4. Knowledge Graph Design
5. GraphRAG Integration
6. Use Cases & Query Patterns
7. Data Pipeline
8. Scalability & Performance
9. Advanced Features

1. Problem Understanding

What Are We Building?

A Skills Intelligence GraphRAG System that combines:

Skills Ontology (Textkernel-style): Structured knowledge about skills, professions, and their relationships
GraphRAG: Graph-based retrieval for contextual, relationship-aware answers
LLM Integration: Natural language interface for complex talent queries

Target Use Cases

┌─────────────────────────────────────────────────────────────────────┐
│                      PRIMARY USE CASES                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. TALENT MATCHING                                                 │
│     "Find candidates with Python skills who could transition       │
│      to Machine Learning Engineer roles"                           │
│                                                                     │
│  2. SKILLS GAP ANALYSIS                                            │
│     "What skills does John need to become a Data Scientist?"       │
│                                                                     │
│  3. CAREER PATH RECOMMENDATIONS                                    │
│     "What are potential career paths for a Frontend Developer?"    │
│                                                                     │
│  4. LEARNING RECOMMENDATIONS                                       │
│     "What courses should I take to move into Cloud Architecture?"  │
│                                                                     │
│  5. JOB DESCRIPTION GENERATION                                     │
│     "Generate required skills for a Senior DevOps Engineer"        │
│                                                                     │
│  6. WORKFORCE PLANNING                                             │
│     "What skills will our team need in 2 years given AI trends?"   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Why GraphRAG for Skills?

Interviewer might ask: “Why not just use vector search?”

Approach	Pros	Cons
Vector Search	Semantic similarity, handles synonyms	Misses explicit relationships
Keyword Search	Fast, exact matches	No understanding of context
GraphRAG	Relationship-aware, traverses connections	More complex to build

GraphRAG advantage: When asked about “Python developer skills”, the system can traverse explicit relationships:

Python → [related_to] → Data Analysis, ML, Web Dev
Python Developer → [requires] → Python, Git, SQL
Python → [prerequisite_for] → TensorFlow, Django

This relationship awareness enables career path analysis, skills gap detection, and intelligent recommendations that pure vector search cannot provide.
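The traversal above can be sketched with an in-memory edge list standing in for a graph database. The edges are copied from the three example lines; the helper names (`build_index`, `neighbors`, `skills_unlocked_by_role`) are hypothetical and only illustrate the idea of following typed relationships.

```python
from collections import defaultdict

# Typed edges copied from the example above: (source, relationship, target).
EDGES = [
    ("Python", "related_to", "Data Analysis"),
    ("Python", "related_to", "ML"),
    ("Python", "related_to", "Web Dev"),
    ("Python Developer", "requires", "Python"),
    ("Python Developer", "requires", "Git"),
    ("Python Developer", "requires", "SQL"),
    ("Python", "prerequisite_for", "TensorFlow"),
    ("Python", "prerequisite_for", "Django"),
]


def build_index(edges):
    """Index edges by (source, relationship) for direct lookups."""
    index = defaultdict(list)
    for source, relationship, target in edges:
        index[(source, relationship)].append(target)
    return index


def neighbors(index, node, relationship):
    """Return the targets reachable from `node` over one typed edge."""
    return list(index.get((node, relationship), []))


def skills_unlocked_by_role(index, role):
    """Two-hop traversal: role -> required skills -> skills they unlock."""
    unlocked = {}
    for skill in neighbors(index, role, "requires"):
        targets = neighbors(index, skill, "prerequisite_for")
        if targets:
            unlocked[skill] = targets
    return unlocked


INDEX = build_index(EDGES)
```

A two-hop question such as "what does a Python Developer's skill set unlock?" becomes `skills_unlocked_by_role(INDEX, "Python Developer")`. Similarity search alone has no notion of which edge type connects two items, so it cannot answer this directly.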

2. Skills Ontology Overview

Textkernel Skills Intelligence Structure

Based on the Textkernel Ontology API:

┌─────────────────────────────────────────────────────────────────────┐
│                    SKILLS ONTOLOGY ENTITIES                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  SKILLS                           PROFESSIONS                       │
│  ├── Skill ID                     ├── Profession ID                │
│  ├── Name (multi-language)        ├── Name (multi-language)        │
│  ├── Category                     ├── Industry/Domain              │
│  ├── Type (Hard/Soft/Tool)        ├── Seniority Levels            │
│  └── Certifications               └── Required Skills              │
│                                                                     │
│  RELATIONSHIPS                                                      │
│  ├── Profession → Skills (has_skill)                               │
│  ├── Skills → Professions (enables_profession)                     │
│  ├── Skill → Skill (similar_to, prerequisite_of, related_to)      │
│  └── Skill → Certification (validates)                             │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

API Endpoints (Textkernel Reference)

Endpoint	Purpose
`/professions/suggest_skills`	Get skills for a profession
`/professions/compare_skills`	Compare skills between professions
`/skills/suggest_professions`	Get professions for skills
`/skills/compare_to_profession`	Compare skill set to profession
`/skills/suggest_skills`	Get similar/related skills
`/skills/similarity_score`	Calculate skill set similarity

3. System Architecture

High-Level Architecture

flowchart TB
    Query["User Query (Natural Lang)"]

    subgraph QueryProcessor["QUERY PROCESSOR"]
        IC["Intent Classifier"]
        EE["Entity Extractor"]
        QP["Query Planner"]
    end

    Query --> QueryProcessor

    QueryProcessor --> SG["Skills Graph (Neo4j)"]
    QueryProcessor --> VS["Vector Store (Embeddings)"]
    QueryProcessor --> DS["Document Store (Raw Data)"]

    subgraph ContextAssembler["CONTEXT ASSEMBLER"]
        GC["Graph Context"]
        VC["Vector Context"]
        HR["Hybrid Ranker"]
    end

    SG --> ContextAssembler
    VS --> ContextAssembler
    DS --> ContextAssembler

    subgraph LLMGenerator["LLM RESPONSE GENERATOR"]
        PB["Prompt Builder"]
        LLM["LLM (GPT-4)"]
        RF["Response Formatter"]
    end

    ContextAssembler --> LLMGenerator

    LLMGenerator --> Response["Response<br/>(Structured + Explanation)"]

Component Overview

Component	Purpose	Technology
Query Processor	Parse & understand user queries	LLM + NER
Skills Graph	Store skills ontology	Neo4j / Amazon Neptune
Vector Store	Semantic similarity search	Pinecone / Weaviate
Document Store	Raw job descriptions, resumes	Elasticsearch / S3
Context Assembler	Combine graph + vector results	Custom logic
LLM Generator	Generate natural language responses	GPT-4 / Claude

4. Knowledge Graph Design

Graph Schema

┌─────────────────────────────────────────────────────────────────────┐
│                      SKILLS KNOWLEDGE GRAPH                         │
└─────────────────────────────────────────────────────────────────────┘

NODE TYPES:
═══════════

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     SKILL       │     │   PROFESSION    │     │    CATEGORY     │
├─────────────────┤     ├─────────────────┤     ├─────────────────┤
│ id: string      │     │ id: string      │     │ id: string      │
│ name: string    │     │ name: string    │     │ name: string    │
│ type: enum      │     │ industry: string│     │ parent_id: str  │
│ description: str│     │ seniority: enum │     │ level: int      │
│ popularity: int │     │ avg_salary: int │     └─────────────────┘
│ growth_rate: flt│     │ demand_trend: st│
└─────────────────┘     └─────────────────┘

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  CERTIFICATION  │     │    INDUSTRY     │     │    LEARNING     │
├─────────────────┤     ├─────────────────┤     │    RESOURCE     │
│ id: string      │     │ id: string      │     ├─────────────────┤
│ name: string    │     │ name: string    │     │ id: string      │
│ provider: string│     │ growth: float   │     │ title: string   │
│ validity: int   │     │ size: int       │     │ type: enum      │
│ level: enum     │     └─────────────────┘     │ duration: int   │
└─────────────────┘                             │ provider: string│
                                                └─────────────────┘

RELATIONSHIP TYPES:
══════════════════

SKILL ──[SIMILAR_TO {score: 0.0-1.0}]──> SKILL
SKILL ──[PREREQUISITE_FOR]──> SKILL
SKILL ──[RELATED_TO {type: "complementary"|"alternative"}]──> SKILL
SKILL ──[BELONGS_TO]──> CATEGORY
SKILL ──[VALIDATED_BY]──> CERTIFICATION
SKILL ──[TAUGHT_BY]──> LEARNING_RESOURCE

PROFESSION ──[REQUIRES {importance: "must"|"nice", weight: 0.0-1.0}]──> SKILL
PROFESSION ──[TRANSITIONS_TO {difficulty: 1-5}]──> PROFESSION
PROFESSION ──[BELONGS_TO]──> INDUSTRY
PROFESSION ──[SENIOR_VERSION_OF]──> PROFESSION

CATEGORY ──[PARENT_OF]──> CATEGORY

Example Graph Data

flowchart TB
    Programming["CATEGORY: Programming"]

    Programming -->|PARENT_OF| Backend["Backend Development"]
    Programming -->|PARENT_OF| Frontend["Frontend Development"]
    Programming -->|PARENT_OF| DataSci["Data Science"]

    Backend --> Python["Python (Skill)"]
    Frontend --> JavaScript["JavaScript (Skill)"]
    DataSci --> Python2["Python (Skill)"]

    Python --> Django["Django (Skill)"]
    Python --> FastAPI["FastAPI (Skill)"]
    JavaScript --> React["React (Skill)"]
    Python2 --> Pandas["Pandas (Skill)"]

    Django -->|VALIDATED_BY| DjangoCert["Django Certification"]

Neo4j Schema (Cypher)

// Create constraints
CREATE CONSTRAINT skill_id IF NOT EXISTS FOR (s:Skill) REQUIRE s.id IS UNIQUE;
CREATE CONSTRAINT profession_id IF NOT EXISTS FOR (p:Profession) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT category_id IF NOT EXISTS FOR (c:Category) REQUIRE c.id IS UNIQUE;

// Create indexes for search
CREATE INDEX skill_name IF NOT EXISTS FOR (s:Skill) ON (s.name);
CREATE INDEX profession_name IF NOT EXISTS FOR (p:Profession) ON (p.name);
CREATE FULLTEXT INDEX skill_search IF NOT EXISTS FOR (s:Skill) ON EACH [s.name, s.description];

// Example: Create skills
// Each statement ends with a semicolon so the script can be run as a batch.
CREATE (python:Skill {
    id: 'skill_python',
    name: 'Python',
    type: 'hard_skill',
    description: 'General-purpose programming language',
    popularity: 95,
    growth_rate: 0.15
});

CREATE (ml:Skill {
    id: 'skill_ml',
    name: 'Machine Learning',
    type: 'hard_skill',
    description: 'Building systems that learn from data',
    popularity: 88,
    growth_rate: 0.25
});

// Example: Create relationships
// Variables do not carry over between statements, so the nodes are matched again.
MATCH (python:Skill {id: 'skill_python'})
MATCH (ml:Skill {id: 'skill_ml'})
CREATE (python)-[:PREREQUISITE_FOR]->(ml)
CREATE (python)-[:SIMILAR_TO {score: 0.7}]->(ml);

// Example: Profession with required skills
CREATE (ds:Profession {
    id: 'prof_data_scientist',
    name: 'Data Scientist',
    industry: 'Technology',
    seniority: 'mid',
    avg_salary: 120000
});

MATCH (ds:Profession {id: 'prof_data_scientist'})
MATCH (python:Skill {id: 'skill_python'})
MATCH (ml:Skill {id: 'skill_ml'})
CREATE (ds)-[:REQUIRES {importance: 'must', weight: 0.9}]->(python)
CREATE (ds)-[:REQUIRES {importance: 'must', weight: 0.95}]->(ml);

5. GraphRAG Integration

Query Flow

┌─────────────────────────────────────────────────────────────────────┐
│                    GRAPHRAG QUERY PIPELINE                          │
└─────────────────────────────────────────────────────────────────────┘

User: "What skills should a Python developer learn to become a
       Machine Learning Engineer?"

STEP 1: INTENT CLASSIFICATION
─────────────────────────────
Intent: SKILLS_GAP_ANALYSIS
Entities:
  - Source Role: "Python Developer"
  - Target Role: "Machine Learning Engineer"

STEP 2: GRAPH QUERY GENERATION
─────────────────────────────
Query Plan:
  1. Find skills for "Python Developer"
  2. Find skills for "Machine Learning Engineer"
  3. Calculate skill gap (difference)
  4. Find learning paths for gap skills

Cypher Query:

// Get source profession skills
MATCH (source:Profession {name: 'Python Developer'})-[:REQUIRES]->(s1:Skill)
WITH collect(s1) as sourceSkills

// Get target profession skills
MATCH (target:Profession {name: 'Machine Learning Engineer'})-[:REQUIRES]->(s2:Skill)
WITH sourceSkills, collect(s2) as targetSkills

// Find gap skills (in target but not in source)
WITH [s IN targetSkills WHERE NOT s IN sourceSkills] as gapSkills
UNWIND gapSkills as gap

// Find learning resources and prerequisites
OPTIONAL MATCH (gap)-[:TAUGHT_BY]->(resource:LearningResource)
OPTIONAL MATCH (prereq:Skill)-[:PREREQUISITE_FOR]->(gap)

RETURN gap, collect(DISTINCT resource) as resources,
       collect(DISTINCT prereq) as prerequisites

STEP 3: CONTEXT ASSEMBLY
─────────────────────────────
Graph Context:
  - Gap Skills: [TensorFlow, PyTorch, Deep Learning, MLOps]
  - Prerequisites: [Linear Algebra, Statistics, Python Advanced]
  - Learning Resources: [Coursera ML Course, Fast.ai, …]

Vector Context (from job descriptions):
  - “ML Engineers typically need 2+ years of deep learning…”
  - “Key responsibilities include model deployment…”

STEP 4: LLM RESPONSE GENERATION
─────────────────────────────
Prompt:
"""
Based on the skills knowledge graph and job market data:

Current Role: Python Developer
Target Role: Machine Learning Engineer

Skills Gap Analysis:
{graph_context}

Additional Context:
{vector_context}

Generate a personalized learning roadmap with:
  1. Priority skills to learn
  2. Recommended learning order
  3. Estimated time for each skill
  4. Specific resources
"""

STEP 5: STRUCTURED RESPONSE
─────────────────────────────
{
  "summary": "To transition from Python Developer to ML Engineer…",
  "skills_gap": […],
  "learning_roadmap": […],
  "estimated_duration": "6-12 months",
  "confidence": 0.89
}

Query Processing Flow

Interview context: The core of GraphRAG is the query processing pipeline. Understand each step.

flowchart TB
    Query["User Query:<br/>'What skills do I need to become a Machine Learning Engineer?'"]

    Query --> Step1["Step 1: INTENT CLASSIFICATION<br/>Input: Natural language query<br/>Output: SKILLS_GAP | CAREER_PATH | RECOMMENDATIONS | ...<br/>Method: LLM prompt or fine-tuned classifier"]

    Step1 --> Step2["Step 2: ENTITY EXTRACTION<br/>Input: Query text<br/>Output: {skills: [...], professions: [...]}<br/>Method: NER or LLM extraction"]

    Step2 --> Step3["Step 3: GRAPH QUERY GENERATION<br/>Based on intent, generate Cypher query<br/>SKILLS_GAP → Compare profession skill sets<br/>CAREER_PATH → Traverse TRANSITIONS_TO relationships"]

    Step3 --> Step4["Step 4: CONTEXT ASSEMBLY<br/>Combine: Graph + Vector + User context<br/>Format into prompt for LLM"]

    Step4 --> Step5["Step 5: LLM RESPONSE GENERATION<br/>LLM receives: Query + Intent + Context<br/>LLM produces: Structured answer"]

Key Graph Queries

Intent	What It Does	Graph Pattern
Skills Gap	Compare skills between roles	`(CurrentRole)-[:REQUIRES]->() vs (TargetRole)-[:REQUIRES]->()`
Career Path	Find progression routes	`(Start)-[:TRANSITIONS_TO*1..4]->(End)`
Recommendations	Suggest related skills	`(UserSkill)-[:SIMILAR_TO|RELATED_TO]->(Recommended)`
Profession Skills	List required skills	`(Profession)-[:REQUIRES]->(Skill)`
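One way to turn the table above into a query planner is a lookup from intent to a parameterized Cypher template. The sketch below is a minimal illustration, not the system's actual planner: the template text, the `plan_query` function, and the parameter names (`source`, `target`, `skills`, `profession`) are all assumptions. It passes values as Cypher parameters (`$name`) so user text is never concatenated into the query string.

```python
# Hypothetical Cypher templates, one per intent from the table above.
QUERY_TEMPLATES = {
    "SKILLS_GAP": (
        "MATCH (:Profession {name: $target})-[:REQUIRES]->(s:Skill) "
        "WHERE NOT (:Profession {name: $source})-[:REQUIRES]->(s) "
        "RETURN s.name AS skill"
    ),
    "CAREER_PATH": (
        "MATCH path = (:Profession {name: $source})"
        "-[:TRANSITIONS_TO*1..4]->(:Profession {name: $target}) "
        "RETURN [n IN nodes(path) | n.name] AS roles"
    ),
    "SKILL_RECOMMENDATIONS": (
        "MATCH (known:Skill)-[:SIMILAR_TO|RELATED_TO]->(rec:Skill) "
        "WHERE known.name IN $skills AND NOT rec.name IN $skills "
        "RETURN DISTINCT rec.name AS skill"
    ),
    "PROFESSION_SKILLS": (
        "MATCH (:Profession {name: $profession})-[r:REQUIRES]->(s:Skill) "
        "RETURN s.name AS skill, r.importance AS importance"
    ),
}

# Parameters each template needs from the entity extractor.
REQUIRED_PARAMS = {
    "SKILLS_GAP": {"source", "target"},
    "CAREER_PATH": {"source", "target"},
    "SKILL_RECOMMENDATIONS": {"skills"},
    "PROFESSION_SKILLS": {"profession"},
}


def plan_query(intent, entities):
    """Return (cypher, params) for an intent, or raise ValueError."""
    if intent not in QUERY_TEMPLATES:
        raise ValueError(f"no template for intent: {intent}")
    required = REQUIRED_PARAMS[intent]
    missing = required - set(entities)
    if missing:
        raise ValueError(f"missing entities for {intent}: {sorted(missing)}")
    # Pass only the parameters the template uses.
    params = {name: entities[name] for name in required}
    return QUERY_TEMPLATES[intent], params
```

The returned pair can be handed to whatever graph client the system uses. Raising on missing entities gives the caller a clear point at which to ask the user a follow-up question.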

Intent Classification

The system classifies queries into intents to choose the right graph traversal:

Intent	Example Query	Graph Operation
`SKILLS_GAP`	“What skills do I need for X?”	Compare skill sets
`CAREER_PATH`	“How do I become X?”	Find path between roles
`SKILL_RECOMMENDATIONS`	“What should I learn next?”	Find related skills
`PROFESSION_SKILLS`	“What skills does X need?”	Get required skills
`SKILL_COMPARISON`	“How do X and Y compare?”	Similarity calculation
`MARKET_TRENDS`	“What skills are in demand?”	Aggregate trend data
`LEARNING_PATH`	“How do I learn X?”	Find prerequisites
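The classifier itself would typically be an LLM prompt or a fine-tuned model. As a rough baseline for testing the rest of the pipeline, the intents in the table can be approximated with ordered keyword rules. Everything in this sketch (the patterns, the rule order, the `UNKNOWN` fallback) is an assumption made for illustration.

```python
import re

# Ordered (intent, pattern) rules; the first match wins.
# The patterns are illustrative guesses based on the example queries above.
INTENT_RULES = [
    ("MARKET_TRENDS", re.compile(r"\bin demand\b|\btrend")),
    ("SKILL_COMPARISON", re.compile(r"\bcompare\b")),
    ("SKILLS_GAP", re.compile(r"\bskills do i need\b")),
    ("PROFESSION_SKILLS", re.compile(r"\bskills does\b.*\bneed\b")),
    ("CAREER_PATH", re.compile(r"\bhow do i become\b|\bcareer paths?\b")),
    ("LEARNING_PATH", re.compile(r"\bhow do i learn\b")),
    ("SKILL_RECOMMENDATIONS", re.compile(r"\blearn next\b|\bshould i learn\b")),
]


def classify_intent(query):
    """Return the first matching intent, or 'UNKNOWN' to defer to an LLM."""
    text = query.lower()
    for intent, pattern in INTENT_RULES:
        if pattern.search(text):
            return intent
    return "UNKNOWN"
```

Rules like these are brittle, but they are cheap and deterministic. That makes them useful as a regression baseline and as a first pass before spending an LLM call.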

6. Use Cases & Query Patterns

Use Case 1: Skills Gap Analysis

USER: "I'm a Python developer. What skills do I need to become a
       Machine Learning Engineer?"

GRAPH TRAVERSAL:
┌─────────────────┐     ┌─────────────────┐
│Python Developer │────>│   ML Engineer   │
└───────┬─────────┘     └────────┬────────┘
        │ REQUIRES               │ REQUIRES
        ▼                        ▼
   ┌─────────┐            ┌──────────────┐
   │ Python  │            │ TensorFlow   │ ← GAP
   │ Git     │            │ PyTorch      │ ← GAP
   │ SQL     │            │ Deep Learning│ ← GAP
   │ REST API│            │ MLOps        │ ← GAP
   └─────────┘            │ Python       │ ← HAVE
                          │ Statistics   │ ← GAP
                          └──────────────┘

RESPONSE:
{
  "skills_gap": [
    {"skill": "TensorFlow", "priority": "high", "learning_time": "2 months"},
    {"skill": "Deep Learning", "priority": "high", "learning_time": "3 months"},
    {"skill": "PyTorch", "priority": "medium", "learning_time": "2 months"},
    {"skill": "Statistics", "priority": "high", "learning_time": "1 month"}
  ],
  "recommended_order": ["Statistics", "Deep Learning", "TensorFlow", "PyTorch"],
  "total_time": "6-9 months"
}
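The gap and the recommended order can be computed without an LLM: the gap is a set difference, and the order is a topological sort over prerequisite edges. The skill sets below come from the diagram above. The `PREREQUISITES` edges are hypothetical, since the source does not list them.

```python
from graphlib import TopologicalSorter

# Skill sets taken from the traversal diagram above.
SOURCE_SKILLS = {"Python", "Git", "SQL", "REST API"}
TARGET_SKILLS = {
    "TensorFlow", "PyTorch", "Deep Learning", "MLOps", "Python", "Statistics",
}

# Hypothetical PREREQUISITE_FOR edges, stored as skill -> its prerequisites.
PREREQUISITES = {
    "Deep Learning": {"Statistics", "Python"},
    "TensorFlow": {"Deep Learning"},
    "PyTorch": {"Deep Learning"},
    "MLOps": {"TensorFlow"},
}


def skills_gap(source, target):
    """Skills required by the target role that the source role lacks."""
    return target - source


def learning_order(gap, prerequisites):
    """Order gap skills so each prerequisite precedes its dependents."""
    sorter = TopologicalSorter()
    for skill in sorted(gap):
        # Only prerequisites still missing constrain the order.
        deps = sorted(prerequisites.get(skill, set()) & gap)
        sorter.add(skill, *deps)
    # static_order() raises graphlib.CycleError on circular prerequisites.
    return list(sorter.static_order())


GAP = skills_gap(SOURCE_SKILLS, TARGET_SKILLS)
ORDER = learning_order(GAP, PREREQUISITES)
```

With these assumed edges, `Statistics` always precedes `Deep Learning`, which precedes both frameworks. Priorities and time estimates would still come from other data or from the LLM.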

Use Case 2: Career Path Discovery

USER: "What are career paths for a Frontend Developer?"

GRAPH TRAVERSAL:
                    ┌─────────────────────┐
                    │ Frontend Developer  │
                    └──────────┬──────────┘
                               │ TRANSITIONS_TO
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Senior Frontend │  │ Full Stack Dev  │  │   UX Engineer   │
│   Developer     │  │                 │  │                 │
└────────┬────────┘  └────────┬────────┘  └────────┬────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Frontend Arch   │  │  Tech Lead      │  │ Design Systems  │
└────────┬────────┘  └────────┬────────┘  │    Lead         │
         │                    │           └─────────────────┘
         ▼                    ▼
┌─────────────────┐  ┌─────────────────┐
│ Engineering     │  │    CTO          │
│   Manager       │  │                 │
└─────────────────┘  └─────────────────┘

RESPONSE:
{
  "career_paths": [
    {
      "path": ["Frontend Dev", "Senior Frontend", "Frontend Architect"],
      "focus": "Technical depth",
      "difficulty": "medium"
    },
    {
      "path": ["Frontend Dev", "Full Stack", "Tech Lead", "CTO"],
      "focus": "Leadership",
      "difficulty": "hard"
    },
    {
      "path": ["Frontend Dev", "UX Engineer", "Design Systems Lead"],
      "focus": "Design + Code",
      "difficulty": "medium"
    }
  ]
}
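The path enumeration behind this response is a bounded depth-first walk over `TRANSITIONS_TO` edges. The sketch below keeps the edges from the diagram in a plain dictionary. The `career_paths` helper and its `max_hops` default of 4 (mirroring the `*1..4` pattern used elsewhere in this document) are assumptions.

```python
# TRANSITIONS_TO edges from the diagram above, as role -> next roles.
TRANSITIONS = {
    "Frontend Developer": [
        "Senior Frontend Developer", "Full Stack Dev", "UX Engineer",
    ],
    "Senior Frontend Developer": ["Frontend Architect"],
    "Frontend Architect": ["Engineering Manager"],
    "Full Stack Dev": ["Tech Lead"],
    "Tech Lead": ["CTO"],
    "UX Engineer": ["Design Systems Lead"],
}


def career_paths(transitions, start, max_hops=4):
    """Enumerate maximal simple paths from `start`, up to `max_hops` edges."""
    paths = []

    def walk(path):
        # Skip roles already on the path so cycles cannot loop forever.
        next_roles = [r for r in transitions.get(path[-1], []) if r not in path]
        hops = len(path) - 1
        if not next_roles or hops == max_hops:
            if hops > 0:
                paths.append(path)
            return
        for role in next_roles:
            walk(path + [role])

    walk([start])
    return paths


PATHS = career_paths(TRANSITIONS, "Frontend Developer")
```

The hop limit matters in a real graph, where lateral moves create cycles and the number of paths grows quickly with depth.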

Use Case 3: Skill Similarity & Recommendations

USER: "I know Python and SQL. What related skills should I learn?"

GRAPH TRAVERSAL:
         ┌────────────────────────────────────┐
         │         User's Skills              │
         │      [Python, SQL]                 │
         └──────────────┬─────────────────────┘
                        │ SIMILAR_TO / RELATED_TO
    ┌───────────────────┼───────────────────────┐
    ▼                   ▼                       ▼
┌────────┐        ┌──────────┐           ┌──────────┐
│  R     │        │  Pandas  │           │PostgreSQL│
│(0.65)  │        │  (0.85)  │           │  (0.90)  │
└────────┘        └──────────┘           └──────────┘
    │                  │                      │
    ▼                  ▼                      ▼
┌──────────┐      ┌──────────┐           ┌──────────┐
│Statistics│      │  NumPy   │           │  NoSQL   │
│  (0.70)  │      │  (0.82)  │           │  (0.75)  │
└──────────┘      └──────────┘           └──────────┘

RESPONSE:
{
  "recommendations": [
    {"skill": "Pandas", "similarity": 0.85, "reason": "Essential for Python data work"},
    {"skill": "PostgreSQL", "similarity": 0.90, "reason": "Advanced SQL database"},
    {"skill": "NumPy", "similarity": 0.82, "reason": "Foundation for data science"},
    {"skill": "Statistics", "similarity": 0.70, "reason": "Complements data skills"}
  ]
}
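A minimal sketch of the one-hop part of this traversal: collect `SIMILAR_TO` neighbors of the user's skills, drop skills the user already has, and rank by score. The diagram does not say which user skill each edge starts from, so the mapping in `SIMILAR` is an assumption, as is the `recommend` helper. The second-hop skills (Statistics, NumPy, NoSQL) are left out for brevity.

```python
# Assumed SIMILAR_TO edges: skill -> {neighbor: score}.
SIMILAR = {
    "Python": {"Pandas": 0.85, "R": 0.65},
    "SQL": {"PostgreSQL": 0.90},
}


def recommend(similar, known, min_score=0.0, limit=5):
    """Rank one-hop neighbors of the known skills by best edge score."""
    best = {}
    for skill in known:
        for candidate, score in similar.get(skill, {}).items():
            if candidate in known or score < min_score:
                continue
            # Keep the strongest edge when several known skills point here.
            best[candidate] = max(score, best.get(candidate, 0.0))
    # Sort by descending score, then by name for a stable order.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


RECOMMENDATIONS = recommend(SIMILAR, {"Python", "SQL"})
```

The `reason` strings in the response above are not derivable from edge scores. In this design they would come from the LLM, given the ranked list as context.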

Use Case 4: Job Description Generation

USER: "Generate required skills for a Senior DevOps Engineer position"

GRAPH QUERY:
MATCH (p:Profession {name: 'Senior DevOps Engineer'})-[r:REQUIRES]->(s:Skill)
RETURN s.name, r.importance, r.weight
ORDER BY r.weight DESC

RESPONSE:
{
  "required_skills": {
    "must_have": [
      "Kubernetes", "Docker", "AWS/GCP/Azure",
      "CI/CD", "Linux", "Terraform"
    ],
    "nice_to_have": [
      "Python/Go", "Prometheus", "GitOps",
      "Security", "Networking"
    ]
  },
  "generated_description": "We're looking for a Senior DevOps Engineer..."
}
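Turning the query rows into the `must_have` / `nice_to_have` structure is a small grouping step. The sketch below assumes each row is a dictionary with `name`, `importance`, and `weight` keys (for example, after aliasing the returned columns). The sample weights are made up for illustration.

```python
def group_required_skills(rows):
    """Group REQUIRES rows by importance, highest weight first."""
    grouped = {"must_have": [], "nice_to_have": []}
    for row in sorted(rows, key=lambda r: r["weight"], reverse=True):
        bucket = "must_have" if row["importance"] == "must" else "nice_to_have"
        grouped[bucket].append(row["name"])
    return grouped


# Hypothetical rows; real values would come from the graph query above.
SAMPLE_ROWS = [
    {"name": "Prometheus", "importance": "nice", "weight": 0.5},
    {"name": "Docker", "importance": "must", "weight": 0.9},
    {"name": "Kubernetes", "importance": "must", "weight": 0.95},
    {"name": "GitOps", "importance": "nice", "weight": 0.4},
]

GROUPED = group_required_skills(SAMPLE_ROWS)
```

Keeping this step deterministic means the LLM only has to write the prose description. The skill lists themselves stay traceable to graph edges.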

7. Data Pipeline

Data Ingestion Architecture

flowchart TB
    subgraph Sources["DATA SOURCES"]
        TK["Textkernel Skills API"]
        JB["Job Boards (Indeed, etc)"]
        RS["Resumes (User Data)"]
        CR["Courses (Coursera)"]
    end

    subgraph Collectors["DATA COLLECTORS"]
        API["API Client"]
        Scraper["Scraper"]
        Parser["File Parser"]
    end

    TK --> API
    JB --> Scraper
    RS --> Parser
    CR --> API

    subgraph Processing["DATA PROCESSING"]
        EE["Entity Extraction (LLM)"]
        RE["Relation Extraction (LLM)"]
        DV["Data Validation"]
    end

    Collectors --> Processing

    subgraph GraphConstruction["GRAPH CONSTRUCTION"]
        ER["Entity Resolver (Dedup/Match)"]
        RM["Relation Merger"]
        GW["Graph Writer (Neo4j)"]
    end

    Processing --> GraphConstruction

    GraphConstruction --> KG["Skills Knowledge Graph"]

Entity Extraction Pipeline

Interview context: The entity extraction pipeline converts unstructured data (job descriptions, resumes) into structured graph data. This is where LLMs add significant value.

Key extraction steps:

1. Job Description Processing
   - Extract job title, required skills, responsibilities
   - Classify skill importance (“must-have” vs “nice-to-have”)
   - Identify certification requirements and experience levels
2. Resume Processing
   - Extract skills with proficiency levels
   - Map experience to skills used
   - Identify certifications and education
3. Entity Resolution
   - Match extracted skills to canonical taxonomy (Textkernel)
   - Handle synonyms: “JS” → “JavaScript”, “ML” → “Machine Learning”
   - Merge duplicates with confidence scoring
4. Similarity Computation
   - Co-occurrence analysis: skills appearing together in job postings
   - Jaccard similarity for relationship strength
   - Threshold filtering (similarity > 0.3) to avoid noise
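The entity resolution and similarity steps can be sketched as follows. Synonyms are resolved with a lookup table (a stand-in for matching against the canonical taxonomy). For each pair of skills, the Jaccard score is the number of postings mentioning both divided by the number mentioning either. The `SYNONYMS` table holds only the two examples given above, and the function names are hypothetical.

```python
from itertools import combinations

# Synonym table with the two examples from the text; a real system would
# resolve against the canonical taxonomy instead.
SYNONYMS = {"js": "JavaScript", "ml": "Machine Learning"}


def canonicalize(skill):
    """Map a raw skill mention to its canonical name."""
    cleaned = skill.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)


def jaccard_similarities(postings, threshold=0.3):
    """Return {(skill_a, skill_b): score} for pairs scoring above threshold.

    `postings` is a list of skill lists, one per job posting.
    """
    postings_by_skill = {}
    for posting_id, skills in enumerate(postings):
        for skill in {canonicalize(s) for s in skills}:
            postings_by_skill.setdefault(skill, set()).add(posting_id)

    scores = {}
    for a, b in combinations(sorted(postings_by_skill), 2):
        both = postings_by_skill[a] & postings_by_skill[b]
        either = postings_by_skill[a] | postings_by_skill[b]
        score = len(both) / len(either)
        if score > threshold:
            scores[(a, b)] = score
    return scores


POSTINGS = [
    ["Python", "ML", "SQL"],
    ["Python", "Machine Learning"],
    ["JS", "React"],
    ["JavaScript", "React", "SQL"],
]
SCORES = jaccard_similarities(POSTINGS)
```

Comparing every pair is quadratic in the number of distinct skills. At scale, one option is to score only pairs that co-occur in at least one posting.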

Interviewer might ask: “Why use LLMs for extraction instead of NER models?”

LLMs handle the nuance of skill descriptions better. “3+ years Python experience” vs “Python preferred” encode different requirements. NER would need extensive training data to capture these distinctions.

8. Scalability & Performance

Interview context: At scale, the main bottlenecks are graph queries, LLM calls, and entity extraction. Each requires different optimization strategies.

Caching Strategy

Multi-level caching is essential for production GraphRAG systems:

Cache Level	What to Cache	TTL	Why
L1 (In-memory)	Hot queries	Session	Instant access for repeated queries
L2 (Redis)	Graph query results	1 hour	Graph data changes slowly
L3 (Redis)	LLM responses	24 hours	Expensive to regenerate
L4 (Persistent)	Entity extractions	7 days	Job descriptions don’t change

Cache key strategy: Hash the query + context to generate deterministic keys. Normalize queries before hashing to improve cache hit rates.

Interviewer might ask: “What about cache invalidation?”

For skills data, time-based expiration usually suffices because the graph updates gradually. For user-specific queries (like skills gap analysis), we include the user_id plus the profile's last-updated timestamp in the cache key, so a profile update changes the key and forces fresh results.
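A minimal sketch of the key strategy described above: normalize the query, serialize it together with its context, and hash the result. It assumes the timestamp used for user-specific queries is the profile's last-update time, so the key changes exactly when the profile does. The function and parameter names are hypothetical.

```python
import hashlib
import json
import re


def normalize_query(query):
    """Lowercase, trim, and collapse whitespace."""
    return re.sub(r"\s+", " ", query.strip().lower())


def cache_key(query, context=None, user_id=None, profile_updated_at=None):
    """Build a deterministic cache key from the query and its context.

    `profile_updated_at` is assumed to be the profile's last-update time
    (any JSON-serializable value, such as an ISO-8601 string).
    """
    payload = {
        "query": normalize_query(query),
        "context": context or {},
        "user_id": user_id,
        "profile_updated_at": profile_updated_at,
    }
    # sort_keys gives a stable serialization regardless of dict ordering.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

Entries keyed on an older profile version are never read again and simply age out through their TTL, so no explicit invalidation call is needed for this case.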

Query Optimization

Key optimization strategies:

Indexing: Composite indexes on frequently queried relationship properties
Clustering: Pre-compute skill clusters using community detection (Louvain algorithm)
Caching hot paths: Career path queries that traverse multiple hops

Interviewer might ask: “How do you optimize graph queries at scale?”

Three levels: (1) Proper indexing on node/relationship properties, (2) Pre-computed aggregations for expensive traversals, (3) Query result caching for repeated patterns. The goal is sub-100ms response times even for complex multi-hop queries.

9. Advanced Features

9.1 Trend Analysis

Growth rate computation identifies rising and declining skills:

Compare the rate of skill mentions in recent versus older job postings (for example, a 30-day window against a 90-day window, normalized per day)
Classify as “rising” (>20% growth), “declining” (<-20%), or “stable”
Store growth_rate and trend as node properties for fast filtering
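The computation above can be sketched as a relative change in mention rate followed by a threshold check. The sketch assumes the two windows have different lengths and therefore compares mentions per day rather than raw counts. The `unknown` label for skills with no baseline mentions is also an assumption, added to avoid dividing by zero.

```python
def growth_rate(recent_mentions, baseline_mentions,
                recent_days=30, baseline_days=90):
    """Relative change in mentions per day, recent window vs baseline.

    Returns None when the baseline has no mentions.
    """
    recent_rate = recent_mentions / recent_days
    baseline_rate = baseline_mentions / baseline_days
    if baseline_rate == 0:
        return None
    return (recent_rate - baseline_rate) / baseline_rate


def classify_trend(rate, threshold=0.20):
    """Label a growth rate using the +/-20% thresholds from the text."""
    if rate is None:
        return "unknown"
    if rate > threshold:
        return "rising"
    if rate < -threshold:
        return "declining"
    return "stable"
```

For example, 150 mentions in the last 30 days against 300 in the 90-day baseline is 5 per day versus about 3.3 per day, a growth rate of roughly 0.5, which is classified as `rising`.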

Use cases:

Surface “Skills to Watch” in career planning
Alert L&D teams about emerging skill gaps
Inform curriculum development priorities

9.2 Personalized Recommendations

Recommendation scoring formula (each factor normalized to a 0-1 scale so the weights are comparable):

score = (goal_relevance × 0.4) + (growth_rate × 0.3) + (popularity × 0.3)

Factor	Weight	Why
Goal relevance	40%	Skills that lead to user’s career goals
Growth rate	30%	Skills with increasing market demand
Popularity	30%	Skills with established job market presence
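The weighted sum translates directly into code. The sketch assumes all three inputs are scaled to 0-1 before weighting. The Neo4j example earlier stores popularity as a value like 95, so under that assumption it would be divided by 100 first. Clamping out-of-range inputs is an extra safeguard, not something the source specifies.

```python
# Weights from the table above.
WEIGHTS = {"goal_relevance": 0.4, "growth_rate": 0.3, "popularity": 0.3}


def clamp01(value):
    """Clamp a number into the 0-1 range."""
    return max(0.0, min(1.0, value))


def recommendation_score(goal_relevance, growth_rate, popularity):
    """Weighted sum of the three factors, each expected on a 0-1 scale."""
    return (
        WEIGHTS["goal_relevance"] * clamp01(goal_relevance)
        + WEIGHTS["growth_rate"] * clamp01(growth_rate)
        + WEIGHTS["popularity"] * clamp01(popularity)
    )


# Example: a fully goal-relevant skill with growth 0.25 and popularity 88/100.
EXAMPLE_SCORE = recommendation_score(1.0, 0.25, 88 / 100)
```

Here the example evaluates to 0.4 + 0.075 + 0.264 = 0.739. Because growth rates are usually small fractions, they contribute little under this scaling. A production system might rescale growth against the observed range.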

Interviewer might ask: “How do you personalize recommendations?”

We build a user context from their profile (current skills, experience, goals, interests) and use it to filter and rank the graph traversal results. The LLM then explains WHY each recommendation matters for this specific user.

9.3 Natural Language to Cypher (NL2Cypher)

The challenge: Convert questions like “What skills should I learn for machine learning?” into graph queries.

Approach:

Provide graph schema to LLM as context
Use few-shot examples of NL → Cypher conversions
Validate generated Cypher syntax before execution
Fall back to semantic search if Cypher generation fails

Schema provided to LLM:

Node types: Skill, Profession, Category, Certification, LearningResource
Relationships: REQUIRES, SIMILAR_TO, PREREQUISITE_FOR, BELONGS_TO, VALIDATED_BY, TAUGHT_BY, TRANSITIONS_TO

Interviewer might ask: “What if the LLM generates invalid Cypher?”

We validate syntax before execution and have fallback strategies: (1) retry with rephrased prompt, (2) fall back to predefined query templates, (3) use semantic search as last resort. Always return something useful rather than failing.
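A rough sketch of the validation and fallback logic follows. It is a heuristic pre-check, not a Cypher parser: it rejects write clauses, requires a `RETURN`, and checks labels and relationship types against the schema given to the LLM. A stricter setup could also ask the database to plan the query (for example, by prefixing it with `EXPLAIN`) before running it. All names here are hypothetical.

```python
import re

# Schema as provided to the LLM (see the lists above).
ALLOWED_LABELS = {
    "Skill", "Profession", "Category", "Certification", "LearningResource",
}
ALLOWED_RELATIONSHIPS = {
    "REQUIRES", "SIMILAR_TO", "PREREQUISITE_FOR", "BELONGS_TO",
    "VALIDATED_BY", "TAUGHT_BY", "TRANSITIONS_TO",
}
# Crude keyword check; it may also flag these words inside string literals.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b",
    re.IGNORECASE,
)
READ_START = re.compile(r"\s*(MATCH|OPTIONAL\s+MATCH|WITH|UNWIND)\b", re.IGNORECASE)


def validate_cypher(query):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not READ_START.match(query):
        problems.append("query must start with a read clause")
    if WRITE_CLAUSES.search(query):
        problems.append("write clauses are not allowed")
    if not re.search(r"\bRETURN\b", query, re.IGNORECASE):
        problems.append("missing RETURN")
    # Node labels look like (var:Label ...) or (:Label ...).
    for label in re.findall(r"\(\s*\w*\s*:\s*(\w+)", query):
        if label not in ALLOWED_LABELS:
            problems.append(f"unknown label: {label}")
    # Relationship types look like [var:TYPE] or [:TYPE_A|TYPE_B].
    for group in re.findall(r"\[\s*\w*\s*:\s*([\w|:]+)", query):
        for rel in re.split(r"[|:]+", group):
            if rel and rel not in ALLOWED_RELATIONSHIPS:
                problems.append(f"unknown relationship: {rel}")
    return problems


def answer_with_fallbacks(question, strategies):
    """Try (name, callable) strategies in order; return the first result."""
    for name, strategy in strategies:
        try:
            result = strategy(question)
        except Exception:
            continue  # a failing strategy falls through to the next one
        if result is not None:
            return name, result
    return "none", None
```

The `strategies` list would hold, in order, the NL2Cypher attempt (returning `None` when `validate_cypher` reports problems), the predefined templates, and semantic search.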

Summary

┌─────────────────────────────────────────────────────────────────────┐
│              SKILLS GRAPHRAG SYSTEM SUMMARY                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  CORE COMPONENTS                                                    │
│  ├── Skills Knowledge Graph (Neo4j)                                │
│  ├── Entity Extraction Pipeline (LLM-based)                        │
│  ├── GraphRAG Query Engine                                         │
│  ├── Response Generator (GPT-4)                                    │
│  └── Caching Layer (Redis)                                         │
│                                                                     │
│  KEY FEATURES                                                       │
│  ├── Natural language queries about skills                         │
│  ├── Skills gap analysis                                           │
│  ├── Career path recommendations                                   │
│  ├── Personalized skill recommendations                            │
│  ├── Trend analysis and forecasting                               │
│  └── Integration with Textkernel ontology                          │
│                                                                     │
│  DATA SOURCES                                                       │
│  ├── Textkernel Skills Intelligence API                           │
│  ├── Job descriptions (scraped/API)                               │
│  ├── Resumes/CVs                                                   │
│  └── Learning platforms (Coursera, etc.)                          │
│                                                                     │
│  USE CASES                                                          │
│  ├── HR/Recruiting: Candidate matching                             │
│  ├── L&D: Learning recommendations                                 │
│  ├── Workforce Planning: Skill forecasting                        │
│  └── Career Development: Path planning                             │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Interview Tips

Key Points to Emphasize

1. WHY GRAPH FOR SKILLS?
   "Skills have relationships - prerequisites, similarity, career paths.
    A graph captures these explicitly while vector search only finds
    semantic similarity."

2. GRAPHRAG ADVANTAGES
   "We combine graph traversal (explicit relationships) with vector
    search (semantic similarity) for best of both worlds."

3. DOMAIN-SPECIFIC MODELING
   "Skills ontology is different from general knowledge graphs.
    It has specific relationships: REQUIRES, TRANSITIONS_TO,
    PREREQUISITE_FOR, SIMILAR_TO."

Common Follow-up Questions

Question	Key Points
“Why not just vector search?”	Misses explicit relationships, can’t traverse career paths
“How do you keep the graph updated?”	Data pipeline from job boards, manual curation, trend analysis
“How do you handle synonyms?”	Multiple labels per skill, similarity relationships, LLM normalization
“Scalability concerns?”	Graph database scaling (Neo4j clusters), caching hot paths
“How accurate are recommendations?”	Feedback loops, A/B testing, domain expert validation

Trade-offs to Discuss

Decision	Option A	Option B
Graph DB	Neo4j (mature, Cypher)	Amazon Neptune (managed, Gremlin/openCypher/SPARQL)
Embeddings	Pre-computed (fast)	On-demand (flexible)
Entity extraction	LLM (flexible)	NER model (faster)
Response generation	LLM (natural)	Template-based (predictable)