Coding, Training, Evaluating, and Interpreting Knowledge Graph Embeddings with PyKEEN

In this tutorial, we walk through an end-to-end, advanced workflow for knowledge graph embedding with PyKEEN, actively exploring how modern embedding models are trained, analyzed, developed, and interpreted in practice. We begin by understanding the structure of a real-world graph dataset, then systematically train and compare multiple embedding models, tune their hyperparameters, and analyze their performance using robust ranking metrics. We focus not only on running pipelines but on building intuition for link prediction, negative sampling, and embedding geometry, making sure we understand why each step matters and how it affects downstream reasoning over graphs.
!pip install -q pykeen torch torchvision
import warnings
warnings.filterwarnings('ignore')
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Tuple
from pykeen.pipeline import pipeline
from pykeen.datasets import Nations, FB15k237, get_dataset
from pykeen.models import TransE, ComplEx, RotatE, DistMult
from pykeen.training import SLCWATrainingLoop, LCWATrainingLoop
from pykeen.evaluation import RankBasedEvaluator
from pykeen.triples import TriplesFactory
from pykeen.hpo import hpo_pipeline
from pykeen.sampling import BasicNegativeSampler
from pykeen.losses import MarginRankingLoss, BCEWithLogitsLoss
from pykeen.trackers import ConsoleResultTracker
print("PyKEEN setup complete!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
We set up a complete experimental environment by installing PyKEEN and its deep learning dependencies and importing all the libraries needed to model, train, evaluate, and visualize knowledge graph embeddings. We keep the workflow clean and reproducible by suppressing noisy warnings and confirming that PyTorch and CUDA are available for efficient computation.
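Before running any experiments, it can also help to seed the global random number generators explicitly, in addition to the random_seed argument we pass to the pipeline later. The snippet below is a minimal optional sketch; the seed value 42 is an arbitrary choice.
import random
# Seed Python, NumPy, and PyTorch so repeated runs stay comparable (optional sketch)
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)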
print("n" + "="*80)
print("SECTION 2: Dataset Exploration")
print("="*80 + "n")
dataset = Nations()
print(f"Dataset: {dataset}")
print(f"Number of entities: {dataset.num_entities}")
print(f"Number of relations: {dataset.num_relations}")
print(f"Training triples: {dataset.training.num_triples}")
print(f"Testing triples: {dataset.testing.num_triples}")
print(f"Validation triples: {dataset.validation.num_triples}")
print("nSample triples (head, relation, tail):")
for i in range(5):
h, r, t = dataset.training.mapped_triples[i]
head = dataset.training.entity_id_to_label[h.item()]
rel = dataset.training.relation_id_to_label[r.item()]
tail = dataset.training.entity_id_to_label[t.item()]
print(f" {head} --[{rel}]--> {tail}")
def analyze_dataset(triples_factory: TriplesFactory) -> pd.DataFrame:
    """Compute basic statistics about the knowledge graph."""
    stats = {
        'Metric': [],
        'Value': []
    }
    stats['Metric'].extend(['Entities', 'Relations', 'Triples'])
    stats['Value'].extend([
        triples_factory.num_entities,
        triples_factory.num_relations,
        triples_factory.num_triples
    ])
    unique, counts = torch.unique(triples_factory.mapped_triples[:, 1], return_counts=True)
    stats['Metric'].extend(['Avg triples per relation', 'Max triples for a relation'])
    stats['Value'].extend([counts.float().mean().item(), counts.max().item()])
    return pd.DataFrame(stats)
stats_df = analyze_dataset(dataset.training)
print("nDataset Statistics:")
print(stats_df.to_string(index=False))
We load and explore the Nations knowledge graph to understand its size, structure, and relational complexity before training any models. We examine sample triples to build a sense of how entities and relations are represented internally through ID-to-label mappings. We then compute core statistics, such as the number of triples per relation and the overall triple distribution, which let us reason about the sparsity of the graph and the expected modeling difficulty in advance.
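Because relation frequency strongly influences how easy each relation is to model, it also helps to visualize the per-relation triple counts directly. The following plot is an optional sketch that reuses the mapped triples and plotting imports from above.
# Plot how many training triples use each relation (optional sketch)
relation_ids, relation_counts = torch.unique(dataset.training.mapped_triples[:, 1], return_counts=True)
relation_labels = [dataset.training.relation_id_to_label[i.item()] for i in relation_ids]
plt.figure(figsize=(12, 4))
plt.bar(relation_labels, relation_counts.numpy(), color="steelblue")
plt.xticks(rotation=90, fontsize=7)
plt.ylabel("Training triples")
plt.title("Triples per Relation")
plt.tight_layout()
plt.show()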
print("n" + "="*80)
print("SECTION 3: Training Multiple Models")
print("="*80 + "n")
models_config = {
    'TransE': {
        'model': 'TransE',
        'model_kwargs': {'embedding_dim': 50},
        'loss': 'MarginRankingLoss',
        'loss_kwargs': {'margin': 1.0}
    },
    'ComplEx': {
        'model': 'ComplEx',
        'model_kwargs': {'embedding_dim': 50},
        'loss': 'BCEWithLogitsLoss',
    },
    'RotatE': {
        'model': 'RotatE',
        'model_kwargs': {'embedding_dim': 50},
        'loss': 'MarginRankingLoss',
        'loss_kwargs': {'margin': 3.0}
    }
}
training_config = {
    'training_loop': 'sLCWA',
    'negative_sampler': 'basic',
    'negative_sampler_kwargs': {'num_negs_per_pos': 5},
    'training_kwargs': {
        'num_epochs': 100,
        'batch_size': 128,
    },
    'optimizer': 'Adam',
    'optimizer_kwargs': {'lr': 0.001}
}
results = {}
for model_name, config in models_config.items():
    print(f"\nTraining {model_name}...")
    result = pipeline(
        dataset=dataset,
        model=config['model'],
        model_kwargs=config.get('model_kwargs', {}),
        loss=config.get('loss'),
        loss_kwargs=config.get('loss_kwargs', {}),
        **training_config,
        random_seed=42,
        device="cuda" if torch.cuda.is_available() else 'cpu'
    )
    results[model_name] = result
    print(f"\n{model_name} Results:")
    print(f" MRR: {result.metric_results.get_metric('mean_reciprocal_rank'):.4f}")
    print(f" Hits@1: {result.metric_results.get_metric('hits_at_1'):.4f}")
    print(f" Hits@3: {result.metric_results.get_metric('hits_at_3'):.4f}")
    print(f" Hits@10: {result.metric_results.get_metric('hits_at_10'):.4f}")
We define a consistent training configuration and systematically train the knowledge graph embedding models so the comparison stays fair. We use the same dataset, negative sampling strategy, optimizer, and training loop while letting each model keep its own inductive bias and loss function. We then evaluate and record standard ranking metrics, such as MRR and Hits@K, to quantify how effective each embedding method is at link prediction.
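To make the negative sampling step less abstract, we can manually corrupt the tail of one positive triple and compare the model's scores for the true and corrupted versions (PyKEEN scores are oriented so that higher means more plausible). This is an illustrative sketch of the idea, not the sampler PyKEEN runs internally, and it assumes the trained TransE model stored in results.
# Manually corrupt one positive triple and compare scores (illustrative sketch)
example_model = results['TransE'].model
device = next(example_model.parameters()).device
positive = dataset.training.mapped_triples[0].unsqueeze(0).to(device)  # shape (1, 3)
corrupted = positive.clone()
corrupted[0, 2] = torch.randint(0, dataset.num_entities, (1,)).item()  # replace the tail with a random entity
with torch.no_grad():
    pos_score = example_model.score_hrt(positive).item()
    neg_score = example_model.score_hrt(corrupted).item()
print(f"Positive triple score: {pos_score:.4f}")
print(f"Corrupted triple score: {neg_score:.4f}")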
print("n" + "="*80)
print("SECTION 4: Model Comparison")
print("="*80 + "n")
metrics_to_compare = ['mean_reciprocal_rank', 'hits_at_1', 'hits_at_3', 'hits_at_10']
comparison_data = {metric: [] for metric in metrics_to_compare}
model_names = []
for model_name, result in results.items():
    model_names.append(model_name)
    for metric in metrics_to_compare:
        comparison_data[metric].append(
            result.metric_results.get_metric(metric)
        )
comparison_df = pd.DataFrame(comparison_data, index=model_names)
print("Model Comparison:")
print(comparison_df.to_string())
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Model Performance Comparison', fontsize=16)
for idx, metric in enumerate(metrics_to_compare):
    ax = axes[idx // 2, idx % 2]
    comparison_df[metric].plot(kind='bar', ax=ax, color="steelblue")
    ax.set_title(metric.replace('_', ' ').title())
    ax.set_ylabel('Score')
    ax.set_xlabel('Model')
    ax.grid(axis="y", alpha=0.3)
    ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
plt.tight_layout()
plt.show()
We combine evaluation metrics from all trained models into a unified comparison table for direct performance analysis. We then visualize the key ranking metrics as bar charts, which lets us quickly spot the strengths and weaknesses of each embedding method.
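The pipeline already reports filtered rank-based metrics by default, but running the evaluator explicitly makes the filtering sets visible, which is useful when comparing against published numbers. The sketch below assumes the RankBasedEvaluator.evaluate interface with additional_filter_triples; argument names may differ slightly across PyKEEN versions.
# Explicitly evaluate one trained model with filtered ranking (sketch; API may vary by version)
evaluator = RankBasedEvaluator(filtered=True)
eval_model = results['TransE'].model
filtered_results = evaluator.evaluate(
    model=eval_model,
    mapped_triples=dataset.testing.mapped_triples,
    additional_filter_triples=[
        dataset.training.mapped_triples,
        dataset.validation.mapped_triples,
    ],
    batch_size=128,
)
print(f"Filtered MRR (TransE): {filtered_results.get_metric('mean_reciprocal_rank'):.4f}")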
print("n" + "="*80)
print("SECTION 5: Hyperparameter Optimization")
print("="*80 + "n")
hpo_result = hpo_pipeline(
dataset=dataset,
model="TransE",
n_trials=10,
training_loop='sLCWA',
training_kwargs={'num_epochs': 50},
device="cuda" if torch.cuda.is_available() else 'cpu',
)
print("nBest Configuration Found:")
print(f" Embedding Dim: {hpo_result.study.best_params.get('model.embedding_dim', 'N/A')}")
print(f" Learning Rate: {hpo_result.study.best_params.get('optimizer.lr', 'N/A')}")
print(f" Best MRR: {hpo_result.study.best_value:.4f}")
print("n" + "="*80)
print("SECTION 6: Link Prediction")
print("="*80 + "n")
best_model_name = comparison_df['mean_reciprocal_rank'].idxmax()
best_result = results[best_model_name]
model = best_result.model
print(f"Using {best_model_name} for predictions")
def predict_tails(model, dataset, head_label: str, relation_label: str, top_k: int = 5):
    """Predict most likely tail entities for a given head and relation."""
    head_id = dataset.entity_to_id[head_label]
    relation_id = dataset.relation_to_id[relation_label]
    num_entities = dataset.num_entities
    heads = torch.tensor([head_id] * num_entities).unsqueeze(1)
    relations = torch.tensor([relation_id] * num_entities).unsqueeze(1)
    tails = torch.arange(num_entities).unsqueeze(1)
    # Score every candidate (head, relation, tail) triple; keep the batch on the model's device
    batch = torch.cat([heads, relations, tails], dim=1).to(next(model.parameters()).device)
    with torch.no_grad():
        scores = model.predict_hrt(batch)
    top_scores, top_indices = torch.topk(scores.squeeze(), k=top_k)
    predictions = []
    for score, idx in zip(top_scores, top_indices):
        tail_label = dataset.entity_id_to_label[idx.item()]
        predictions.append((tail_label, score.item()))
    return predictions
if dataset.training.num_entities > 10:
    sample_head = list(dataset.entity_to_id.keys())[0]
    sample_relation = list(dataset.relation_to_id.keys())[0]
    print(f"\nTop predictions for: {sample_head} --[{sample_relation}]--> ?")
    predictions = predict_tails(
        best_result.model,
        dataset.training,
        sample_head,
        sample_relation,
        top_k=5
    )
    for rank, (entity, score) in enumerate(predictions, 1):
        print(f" {rank}. {entity} (score: {score:.4f})")
We use automated hyperparameter optimization to systematically search for strong TransE configurations that maximize ranking performance without manual tuning. We then select the best-performing model by MRR and use it for link prediction, scoring every candidate tail entity for a given head-relation pair.
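The same scoring pattern also works in the reverse direction. As a complementary sketch, the helper below ranks candidate head entities for a fixed relation and tail; predict_heads is our own illustrative name, not a PyKEEN API.
def predict_heads(model, triples_factory, relation_label: str, tail_label: str, top_k: int = 5):
    """Rank candidate head entities for a (?, relation, tail) query - illustrative helper."""
    relation_id = triples_factory.relation_to_id[relation_label]
    tail_id = triples_factory.entity_to_id[tail_label]
    num_entities = triples_factory.num_entities
    heads = torch.arange(num_entities).unsqueeze(1)
    relations = torch.tensor([relation_id] * num_entities).unsqueeze(1)
    tails = torch.tensor([tail_id] * num_entities).unsqueeze(1)
    # Score every candidate head against the fixed relation-tail pair
    batch = torch.cat([heads, relations, tails], dim=1).to(next(model.parameters()).device)
    with torch.no_grad():
        scores = model.predict_hrt(batch)
    top_scores, top_indices = torch.topk(scores.squeeze(), k=top_k)
    return [(triples_factory.entity_id_to_label[i.item()], s.item()) for s, i in zip(top_scores, top_indices)]

example_relation = list(dataset.relation_to_id.keys())[0]
example_tail = list(dataset.entity_to_id.keys())[0]
print(f"\nTop predictions for: ? --[{example_relation}]--> {example_tail}")
for rank, (entity, score) in enumerate(predict_heads(model, dataset.training, example_relation, example_tail), 1):
    print(f" {rank}. {entity} (score: {score:.4f})")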
print("n" + "="*80)
print("SECTION 7: Model Interpretation")
print("="*80 + "n")
entity_embeddings = model.entity_representations[0]()
entity_embeddings_tensor = entity_embeddings.detach().cpu()
print(f"Entity embeddings shape: {entity_embeddings_tensor.shape}")
print(f"Embedding dtype: {entity_embeddings_tensor.dtype}")
if entity_embeddings_tensor.is_complex():
    print("Detected complex embeddings - converting to real representation")
    entity_embeddings_np = np.concatenate([
        entity_embeddings_tensor.real.numpy(),
        entity_embeddings_tensor.imag.numpy()
    ], axis=1)
    print(f"Converted embeddings shape: {entity_embeddings_np.shape}")
else:
    entity_embeddings_np = entity_embeddings_tensor.numpy()
from sklearn.metrics.pairwise import cosine_similarity
similarity_matrix = cosine_similarity(entity_embeddings_np)
def find_similar_entities(entity_label: str, top_k: int = 5):
    """Find most similar entities based on embedding similarity."""
    entity_id = dataset.training.entity_to_id[entity_label]
    similarities = similarity_matrix[entity_id]
    similar_indices = np.argsort(similarities)[::-1][1:top_k+1]
    similar_entities = []
    for idx in similar_indices:
        label = dataset.training.entity_id_to_label[idx]
        similarity = similarities[idx]
        similar_entities.append((label, similarity))
    return similar_entities
if dataset.training.num_entities > 5:
    example_entity = list(dataset.entity_to_id.keys())[0]
    print(f"\nEntities most similar to '{example_entity}':")
    similar = find_similar_entities(example_entity, top_k=5)
    for rank, (entity, sim) in enumerate(similar, 1):
        print(f" {rank}. {entity} (similarity: {sim:.4f})")
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(entity_embeddings_np)
plt.figure(figsize=(12, 8))
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], alpha=0.6)
num_labels = min(10, len(dataset.training.entity_id_to_label))
for i in range(num_labels):
    label = dataset.training.entity_id_to_label[i]
    plt.annotate(label, (embeddings_2d[i, 0], embeddings_2d[i, 1]),
                 fontsize=8, alpha=0.7)
plt.title('Entity Embeddings (2D PCA Projection)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("n" + "="*80)
print("TUTORIAL SUMMARY")
print("="*80 + "n")
print("""
Key Takeaways:
1. PyKEEN provides easy-to-use pipelines for KG embeddings
2. Multiple models can be compared with minimal code
3. Hyperparameter optimization improves performance
4. Models can predict missing links in knowledge graphs
5. Embeddings capture semantic relationships
6. Always use filtered evaluation for fair comparison
7. Consider multiple metrics (MRR, Hits@K)
Next Steps:
- Try different models (ConvE, TuckER, etc.)
- Use larger datasets (FB15k-237, WN18RR)
- Implement custom loss functions
- Experiment with relation prediction
- Use your own knowledge graph data
For more information, visit the PyKEEN documentation.
""")
print("n✓ Tutorial Complete!")
We interpret the learned entity embeddings by measuring semantic similarity and identifying closely related entities in the vector space. We then project the high-dimensional embeddings into two dimensions with PCA to explore structural patterns and clustering behavior within the knowledge graph. Finally, we synthesize the key takeaways and outline clear next steps, reinforcing how embedding analysis connects model performance to graph-level semantic structure.
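To probe the clustering behavior mentioned above more directly, we can also group the same entity_embeddings_np matrix with a quick k-means run; this is an optional sketch, and the choice of three clusters is arbitrary for the small Nations graph.
from sklearn.cluster import KMeans
# Cluster entities in embedding space and list the members of each cluster (optional sketch)
n_clusters = 3
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
cluster_assignments = kmeans.fit_predict(entity_embeddings_np)
for cluster_id in range(n_clusters):
    members = [
        dataset.training.entity_id_to_label[i]
        for i in range(dataset.training.num_entities)
        if cluster_assignments[i] == cluster_id
    ]
    print(f"Cluster {cluster_id}: {', '.join(members)}")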
In conclusion, we develop a comprehensive, practical understanding of how to work with knowledge graph embeddings at an advanced level, from raw triples to trained vector spaces. We show how to rigorously compare models, apply hyperparameter optimization, perform link prediction, and analyze embeddings to reveal semantic structure within a graph. We also see how PyKEEN enables fast experimentation while still allowing fine-grained control over training and evaluation, making it suitable for both research and real-world knowledge graph applications.



