We survey, evaluate, and open-source state-of-the-art (SOTA) Retrieval-Augmented Generation (RAG) algorithms for LLM customization and reasoning. We provide a comprehensive evaluation of each component of GraphRAG, including graph construction (time), knowledge retrieval (time), answer generation (accuracy), and rationale generation (reasoning). Our goal is to clarify how graph-structured knowledge enhances LLMs' reasoning capabilities compared to traditional RAG approaches.
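To make the measurement protocol concrete, the sketch below times graph construction once per corpus and retrieval once per question, then scores answers and rationales separately. The `build_graph`, `retrieve`, and `generate` methods and the `scorer` object are hypothetical placeholders for illustration, not the benchmark's actual API:

```python
import time

def evaluate_rag_system(system, corpus, questions, scorer):
    """Component-wise evaluation sketch: time each GraphRAG stage and score its outputs."""
    # 1. Graph construction: a one-off cost per corpus ("Time cost" in the tables).
    t0 = time.perf_counter()
    graph = system.build_graph(corpus)  # hypothetical API
    construction_time = time.perf_counter() - t0

    retrieval_times, acc, reasoning = [], [], []
    for q in questions:
        # 2. Knowledge retrieval: measured per question ("Retrieval time").
        t0 = time.perf_counter()
        context = system.retrieve(graph, q["question"])  # hypothetical API
        retrieval_times.append(time.perf_counter() - t0)

        # 3 & 4. Answer and rationale generation, scored independently.
        answer, rationale = system.generate(q["question"], context)  # hypothetical API
        acc.append(scorer.accuracy(answer, q["gold_answer"]))
        reasoning.append(scorer.reasoning(rationale, q["gold_rationale"]))

    return {
        "construction_time_s": construction_time,
        "avg_retrieval_time_s": sum(retrieval_times) / len(retrieval_times),
        "accuracy": 100 * sum(acc) / len(acc),
        "reasoning": 100 * sum(reasoning) / len(reasoning),
    }
```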

Read our survey for a fuller introduction to RAG for LLM customization and reasoning.

Answer-generation accuracy (%) on GraphRAG-Bench:

| Model | Average | TF | MC | MS | FB | OE | Token cost | Time cost | Organization | Retrieval time | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o-mini | 70.68 | 75.95 | 81.11 | 76.68 | 74.29 | 52.23 | - | - | - | - | - |
| TF-IDF | 71.71 | 84.17 | 77.88 | 72.52 | 75.71 | 50.18 | - | - | - | - | - |
| BM-25 | 71.66 | 84.49 | 78.80 | 71.17 | 74.28 | 50.00 | - | - | - | - | - |
| RAPTOR | 73.58 | 82.28 | 80.65 | 77.48 | 76.67 | 54.83 | 10,142,221 | 347m27s | - | 0.02s | 2025-01-31 |
| HippoRAG | 72.64 | 81.65 | 80.18 | 74.32 | 70.48 | 56.13 | 33,006,198 | 162m26s | 89.58% | 2.44s | 2024-12-19 |
| GraphRAG (Microsoft) | 72.50 | 80.70 | 81.57 | 77.48 | 75.24 | 52.42 | 79,929,698 | 216m17s | 72.51% | 44.87s | 2025-02-19 |
| GFM-RAG | 72.10 | 82.59 | 80.65 | 72.07 | 72.38 | 52.79 | 32,766,094 | 95m24s | 89.97% | 1.96s | 2025-02-03 |
| KGP | 71.86 | 82.28 | 79.26 | 74.77 | 74.29 | 51.49 | 15,271,633 | 292m2s | 46.03% | 89.38s | 2023-12-25 |
| ToG | 71.71 | 79.75 | 78.80 | 78.38 | 70.48 | 54.28 | 33,008,230 | 105m15s | 89.95% | 70.53s | 2024-03-24 |
| LightRAG | 71.22 | 82.59 | 78.80 | 73.42 | 65.24 | 53.16 | 83,909,073 | 240m6s | 69.71% | 13.95s | 2025-04-28 |
| G-Retriever | 69.84 | 78.80 | 77.42 | 71.62 | 70.95 | 52.04 | 32,948,161 | 103m55s | 89.95% | 23.77s | 2024-05-27 |
| DALK | 69.30 | 77.22 | 78.34 | 71.62 | 70.00 | 51.49 | 33,007,324 | 84m41s | 89.49% | 26.80s | 2024-10-17 |
Rationale-generation reasoning score (%) on GraphRAG-Bench:

| Model | Average | TF | MC | MS | FB | OE | Token cost | Time cost | Organization | Retrieval time | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o-mini | 39.78 | 53.40 | 50.92 | 39.19 | 53.33 | 9.76 | - | - | - | - | - |
| TF-IDF | 42.38 | 61.23 | 49.19 | 43.02 | 52.61 | 10.50 | - | - | - | - | - |
| BM-25 | 44.15 | 62.18 | 53.11 | 42.79 | 56.42 | 11.52 | - | - | - | - | - |
| RAPTOR | 45.53 | 62.90 | 52.07 | 49.10 | 57.86 | 13.57 | 10,142,221 | 347m27s | - | 0.02s | 2025-01-31 |
| HippoRAG | 44.55 | 63.61 | 52.30 | 47.52 | 50.48 | 12.36 | 33,006,198 | 162m26s | 89.58% | 2.44s | 2024-12-19 |
| GraphRAG (Microsoft) | 43.30 | 60.13 | 52.42 | 45.72 | 55.24 | 10.50 | 79,929,698 | 216m17s | 72.51% | 44.87s | 2025-02-19 |
| GFM-RAG | 44.30 | 63.69 | 52.07 | 45.50 | 54.76 | 10.69 | 32,766,094 | 95m24s | 89.97% | 1.96s | 2025-02-03 |
| KGP | 42.22 | 60.68 | 52.07 | 44.37 | 49.29 | 8.92 | 15,271,633 | 292m2s | 46.03% | 89.38s | 2023-12-25 |
| ToG | 44.01 | 62.26 | 51.73 | 45.72 | 53.10 | 12.08 | 33,008,230 | 105m15s | 89.95% | 70.53s | 2024-03-24 |
| LightRAG | 43.81 | 63.45 | 52.30 | 49.10 | 47.86 | 10.13 | 83,909,073 | 240m6s | 69.71% | 13.95s | 2025-04-28 |
| G-Retriever | 43.66 | 60.21 | 53.46 | 48.20 | 55.00 | 10.04 | 32,948,161 | 103m55s | 89.95% | 23.77s | 2024-05-27 |
| DALK | 42.12 | 58.23 | 50.35 | 46.40 | 55.24 | 9.67 | 33,007,324 | 84m41s | 89.49% | 26.80s | 2024-10-17 |
Combined view for the base model (accuracy average, reasoning average, and per-type reasoning scores):

| Model | Average | Reasoning | TF | MC | MS | FB | OE |
|---|---|---|---|---|---|---|---|
| GPT-4o-mini | 70.68 | 39.78 | 53.40 | 50.92 | 39.19 | 53.33 | 9.76 |

GraphRAG-Bench: Challenging Domain-Specific Reasoning for Evaluating Graph Retrieval-Augmented Generation.
GraphRAG-Bench contains 1,018 college-level questions spanning 16 disciplines (e.g., computer vision, computer networks, human-computer interaction, and AI ethics). The questions test conceptual understanding (e.g., "Given [theorem] A and B, prove [conclusion] C"), complex algorithmic programming (e.g., coding with interlinked function calls), and mathematical computation (e.g., "Given [Input], [Conv1], [MaxPool], [FC], calculate the output volume dimensions").
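For the mathematical-computation category, the worked arithmetic behind such a question looks like the sketch below; the layer parameters are invented for illustration and are not taken from the benchmark:

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output spatial size of a conv/pool layer: floor((W - K + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# Hypothetical question: Input 32x32x3 -> Conv1 (16 filters, 5x5, stride 1, pad 2)
# -> MaxPool (2x2, stride 2) -> FC. What volume does the FC layer see?
h = w = conv2d_out(32, kernel=5, stride=1, padding=2)  # 32: padding preserves size
h = w = conv2d_out(h, kernel=2, stride=2)              # 16 after 2x2 max-pooling
channels = 16                                          # one channel per Conv1 filter
print(f"FC input volume: {h}x{w}x{channels} = {h * w * channels} values")  # 16x16x16 = 4096
```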
GraphRAG-Bench contains five diverse question types to thoroughly evaluate different aspects of reasoning: true-or-false (TF), multiple-choice (MC), multi-select (MS), fill-in-the-blank (FB), and open-ended (OE).
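One natural way the leaderboard's Average column could aggregate the five per-type scores is a micro-average weighted by how many of the 1,018 questions each type contributes. The per-type counts below are placeholders, not the benchmark's published split:

```python
# Placeholder per-type question counts summing to 1,018 (the real split is in the paper).
counts = {"TF": 158, "MC": 217, "MS": 222, "FB": 210, "OE": 211}
scores = {"TF": 75.95, "MC": 81.11, "MS": 76.68, "FB": 74.29, "OE": 52.23}  # GPT-4o-mini accuracy row

# Micro-average: each question type is weighted by its number of questions.
average = sum(scores[t] * counts[t] for t in counts) / sum(counts.values())
print(f"count-weighted average: {average:.2f}")
```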

If you find this website helpful, please consider citing our papers:

@article{zhang2025survey,
  title={A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models},
  author={Zhang, Qinggang and Chen, Shengyuan and Bei, Yuanchen and Yuan, Zheng and Zhou, Huachi and Hong, Zijin and Dong, Junnan and Chen, Hao and Chang, Yi and Huang, Xiao},
  journal={arXiv preprint arXiv:2501.13958},
  year={2025}
}
@article{xiao2025graphrag,
  title={GraphRAG-Bench: Challenging Domain-Specific Reasoning for Evaluating Graph Retrieval-Augmented Generation},
  author={Xiao, Yilin and Dong, Junnan and Zhou, Chuang and Dong, Su and Zhang, Qianwen and Yin, Di and Sun, Xing and Huang, Xiao},
  journal={arXiv preprint arXiv:2506.02404},
  year={2025}
}