ConardLi/easy-dataset
LLM Dataset Generation and Evaluation Platform
14.1k 2026-04-30

Easy Dataset is a powerful application for creating high-quality datasets for LLM fine-tuning, RAG, and model evaluation, featuring intelligent document processing and a comprehensive evaluation system.

Core Features

Intelligent Document Processing (PDF, Markdown, DOCX, TXT, EPUB)
Multiple Intelligent Text Splitting Algorithms
Automated Question and Answer Generation with AI Optimization
Support for Various Dataset Types (QA, Dialogue, Image QA)
Comprehensive Model Evaluation System with Automated and Human Blind Tests
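
This listing does not describe how Easy Dataset's splitting algorithms work. To illustrate the general idea, here is a sketch of one common strategy: fixed-size chunking with overlap that prefers to cut at paragraph breaks. The function `split_text` and its `max_chars` and `overlap` parameters are hypothetical and are not taken from Easy Dataset's code.

```python
def split_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical illustration of overlap-based chunking; not Easy Dataset's splitter.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks: list[str] = []
    start, n = 0, len(text)
    while start < n:
        end = min(start + max_chars, n)
        if end < n:
            # Prefer cutting at the last paragraph break (blank line) in the window,
            # but only if that keeps the chunk at least half the maximum size.
            cut = text.rfind("\n\n", start, end)
            if cut > start + max_chars // 2:
                end = cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= n:
            break
        # Step back by `overlap` characters so neighbouring chunks share context,
        # while always moving forward.
        next_start = end - overlap
        start = next_start if next_start > start else end
    return chunks
```

The overlap keeps a little shared context at chunk boundaries, so a question generated from one chunk is less likely to lose the sentence that came just before it.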

Detailed Introduction

Easy Dataset is an application designed specifically for building large language model (LLM) datasets. It offers an intuitive interface, document parsing, intelligent text segmentation, and data cleaning and augmentation. The platform converts domain-specific documents into structured datasets for model fine-tuning, retrieval-augmented generation (RAG), and performance evaluation. Recent updates add evaluation features: users can generate evaluation datasets and run multi-dimensional assessments, both automated and through a human blind-test system.
