Ultimate Database

Moderators: Elijah, Igbo, timetraveller

EmptikBest

Book Maker
Forum Contributions
Points: 16 218,00 
Posts: 401
Joined: 05/07/2023, 13:58
Status: Offline (Active 1 Day, 12 Hours, 47 Minutes ago)
Medals: 1
Topics: 12
Reputation: 209
Has thanked: 29 times
Been thanked: 479 times

Post by EmptikBest »

Greetings to all fellow members,

I gathered a bunch of databases (I merged some with the "type" command, so there are probably a LOT of doubles) to create what I call the "Ultimate Database". It includes:
  • Caissabase
  • CCRL 40/40
  • Chess.com Elite
  • "Complete-10min+6sec" from some website I can't remember :(
  • "Complete-60min+15sec" from some website I can't remember :(
  • Elegance DB
  • PGN Mentor
  • lichess-bot-strong-games
  • Lichess Elite Database, thanks to nikonoel! (Note that it is 38 GB uncompressed because doubles were not removed; I don't know how to do that.)
  • "Top40-1min-23.12.2022" from some website I can't remember :(
  • "Turnier-NN-60+0.6_gesamt-03.06.2022" from some website I can't remember :(
Link: https://pixeldrain.com/u/s2rtpS94

Do not be fooled by the 6.86 GB compressed size (it took ~40 minutes to compress at the maximum compression level using 7-Zip with 28 threads and 24 GB RAM): it is 61.8 GB uncompressed.

P.S.: If somebody could DM me on how to remove doubles from a PGN file, and how to merge files with something faster than "type", that would be great. Then I would upload a cleaned DB and probably add ICCF, FICS, etc.
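
On the merging question, here is a minimal Python sketch that concatenates PGN files by streaming them in large binary chunks. It is only an illustration: the function name merge_pgn and the 16 MiB chunk size are assumptions, and whether it actually beats "type" depends on the disk more than on the script.

```python
import shutil


def merge_pgn(sources, destination, chunk_size=16 * 1024 * 1024):
    """Concatenate PGN files into one file without loading them into memory."""
    with open(destination, "wb") as out:
        for src in sources:
            with open(src, "rb") as inp:
                # Stream the file across in chunk_size pieces (hypothetical default).
                shutil.copyfileobj(inp, out, chunk_size)
            # Blank lines between files, so the last game of one file cannot
            # run into the first tag line of the next.
            out.write(b"\n\n")
```

Usage would look like merge_pgn(["caissabase.pgn", "ccrl.pgn"], "merged.pgn"), where the file names are placeholders. Nothing is parsed, so doubles are copied through untouched.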
FritzUser

Top contribute Forum
Forum Contributions
Points: 99,00 
Posts: 166
Joined: 03/12/2022, 1:38
Status: Offline (Active 5 Hours, 27 Minutes ago)
Medals: 1
Topics: 27
Reputation: 769
Has thanked: 294 times
Been thanked: 933 times

Post by FritzUser »

EmptikBest wrote: 09/09/2023, 6:07

Believe it or not, there wouldn't be many doubles. The CCRL, Chess.com, FGRL (which is what I think that is), and Lichess games are all separate entities, so most of these would stay. Caissabase is made up of the Millionbase, KingBase, TWIC and PGN Mentor, so I guess that is where you would find a lot of them.

You would need to use PGN Split, which can be found at the Lichess Open Database website, to break the PGN into pieces of perhaps 10 GB each so you can open them in Scid. Each one should import in a few minutes (as opposed to ChessBase, which will take about an hour per gig), and you can search for "twins" (as they call them) there via the maintenance menu. A lot of what you do in Scid goes through the context menu, which isn't readily apparent. If it finds doubles it will automatically select all of them, at which point you should right-click and choose to negate the filter, which leaves only the games that were not flagged. Then right-click and choose to copy the filtered games to PGN. Afterward you can put the pieces back together.
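
For anyone who would rather script this than go through Scid, here is a rough Python sketch of the same idea. It is not Scid's twin detection: it assumes two games are doubles when their moves are identical after stripping {...} comments, move numbers, NAGs and results, and it ignores player names, so identical short draws between different players would also be dropped. Variations in parentheses are not handled. The names iter_games, fingerprint and drop_twins are made up for this example.

```python
import hashlib
import re

TAG_RE = re.compile(r'^\[\w+\s+".*"\]$')   # e.g. [Event "..."]
COMMENT_RE = re.compile(r"\{[^}]*\}")      # {...} comments, may span lines
MOVE_NUMBER_RE = re.compile(r"\d+\.+")     # "1." or "1..."


def iter_games(lines):
    """Yield (tag_lines, move_lines) for each game in an iterable of lines."""
    tags, moves = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if TAG_RE.match(stripped):
            if moves:  # a tag line after movetext starts the next game
                yield tags, moves
                tags, moves = [], []
            tags.append(stripped)
        else:
            moves.append(stripped)
    if tags or moves:
        yield tags, moves


def fingerprint(move_lines):
    """Hash the bare move sequence (assumed definition of a 'double')."""
    text = COMMENT_RE.sub(" ", " ".join(move_lines))
    text = MOVE_NUMBER_RE.sub(" ", text)
    # SAN moves start with a letter; results and NAGs ($1) do not.
    sans = [tok.rstrip("+#!?") for tok in text.split() if tok[0].isalpha()]
    return hashlib.sha1(" ".join(sans).encode("utf-8")).digest()


def drop_twins(lines, out):
    """Write the first copy of every game to out; return (kept, dropped)."""
    seen = set()
    kept = dropped = 0
    for tags, moves in iter_games(lines):
        key = fingerprint(moves)
        if key in seen:
            dropped += 1
            continue
        seen.add(key)
        out.write("\n".join(tags) + "\n\n" + "\n".join(moves) + "\n\n")
        kept += 1
    return kept, dropped
```

Because it reads line by line, the file would not need to be split first, but the set of fingerprints stays in memory and can grow to several GB for tens of millions of games.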

One good thing about ChessBase, even though it's tough to get stuff in there, is that you can differentiate in two unique ways: you can filter out everything but the strong games, and you can add beauty scores and then filter based on that. That's what the Elegance DB is. It gets rid of about 95% of the games.
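
Outside ChessBase, the "strong games only" part can be approximated with a short Python sketch that reads the WhiteElo and BlackElo tags. This is not how ChessBase does it and it has nothing to do with beauty scores; the 2500 threshold and the function names are assumptions for illustration, and games with a missing or non-numeric rating are simply dropped.

```python
import re

TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')


def iter_games(lines):
    """Yield each game as a list of its original lines."""
    game, seen_moves = [], False
    for line in lines:
        stripped = line.strip()
        is_tag = TAG_RE.match(stripped) is not None
        if is_tag and seen_moves:  # a tag line after movetext starts a new game
            yield game
            game, seen_moves = [], False
        if stripped and not is_tag:
            seen_moves = True
        game.append(line)
    if any(l.strip() for l in game):
        yield game


def min_elo(game):
    """Return the lower of the two ratings, or 0 if either is unusable."""
    elos = []
    for line in game:
        m = TAG_RE.match(line.strip())
        if m and m.group(1) in ("WhiteElo", "BlackElo"):
            elos.append(int(m.group(2)) if m.group(2).isdigit() else 0)
    return min(elos) if len(elos) == 2 else 0


def keep_strong(lines, out, threshold=2500):
    """Copy games where both players are rated at least threshold (assumed)."""
    kept = 0
    for game in iter_games(lines):
        if min_elo(game) >= threshold:
            out.writelines(game)
            kept += 1
    return kept
```

Engine games often carry no Elo tags at all, so a filter like this only makes sense for the human-game parts of a collection.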

Ultimately I tend to split DBs up according to OTB, Online, Corr, and Engine. This is sometimes important because if a database is being used as material for an opening book, the book author may not want to mix those different types. But there are lots of reasons to make a DB, and this sounds like a really interesting project.

Post by EmptikBest »

FritzUser wrote: 09/09/2023, 6:36
Thanks,

I will try to do that: merge a bunch more DBs, clean the result, and then upload it :)

Post by EmptikBest »

Hi, if anyone would like their DB added or would like to contribute another DB (except FICS and ICCF, which I am going to add in the next update), please private message me or post here.

Please include only computer/engine games, or games from elite players (like Magnus Carlsen, Ian, Ding Liren).

Thanks

EDIT: If possible, please remove the doubles beforehand. If that is not possible, please mention it and I will do it.

Post by EmptikBest »

ANNOUNCEMENT:

FICS 2000-2012 will be added in the next update, thanks to deeds! These are 116 GiB unfiltered with no doubles removed; after filtering it will probably be less.
ICCF 2015-2022 will be added in the next update, 323 MB unfiltered.
CEDR 3+3 games may be added as well.

ALL comments will be removed to save space, sorry :(
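
As an illustration of what removing comments can involve, here is a small Python sketch that strips {...} comments (including Lichess-style clock annotations such as { [%clk 0:01:00] }) while leaving tag lines alone. It assumes brace comments only: ";" end-of-line comments, NAGs and variations are not touched. The function name strip_comments is made up, and the character-by-character loop is written for clarity, not for speed on files of this size.

```python
import re

TAG_RE = re.compile(r'^\[\w+\s+".*"\]\s*$')


def strip_comments(lines):
    """Yield lines with {...} comments removed; tag lines pass through unchanged."""
    in_comment = False
    for line in lines:
        started_in_comment = in_comment
        if not in_comment and TAG_RE.match(line):
            yield line
            continue
        kept = []
        for ch in line:
            if in_comment:
                if ch == "}":
                    in_comment = False  # comment ends, resume copying
            elif ch == "{":
                in_comment = True       # comment starts, stop copying
            else:
                kept.append(ch)
        # Collapse the gaps that removed comments leave behind.
        cleaned = re.sub(r"[ \t]+", " ", "".join(kept)).strip()
        if cleaned:
            yield cleaned + "\n"
        elif not line.strip() and not started_in_comment:
            yield "\n"  # keep real blank lines between tags and moves
```

Lines that consisted only of a comment are dropped entirely, and blank lines inside a multi-line comment are not passed through, so they cannot split a game's movetext.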

If I have time, maybe I'll make a separate Chess960 archive.

P.S.: If anyone has Chess960 games/DBs to share, please send them; I will probably make a separate archive for 960.

Post by EmptikBest »

ANNOUNCEMENT:

Sorry for the delay, I have been busy with other projects over the past few weeks.

The next update will also add:
  • Chess.com Elite, thanks to Sarona!
  • Millionbase 2017
Thanks to everyone for their contributions!