Unique ID for players

Issue No. 127

Type

Improvement

Status

Open

Reported By

web

Component

API

Votes

74

Created

18/Mar/13 11:37 AM EDT

Description

The data needs a unique ID to tie players to box scores. Without one, there is a risk that data for two players with the same name gets mixed up. What about {first 7 characters of last name}-{first 3 characters of first name}-{2-digit year of birth}{2-digit month of birth}{2-digit day of birth}? For example, David Wright, NYM, would be wright-dav-821221. It's overkill, sure, but I'd rather be overprepared than underprepared!
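A minimal sketch of the proposed scheme follows. The function name `make_player_id` is hypothetical, and stripping non-letter characters from the name parts is an assumption the proposal does not spell out. The birth date is the one implied by the example ID above.

```python
from datetime import date


def make_player_id(first_name: str, last_name: str, born: date) -> str:
    """Build an ID such as 'wright-dav-821221' from name parts and birth date."""
    # Assumption: keep letters only, so apostrophes, spaces, and periods
    # (e.g. "O'Neal", "J.C.") do not end up inside the ID.
    last = "".join(ch for ch in last_name.lower() if ch.isalpha())[:7]
    first = "".join(ch for ch in first_name.lower() if ch.isalpha())[:3]
    # %y%m%d gives the 2-digit year, month, and day described in the proposal.
    return f"{last}-{first}-{born:%y%m%d}"


# Date taken from the example ID in the description (821221).
print(make_player_id("David", "Wright", date(1982, 12, 21)))
```

Names shorter than the 7- or 3-character limit are simply used whole, which is why "wright" appears with six characters.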

Comments

1. Erik Berg 17/Apr/2013 at 8:29 PM EDT

Yes, this needs to happen. Your solution is not overkill. In fact, it would need an extra character or another field to disambiguate the rare occurrence of twins with similar first names. Consider:

Marcus Morris, morris-mar-890902
Markieff Morris, morris-mar-890902

Does it make sense to try to use parts of the name, birthdate, birthplace, etc. to indicate the player in question, or should it simply be an arbitrary number?
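The collision is easy to reproduce. The sketch below regenerates the name-and-birthdate ID for both brothers and groups players by the resulting ID. The helper names `name_dob_id` and `find_collisions` are hypothetical, not part of xmlstats.

```python
from collections import defaultdict
from datetime import date


def name_dob_id(first: str, last: str, born: date) -> str:
    # Same pattern as the proposal: 7 chars of last name, 3 of first, YYMMDD.
    return f"{last.lower()[:7]}-{first.lower()[:3]}-{born:%y%m%d}"


def find_collisions(players):
    """Return {id: [names]} for every ID shared by more than one player."""
    by_id = defaultdict(list)
    for first, last, born in players:
        by_id[name_dob_id(first, last, born)].append(f"{first} {last}")
    return {pid: names for pid, names in by_id.items() if len(names) > 1}


players = [
    ("Marcus", "Morris", date(1989, 9, 2)),
    ("Markieff", "Morris", date(1989, 9, 2)),
]
collisions = find_collisions(players)
print(collisions)
```

Any scheme built only from these three attributes would need a tiebreaker (an extra character or field, as noted above) whenever `find_collisions` returns a non-empty result.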

2. Mark Hughes 18/Apr/2013 at 11:22 AM EDT

Ideal would be an ID shared with another site, e.g. Baseball Prospectus, ESPN, or mlb.com. This would make it easier to mash data from different places together. ESPN's public API has ESPN IDs for (most) active players. Many of the rest of their IDs are scrapeable. I don't know whether mlb.com has a list. Or maybe use whatever IDs are used by your stats source. My second choice would be a unique number. Some players' names are not standard across information sources. A recent example is Juan Gutierrez (Baseball Prospectus) and J.C. Gutierrez (ESPN). So the advantages of an identifier based on names are sometimes absent. A unique number seems simplest for you and doesn't make you dependent on another site. Whatever solution you implement, it would be great to have a cross-reference available. Baseball Prospectus publishes a limited one as a spreadsheet.
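One way such a cross-reference could look is sketched below, assuming an arbitrary internal number per player plus optional IDs and name spellings from other sources. The class names, source labels, and every ID value are made-up placeholders for illustration, not real ESPN or Baseball Prospectus identifiers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlayerXref:
    player_id: int  # arbitrary internal number
    external_ids: dict = field(default_factory=dict)  # source -> that source's ID
    names: dict = field(default_factory=dict)  # source -> spelling used there


class XrefTable:
    """Look up an internal record from a (source, external ID) pair."""

    def __init__(self) -> None:
        self._by_external = {}

    def add(self, record: PlayerXref) -> None:
        for source, ext_id in record.external_ids.items():
            self._by_external[(source, ext_id)] = record

    def lookup(self, source: str, ext_id: str) -> Optional[PlayerXref]:
        return self._by_external.get((source, ext_id))


table = XrefTable()
table.add(PlayerXref(
    player_id=1,  # placeholder value
    external_ids={"espn": "E-0001", "bp": "BP-0001"},  # placeholder values
    names={"espn": "J.C. Gutierrez", "bp": "Juan Gutierrez"},
))

# Both sources resolve to the same internal record despite different spellings.
print(table.lookup("espn", "E-0001").player_id)
print(table.lookup("bp", "BP-0001").player_id)
```

Keeping the per-source name spellings next to the IDs makes mismatches like the Gutierrez case visible without relying on names for matching.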

3. Erik Berg 24/Apr/2013 at 11:11 AM EDT

I understand the problem if you are trying to combine data between sites and services with different player IDs. But, short of getting a feed from one of those other sites, it would be unrealistic to try to synchronize with them. Plus, even with a feed, xmlstats aims to cover multiple sports and leagues, not just MLB and NBA. It needs an independent mechanism to generate IDs. However, I would consider adding functionality if the community at large wanted to add and maintain player IDs from other services.

4. mat gargano 24/Apr/2013 at 3:25 PM EDT

I completely agree with you, Erik. It'd be awesome to get a bb-databank shared key, but it's not practical. The Morris brothers bring up an interesting issue... maybe an arbitrary number would be easiest/best. -MG

5. Mark Hughes 26/Apr/2013 at 5:43 PM EDT

I completely understand the difficulty of trying to maintain numbers used elsewhere. I try to, and I routinely find errors. A unique player_id makes perfect sense.

6. Mark Hughes 28/Apr/2013 at 9:43 AM EDT

I should add, though, that it is hard for me, when I am dealing with a large player universe, to tell some players from others. This is the main reason I settled on using the ESPN ids. I trust ESPN to assign unique numbers.

7. Erik Berg 30/Apr/2013 at 6:49 PM EDT

Looks to me like their player IDs start at 1 and increment by 1, independently for each sport. Just a simple auto-increment key in their database. That's what xmlstats uses now as well, but obviously the keys are not synchronized. The advantage of this strategy is that it is simple to set up and simple to enforce uniqueness. The downside is that it is yet another ID that has no meaning beyond this application. With a strategy devised to generate player IDs based on DOB, birthplace, name, etc., it would be possible to match players in xmlstats with those from other services (as long as the attributes used to make the ID were already present). The advantage is that the ID actually means something and could potentially be used (and generated) by other services. The downside is that it is more complicated to ensure uniqueness and requires an ancillary program or script to encode and decode the ID.
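The auto-increment strategy can be sketched in a few lines. This is an in-memory illustration only, assuming one counter per sport. The `PlayerRegistry` class is hypothetical and does not reflect how xmlstats or ESPN actually store their keys.

```python
from collections import defaultdict
from itertools import count


class PlayerRegistry:
    """Hand out IDs that start at 1 and increase by 1 within each sport."""

    def __init__(self) -> None:
        # One independent counter per sport, created on first use.
        self._counters = defaultdict(lambda: count(1))
        self._players = {}

    def register(self, sport: str, name: str) -> int:
        player_id = next(self._counters[sport])
        # Uniqueness holds per (sport, id) pair, not across sports.
        self._players[(sport, player_id)] = name
        return player_id

    def get(self, sport: str, player_id: int) -> str:
        return self._players[(sport, player_id)]


registry = PlayerRegistry()
marcus = registry.register("nba", "Marcus Morris")
markieff = registry.register("nba", "Markieff Morris")
wright = registry.register("mlb", "David Wright")
print(marcus, markieff, wright)
```

The twins get distinct IDs with no extra logic, which is the simplicity argument. The trade-off is the one described above: the number 1 means one player in "nba" and a different player in "mlb", and neither value carries meaning outside this registry.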

8. Bob Viscovich 19/May/2013 at 8:28 PM EDT

My vote would be to keep it simple, and use a unique id, without the overhead required to map some meaning into the id (which isn't really good practice anyway, especially if the true intent is to use the id as a unique identifier). It's very difficult to try to make the id BOTH unique AND semantically meaningful. In the end, you end up compromising on both. Anyway, just my 2 cents. Good luck with the decision!

9. Davide Tarasconi 20/May/2013 at 6:36 AM EDT

I agree with Bob: IDs should be as dumb as possible to keep the design as simple and as maintainable as possible.

10. Oren Levitzky 26/Sep/2013 at 12:14 PM EDT

Any news about this issue? The season is about to start. Thanks!

11. Oren Levitzky 29/Sep/2013 at 4:03 PM EDT

Hi again, I saw that you already have a player ID in your database. Are you gonna pass it as well via the roster method?

12. Oren Levitzky 11/Oct/2013 at 3:23 AM EDT

any news here?

13. Erik Berg 12/Oct/2013 at 9:26 AM EDT

Sorry this is not resolved yet. The id you see on player pages (e.g., /nba/player/6420) is very much "use at your own risk" and is not supported or guaranteed to be static. It will go away entirely when this issue is resolved. Individual player summary and aggregate stats will be featured prominently in the API, so resolving this is near the top of the TODO list. Thanks for being patient.

14. Sergio Gijon 12/Oct/2014 at 3:44 PM EDT

There are only 2 players with the same name. Hopefully, they play in different teams. http://i.imgur.com/YBuKUyu.png

15. Alex Simon 16/Oct/2015 at 3:48 AM EDT

What about this issue? It was opened in 2013 and still nothing has been done. It would be a really useful improvement.

16. Xavier Garnier 11/May/2016 at 4:40 PM EDT

Hi Erik, I wanted to implement the NBA box score, but then I saw your comment for the Basketball Stats Object: "* - Only present for player statistics. In the future, the asterisk marked fields will probably move to a player object and embed the basketball_stats object instead." So I remembered this issue. Do you know when you'll have the Player Object done? I know it's not an easy one, but it would improve your database architecture and it would change the way we use your API. Best regards, Xavier

17. Xavier Garnier 21/Oct/2016 at 11:28 AM EDT

Hello Erik, did you find a solution for the Unique ID issue? BR, Xavier