Legitimate, Free, SDN matriculant data spreadsheet


DrTroll

Full Member
10+ Year Member
Joined
Jul 16, 2010
Messages
253
Reaction score
5
Hey guys, this originally was aSagacious's idea, I'm just starting a new thread to get some more attention and help as suggested by the cat.

The original LizzyM spreadsheet is banned because of objections from AAMC, so aSagacious suggested we start a "legitimate, free, SDN matriculant data spreadsheet" with data published by schools.

He/she got the ball rolling by setting the format and the first 20 schools, and I got the data for Texas schools that I could find.

If you would like to help, please insert data into the spreadsheet if you can, or you can state your willingness to insert the data after it's posted.

Please also link to the source of data to legitimize it.

Thanks!


Edit: Thanks to everybody that's been helping out, we got so many schools done within 24 hours, let's keep it up!

And here is the Link to Google doc spreadsheet version:
http://is.gd/sdn_med_matriculant_data (thanks to paul411 for setting it up)

If anyone wants to use the "Chances" feature (which predicts how well you'll do at a school based on your "LizzyM Score"), download your own copy of the spreadsheet and enter your stats.

If you would like to contribute, please message paul411 for access permission. (The public edit feature is disabled because trolls kept messing up the data.)

Here's an idea:

why don't we just post directly from the latest MSAR (will take a lot of time and work) BUT instead of copying it number for number why don't we "fudge" the numbers by a common factor (ie - multiply them all by 1.1 or something). That way it isn't copyright. The individual person who downloads it can then apply a function to the entire excel sheet to reverse the 1.1 multiplying factor. Basically, what we post IS NOT the MSAR. People do stuff similar to "copyrighted" Youtube videos (speed it up, slow it down) in order for them not to get taken down and then the user can download them and then reverse the effect to get the original.
 
b/c you can do the same thing with data from the websites?
 
Here's an idea:

why don't we just post directly from the latest MSAR (will take a lot of time and work) BUT instead of copying it number for number why don't we "fudge" the numbers by a common factor (ie - multiply them all by 1.1 or something). That way it isn't copyright. The individual person who downloads it can then apply a function to the entire excel sheet to reverse the 1.1 multiplying factor. Basically, what we post IS NOT the MSAR. People do stuff similar to "copyrighted" Youtube videos (speed it up, slow it down) in order for them not to get taken down and then the user can download them and then reverse the effect to get the original.

A- That is a bad idea
B- Again, the MSAR posts ACCEPTED applicant data, what people have been recently asking for is MATRICULANT data

Additionally, you might ask each participant to link to the source of their data to legitimate it.
Good idea. I've updated my entries with sources.
 

Attachments

  • Med School Matriculants Data.xls
    83 KB · Views: 1,943
Here's an idea:

why don't we just post directly from the latest MSAR (will take a lot of time and work) BUT instead of copying it number for number why don't we "fudge" the numbers by a common factor (ie - multiply them all by 1.1 or something). That way it isn't copyright. The individual person who downloads it can then apply a function to the entire excel sheet to reverse the 1.1 multiplying factor. Basically, what we post IS NOT the MSAR. People do stuff similar to "copyrighted" Youtube videos (speed it up, slow it down) in order for them not to get taken down and then the user can download them and then reverse the effect to get the original.

Eh, let's just try to make this as legit as possible so there's no more song and dance about this spreadsheet.

I'm doing the last 5-7.

Some things I'm already noticing: Yeah, lots of schools post their matriculant data but sometimes students like myself care more about accepted student data. However, not all schools post matriculant, some post accepted. Would a spreadsheet where both of these #'s are being used be appropriate?
 
Even though it would be double the amount of work to complete, I agree with Narmerguy that some people may be looking for different information and slots for both Accepted and Matriculant data would be useful.

flatearth, that's essentially intent to break copyright. It's pretty obvious.
 
My biggest thanks to everyone who contributes data. People like you are the reason SDN is so useful as a resource. Oh, and thanks for adding the DO schools as well. Much appreciated!
 
Even though it would be double the amount of work to complete, I agree with Narmerguy that some people may be looking for different information and slots for both Accepted and Matriculant data would be useful.

flatearth, that's essentially intent to break copyright. It's pretty obvious.

Well sadly I flat-out haven't seen a school that provides both yet. I also agree that we probably should provide both, but it's pretty darn difficult to find on their website. I've inserted comments into the MCAT total column to indicate whether this was for matriculating or accepted students.

There's going to have to be some serious quality control at the end and sadly some reworking as with multiple sources of info we may be getting different flavors of information.
 

Attachments

  • Med School Matriculants Data.xls
    89 KB · Views: 754
Seriously, just pick up a copy of the MSAR. Trying to get a spreadsheet like this will introduce way too many variables to be worth it. Some schools post matriculant/accepted student numbers, some give ranges, some only give partial information (i.e., no in-state vs. out-of-state numbers), some may have data that's a few years old. It simply makes more sense to go to the library or advisor's office and spend some time flipping through pages to get an accurate idea.
 
Seriously, just pick up a copy of the MSAR. Trying to get a spreadsheet like this will introduce way too many variables to be worth it. Some schools post matriculant/accepted student numbers, some give ranges, some only give partial information (i.e., no in-state vs. out-of-state numbers), some may have data that's a few years old. It simply makes more sense to go to the library or advisor's office and spend some time flipping through pages to get an accurate idea.
+1. just plop down on the msar + usnews you cheapskates
 
Ok, here's another update (working on the bottom, since they require scrolling to keep track of headers).
 

Attachments

  • Med School Matriculants Data.xls
    93.5 KB · Views: 604
I must be missing something here.

In the United States, to copyright something, it has to be original or creative in some way. You can't copyright the data in a telephone directory, for example:
http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Telephone_Service

Why would you be able to copyright statistics about medical school applicants?

Unless you sign an NDA when you get the AAMC's handbook, how could they have any grounds to object to someone posting numerical data from it?
 
First of all, why would anyone care about matriculant data? Accepted data is far more valuable and gives people an idea of what they need to be accepted. If you pick what school you go to based on average matriculant MCAT/GPA, you're ******ed.

I must be missing something here.

In the United States, to copyright something, it has to be original or creative in some way. You can't copyright the data in a telephone directory, for example:
http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Telephone_Service

Why would you be able to copyright statistics about medical school applicants?

Unless you sign an NDA when you get the AAMC's handbook, how could they have any grounds to object to someone posting numerical data from it?

This is what I was thinking. It's BS the AAMC claims that data is copyrighted. Anyone can claim they have a copyright, but you don't have to listen to them just because they say that. Unfortunately SDN doesn't want to go against what the AAMC wants.
 
Someone contact the pre-law forum.

I must be missing something here.

In the United States, to copyright something, it has to be original or creative in some way. You can't copyright the data in a telephone directory, for example:
http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Telephone_Service

Why would you be able to copyright statistics about medical school applicants?

Unless you sign an NDA when you get the AAMC's handbook, how could they have any grounds to object to someone posting numerical data from it?

The ruling of the Court was written by Justice O'Connor. It examined the purpose of copyright and explained the standard of copyrightability as based on originality.

It is a long-standing principle of United States copyright law that "information" is not copyrightable, O'Connor notes, but "collections" of information can be. Rural claimed a collection copyright in its directory. The court clarified that the intent of copyright law was not, as claimed by Rural and some lower courts, to reward the efforts of persons collecting information, but rather "to promote the Progress of Science and useful Arts" (U.S. Const. 1.8.8), that is, to encourage creative expression.

Since facts are purely copied from the world around us, O'Connor concludes, "the sine qua non of copyright is originality". However, the standard for creativity is extremely low. It need not be novel, rather it only needs to possess a "spark" or "minimal degree" of creativity to be protected by copyright.
In regard to collections of facts, O'Connor states that copyright can only apply to the creative aspects of collection: the creative choice of what data to include or exclude, the order and style in which the information is presented, etc., but not on the information itself. If Feist were to take the directory and rearrange them it would destroy the copyright owned in the data.

The court ruled that Rural's directory was nothing more than an alphabetic list of all subscribers to its service, which it was required to compile under law, and that no creative expression was involved. The fact that Rural spent considerable time and money collecting the data was irrelevant to copyright law, and Rural's copyright claim was dismissed.

The ruling has major implications for any project that serves as a collection of knowledge. Information (that is, facts, discoveries, etc.), from any source, is fair game, but cannot contain any of the "expressive" content added by the source author. That includes not only the author's own comments, but also his choice of which facts to cover, his choice of which links to make among the bits of information, his order of presentation (unless it is something obvious like an alphabetical list), any evaluations he may have made about the quality of various pieces of information, or anything else that might be considered "original creative work" of the author rather than mere facts.

For example, a recipe is a process, and not copyrightable, but the words used to describe it are; see idea-expression divide and Publications International v Meredith Corp. (1996).[2] Therefore, you can rewrite a recipe in your own words and publish it without infringing copyrights. But, if you rewrote every recipe from a particular cookbook, you might still be found to have infringed the author's copyright in the choice of recipes and their "coordination" and "presentation", even if you used different words; however, the West decisions below suggest that this is unlikely unless there is some significant creativity carried over from the original presentation.


Another case covering this area is Assessment Technologies v. Wiredata (2003),[7] in which the Seventh Circuit Court of Appeals ruled that a copyright holder in a compilation of public domain data cannot use that copyright to prevent others from using the underlying public domain data, but may only restrict the specific format of the compilation, if that format is itself sufficiently creative. Assessment Technologies also held that it is a fair use of a copyrighted work to reverse engineer that work in order to gain access to uncopyrightable facts. Assessment Technologies also created new law, stating that it is a copyright misuse and an abuse of process if one attempts to use a contract or license agreement based on one's copyright to protect uncopyrightable facts.

He has a point...
 
Hmm, these clearly seem to suggest that transcription of a handful of data columns from the MSAR or US News is completely legal, as they are not copyright-able collections of data.
 
If you are worried about copyright just spread the way the other excel spreadsheet was. Post here, someone will repost somewhere else, here it will get locked by a mod, but by that time you have the spreadsheet being posted elsewhere, and before you know it the spreadsheet is "Google-able" and forever online.

Just play dumb, pretend not to be aware that you are doing a bad thing, and before you know it the spreadsheet is accessible and no one gets in trouble.
 
Setting aside the discussion of whether the claim to copyright is valid, I would suspect that if you (personally) want to challenge it in court, you're more than welcome to, if you provide your own resources. I'd be hard pressed to think of a reason why SDN would want to pitch in its own resources to fight that battle.
 
Indeed, I would think that "all medical schools in the US and Canada" is a very obvious selection of schools to cover for the intended audience, and that applicant GPA/MCAT would be equally obvious pieces of data to include about these schools. The case for originality could be bolstered, perhaps, by organizing the data in some reasonable way different from the AAMC's.
 
I don't know if this is legit... but I made a spreadsheet out of the Eduers data last summer -- well after first using the LizzyM spreadsheet (which helped me narrow down schools that were in my range). The data is from 2007... maybe 2009... but it just helped me get a general idea.

I was interested not only in average MCAT and GPA, but also percent of applicants interviewed (which eduers tells you - breakdown of women, IS, OSS, minorities - it's pretty cool). Reach schools tended to interview more students compared to "my good chance" schools... so i made a mishmash list based on my chances of being interviewed. I know I am a good interviewer... so even with lower numbers, if given the chance - I knew I could rock it.

After making my list, I verified it against the MSAR. My school's pre-med department had a copy which they let me borrow (without leaving), so I am sure most schools can let you do the same.

Just my 2cents.
 
Indeed, I would think that "all medical schools in the US and Canada" is a very obvious selection of schools to cover for the intended audience, and that applicant GPA/MCAT would be equally obvious pieces of data to include about these schools. The case for originality could be bolstered, perhaps, by organizing the data in some reasonable way different from the AAMC's.

If it were in Excel, organization would go out the window because you could readily sort by any data column (state, alphabet, MCAT, GPA... none of which are uncommon).
 
Setting aside the discussion of whether the claim to copyright is valid, I would suspect that if you (personally) want to challenge it in court, you're more than welcome to, if you provide your own resources. I'd be hard pressed to think of a reason why SDN would want to pitch in its own resources to fight that battle.

Well, I tend to agree, but on the face of it, the copyright claim seems so ludicrous that you should be able to get it dismissed with prejudice if they actually sued. I very much doubt they'd have to fight anything in court. I mean, it fits very neatly into the Feist v Rural template.

AAMC must know that it's strictly bluffing; a quick consultation with an IP lawyer should confirm this theory. If someone sent me a cease-and-desist letter over it, I'd try to get a declaratory judgment against them in my own jurisdiction.
 
If you are worried about copyright just spread the way the other excel spreadsheet was. Post here, someone will repost somewhere else, here it will get locked by a mod, but by that time you have the spreadsheet being posted elsewhere, and before you know it the spreadsheet is "Google-able" and forever online.

Just play dumb, pretend not to be aware that you are doing a bad thing, and before you know it the spreadsheet is accessible and no one gets in trouble.


It is google-able atm no?
 
Excellent idea guys! But I already have the msar....
 
I know pre-meds are a bit neurotic (I'm one) but grow up a bit. The content of the MSAR is copyrighted, and yes they would fight and win to protect it.

They own the underlying data, they spend the $$ to collect, process and manipulate it. Unless you collect the data from the schools and store/process it yourself you are violating their copyright and will lose.

Ultimately, spend the $ and buy a copy of the MSAR, or do your own research and quit pretending to be lawyers.
 
The case for originality could be bolstered, perhaps, by organizing the data in some reasonable way different from the AAMC's.
Like maybe having a matriculant Lizzy M number for the first column [MCAT score + (cGPA X 10)] and perhaps including an OOS matriculant rate? Including the DO schools is a great idea, too.
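For anyone filling in that first column, here is a minimal sketch of the formula quoted above, MCAT score + (cGPA x 10). The function name and the sample numbers are made up for illustration and are not taken from any school's data.

```python
def lizzym_score(mcat: float, cgpa: float) -> float:
    """LizzyM score as described in this thread: MCAT + (cGPA x 10)."""
    return mcat + cgpa * 10


# Hypothetical matriculant averages for one school (illustrative only).
example_mcat = 32
example_cgpa = 3.7
print(lizzym_score(example_mcat, example_cgpa))
```

In a spreadsheet the same thing is a single cell formula that adds the MCAT cell to ten times the GPA cell.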
 
The content of the MSAR is copyrighted, and yes they would fight and win to protect it.

They own the underlying data, they spend the $$ to collect, process and manipulate it. Unless you collect the data from the schools and store/process it yourself you are violating their copyright and will lose.

That might be fair and just, but it's not the law. The Supreme Court explicitly considered and rejected your rationale:

http://en.wikipedia.org/wiki/Sweat_of_the_brow

Courts in some other parts of the world do agree with your reasoning, as had a few in the US before Feist v Rural. Yours is not a rationally untenable position by any means, but there are other equally viable positions to take on the matter, and it is one of those, not yours, that has been chosen by the Supreme Court.
 
First of all, why would anyone care about matriculant data? Accepted data is far more valuable and gives people an idea of what they need to be accepted. If you pick what school you go to based on average matriculant MCAT/GPA, you're ******ed.



This is what I was thinking. It's BS the AAMC claims that data is copyrighted. Anyone can claim they have a copyright, but you don't have to listen to them just because they say that. Unfortunately SDN doesn't want to go against what the AAMC wants.

Gives the applicant an idea of what kind of student matriculates there; i.e., a school can overaccept people with high stats like crazy and skew their numbers. Matriculated data lets us know how the school ends up after all those students with high stats withdraw. Plus, with matriculated data you aren't over-representing people with high stats, since each person gets one data point. With accepted data, students with high numbers are accepted all over the place and their numbers are taken into account numerous times.
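A toy illustration of that double-counting point, using entirely made-up applicants and schools (the scores, school names, and field names are all assumptions for the sketch). Each applicant can appear in several schools' accepted pools but in only one matriculant pool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical applicants: a score, every school that accepted them,
# and the single school where they ended up matriculating.
applicants = [
    {"score": 75, "accepted": ["A", "B", "C"], "matriculated": "A"},
    {"score": 68, "accepted": ["B"], "matriculated": "B"},
    {"score": 66, "accepted": ["B", "C"], "matriculated": "C"},
    {"score": 64, "accepted": ["C"], "matriculated": "C"},
]

accepted = defaultdict(list)
matriculated = defaultdict(list)
for person in applicants:
    # One data point per acceptance, so high-stat applicants count repeatedly.
    for school in person["accepted"]:
        accepted[school].append(person["score"])
    # Exactly one data point per person.
    matriculated[person["matriculated"]].append(person["score"])

for school in sorted(accepted):
    print(
        school,
        "accepted avg:", round(mean(accepted[school]), 1),
        "matriculant avg:", round(mean(matriculated[school]), 1),
    )
```

In this toy set the applicant with the 75 is counted in three accepted pools, which pulls up the accepted averages for schools B and C even though that person attends neither.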
 
Well, I tend to agree, but on the face of it, the copyright claim seems so ludicrous that you should be able to get it dismissed with prejudice if they actually sued. I very much doubt they'd have to fight anything in court. I mean, it fits very neatly into the Feist v Rural template.

AAMC must know that it's strictly bluffing; a quick consultation with an IP lawyer should confirm this theory. If someone sent me a cease-and-desist letter over it, I'd try to get a declaratory judgment against them in my own jurisdiction.

Sure, that would be a fine plan if it was important enough to you to even do that. My personal quick cost-benefit analysis (no lawyers needed), were I in SDN's shoes, would say it's preferable to comply with any C&D requests rather than challenge them.

I know pre-meds are a bit neurotic (I'm one) but grow up a bit. The content of the MSAR is copyrighted, and yes they would fight and win to protect it.

They own the underlying data, they spend the $$ to collect, process and manipulate it. Unless you collect the data from the schools and store/process it yourself you are violating their copyright and will lose.

Ultimately, spend the $ and buy a copy of the MSAR, or do your own research and quit pretending to be lawyers.

Okay.
 
Excellent idea guys! But I already have the msar....
you're a pre-frosh with an msar? :laugh::thumbup:

but srsly, i still don't understand this obsession; the relative cost for these data is extremely small. the absolute cost is also quite low.
 
you're a pre-frosh with an msar? :laugh::thumbup:

but srsly, i still don't understand this obsession; the relative cost for these data is extremely small. the absolute cost is also quite low.

Principles, man.
 
Seriously here people. You can spend tons of time compiling a set of data that is most likely flawed due to the differences in reporting from one school to the next, you can argue over a bunch of laws that nobody here actually knows (stop pretending to be lawyers already! wiki isn't the equivalent of a law degree), OR you could just look it up in the MSAR quick and easy at the library or pre med office.
 
I am honestly not surprised. You aren't even close to applying, why did you get one?

It was a gift for the newest one, but I also have two previous editions to project my anticipated school list. :)
 
We need a Google Docs version of this for collaborative editing.
 
We need a Google Docs version of this for collaborative editing.

Was going to suggest this exact thing.

Whoever has the latest version: upload it to Google docs, set it to be shared with anyone with the link, and post the link here. That'll make it much easier for people to contribute and not repeat work.
 
I feel bad for people who didn't get the old school selector excel before it got taken down
 
Google Docs spreadsheet: http://is.gd/sdn_med_matriculant_data

Publicly editable. Add any contributions here so people don't repeat work.

Tip for finding official school data:
Google search something like this:
Code:
site:med-school.edu mcat gpa

Where med-school.edu is the official domain of the medical school whose data you are looking for. This will search only the school's website for pages/documents containing the keywords "mcat" and "gpa".
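If you are working through a long list of schools, the search tip above can be scripted. This is only a sketch: the function name is made up, med-school.edu is the same placeholder domain used above, and the URL pattern is the ordinary Google search address with the query in the q parameter.

```python
from urllib.parse import quote_plus


def school_search_url(domain: str, keywords=("mcat", "gpa")) -> str:
    """Build a Google search URL restricted to one school's website."""
    query = f"site:{domain} " + " ".join(keywords)
    # quote_plus percent-encodes the colon and turns spaces into '+'.
    return "https://www.google.com/search?q=" + quote_plus(query)


# Placeholder domain; swap in each school's official domain.
print(school_search_url("med-school.edu"))
```

Paste the printed URL into a browser, then add whatever page you find to the spreadsheet as the source link.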
 
Why not just copy all the msar data you want into a spreadsheet and fudge each statistic by +/- 0.01 or 0.02. Here's the catch...each person that contributes a portion of the data decides how to "encrypt" the data, but no one ever tells anyone else how they adjusted the numbers. That way it is a completely unique set of data with an encryption code that no single person knows. All the data would have a very tiny degree of error, but still be entirely useful and pretty accurate.

:)
 
you can argue over a bunch of laws that nobody here actually knows (stop pretending to be lawyers already! wiki isn't the equivalent of a law degree),

I continue to be totally unimpressed by that argument. This is not a complex case.

Tell me, if you see this in your neighbors' kid, are you going to refrain from bringing it to their attention because you're not a physician yet? Or worse, because you're not an ophthalmologist? Or because you're not the world's foremost expert on retinoblastoma?

Of course, if I got the C&D, I'd consult a lawyer, but that should be obvious. Likewise with the suspected retinoblastoma, I'd say something along the lines of "Gee, that could be a retinoblastoma, I think you should see a doctor right away," not "I think your child has a retinoblastoma. I'll take that sucker out with a dinner fork for $50, what do you say, Bob?"

I would love to hear a legal argument for the data being protected that isn't specifically refuted by current case law.

Anyway, some of us have graduated and are separated from our pre-med office by several thousand miles.
 
FWIW I've already bought the MSAR guidebook :laugh: This isn't out of a personal need for the information.
 
@Gnomes: Feel free to do that if you want, but I'm not spending a couple of days of my life finding and hiring a lawyer, traveling to district court, and fighting the motions. I have better things to do with my life, and I imagine that's the attitude of most people here.
 
Why not just copy all the msar data you want into a spreadsheet and fudge each statistic by +/- 0.01 or 0.02. Here's the catch...each person that contributes a portion of the data decides how to "encrypt" the data, but no one ever tells anyone else how they adjusted the numbers. That way it is a completely unique set of data with an encryption code that no single person knows. All the data would have a very tiny degree of error, but still be entirely useful and pretty accurate.
This is a reasonable idea


right, because when the AAMC attorneys go fact-finding, there's no way they'll happen on your post in this completely private thread in the not most visible premedical forum on the internet. stop thinking like kids and understand the implications of your actions here. if you are going to plug in the data i sure as hell wouldn't go posting the MSAR, regardless of how you'll oh-so-cleverly hide the data; the AAMC has demonstrated before they will protect their publications. use publicly available stuff, and nothing more. i doubt any of you are using ip obfuscating tech and even dynamic ip can be traced back to you. and on the off chance this does blow up, sdn is not going to protect you.
 
Why not just copy all the msar data you want into a spreadsheet and fudge each statistic by +/- 0.01 or 0.02. Here's the catch...each person that contributes a portion of the data decides how to "encrypt" the data, but no one ever tells anyone else how they adjusted the numbers. That way it is a completely unique set of data with an encryption code that no single person knows. All the data would have a very tiny degree of error, but still be entirely useful and pretty accurate.

:)

Let's just add publicly available (and verifiable) data posted by the schools and include the source URL.
 
right, because when the AAMC attorneys go fact-finding, there's no way they'll happen on your post in this completely private thread in the not most visible premedical forum on the internet. stop thinking like kids and understand the implications of your actions here. if you are going to plug in the data i sure as hell wouldn't go posting the MSAR, regardless of how you'll oh-so-cleverly hide the data; the AAMC has demonstrated before they will protect their publications. use publicly available stuff, and nothing more. i doubt any of you are using ip obfuscating tech and even dynamic ip can be traced back to you. and on the off chance this does blow up, sdn is not going to protect you.

While I somewhat agree that we shouldn't use that approach, let's not pretend that users will have their IP addresses looked up and used to prosecute them. This has close to no precedent in any instance not involving mass p2p sharing of songs, games, and videos.
 