Attribution according to CC by-sa
What's the Problem
The problem is the proper attribution of authors and source works when publishing a derived work. Every edit and every import from another source creates (and, of course, publishes) a derived work. Although it is common practice on other wikis, simply copying and pasting content from another wiki does not comply with the Creative Commons Attribution-ShareAlike License, in either version. We have to attribute the source work and its authors.
Why RDF
The MW software knows four ways of author attribution:
- Version list
- RDF (if enabled)
- Footer of each article (if enabled)
- Talk page of the article
I think it would be almost impossible to insert the required attribution info into the version list. For each source used, we would have to include all of its authors in the version list, which would make the list virtually unreadable.
Displaying attribution in the page footers looks straightforward, but it isn't. Gathering the information is time-consuming and unnecessary in most cases. Mostly, readers just want to read or skim the article. Evan argues that Wikitravel pages are ready to print and go. That's not true in the strict sense of the license. If you really list all authors of the source works, the list becomes unacceptably long very fast. Furthermore, you would then also have to give attribution for the images used on the pages.
Talk pages are floating objects. Somebody comes along and deletes attribution info accidentally, nobody notices, and we have a problem. Furthermore, it's very hard to dig through everything written on talk pages when looking for attributions. Imagine somebody else wants to use our articles (and we hope that will happen). Then they need the attribution information.
RDF is machine-readable and more or less standardized. If you need attribution info, query for the RDF/XML and you get all you need. If you just want to read the article, it won't bother you. Even talk pages remain free of "holy" content that must never be deleted because otherwise we would have committed a copyvio.
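As a rough illustration of what "query for the RDF/XML and you get all you need" could look like for a re-user, here is a minimal Python sketch. The sample document, the exact vocabulary it uses and the function name extract_attribution are assumptions made for this example; the RDF produced by the extension may be structured differently.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
CC_NS = "http://web.resource.org/cc/"

# Hypothetical, simplified RDF/XML for one article (not real extension output).
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:cc="http://web.resource.org/cc/">
  <cc:Work rdf:about="http://example.org/wiki/Some_Article">
    <dc:title>Some Article</dc:title>
    <dc:creator>Alice</dc:creator>
    <dc:contributor>Bob</dc:contributor>
    <dc:contributor>Carol</dc:contributor>
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-sa/1.0/"/>
  </cc:Work>
</rdf:RDF>"""


def extract_attribution(rdf_xml):
    """Return one dict per cc:Work with its URL, title, authors and license."""
    root = ET.fromstring(rdf_xml)
    works = []
    for work in root.findall(f"{{{CC_NS}}}Work"):
        creators = [e.text for e in work.findall(f"{{{DC_NS}}}creator")]
        contributors = [e.text for e in work.findall(f"{{{DC_NS}}}contributor")]
        license_el = work.find(f"{{{CC_NS}}}license")
        works.append({
            "url": work.get(f"{{{RDF_NS}}}about"),
            "title": work.findtext(f"{{{DC_NS}}}title"),
            "authors": creators + contributors,
            "license": None if license_el is None
            else license_el.get(f"{{{RDF_NS}}}resource"),
        })
    return works


if __name__ == "__main__":
    for work in extract_attribution(SAMPLE_RDF):
        print(work["title"], "by", ", ".join(work["authors"]))
```

The point is only that a program can collect authors and license without anybody reading talk pages or footers.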
Why not RDF
Because CC has messed it up. The intention of RDF was to define machine-readable standards. Some organisations like the W3C or Dublin Core have developed their vocabularies in different namespaces. CC has used the dc vocabulary for their own definitions, but in a horrible way. On the other hand, DC failed to give sharp definitions; they only complained afterwards, but CC didn't correct their own, obviously wrong, use of the dc vocabulary.
The problem for users is that nobody now knows which so-called standard to use. The CC mailing list didn't respond to my request.
Well, in the end I have to say that RDF has become ambiguous, but it's still the least of all evils.
Technical Solution
There is no out-of-the-box solution, but there is an attempt by hansm. It is a basic outline of a MediaWiki extension. I would like to get it into production before we actually do the fork, although many nice features will have to be left out for now.
Where is it
The extension is GPL licensed. You can get the sources directly from our Subversion repository, either as an up-to-date download or by pointing your svn client to the URL http://svn.wikivoyage.org/svn/RDFAttribution. If you would like to commit your improvements, drop a message to User:Hansm to get a username and a password.
You can also try a test implementation. Try it as a Bureaucrat as well, with the login "Admin" and the password "admi?" (guess the last letter). But please do not use this wiki for productive purposes! Everything will be deleted, either by intruders or by me.
Database Layout
The information for source attribution must be stored somewhere in the database. I have created two new tables, 'srcwork' and 'revsrc'. All information about a source work that is important for attribution, including all its authors, is stored in a row of srcwork, so there is one row for each source. The link between the revisions of a page and the sources used is made via revsrc. We need two tables since one source can be used in several articles and, vice versa, one edit of an article can use more than one source.
Displaying Attribution
There is a rather hidden standard feature of the MediaWiki software: action=creativecommons. For CC-licensed wikis, this feature is enabled by default. (BTW, it was Evan who implemented it.) In detail: if you append the key-value pair action=creativecommons to the query part of an article's URL, the author attribution is displayed in RDF/XML format. Evan has continued to develop it, but not as part of the MW main distribution. We know the result of his development: his RDF Special.
In my extension, I did an almost complete rewrite of his RDF output and included the information about source works and their authors.
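Appending action=creativecommons to the query part of an article URL can be done by hand, or with a few lines of Python as sketched here. The function name creativecommons_url and the example.org URLs are assumptions for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def creativecommons_url(article_url):
    """Return article_url with action=creativecommons in its query part."""
    parts = urlsplit(article_url)
    # Keep existing parameters, but drop any other action (e.g. action=edit).
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "action"
    ]
    query.append(("action", "creativecommons"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(creativecommons_url("http://example.org/index.php?title=Main_Page"))
    print(creativecommons_url("http://example.org/wiki/Main_Page"))
```

The resulting URL can then be opened in a browser or fetched by a script to obtain the RDF/XML.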
How to Attribute
Suppose you have edited an article and included some paragraphs from another CC by-sa 1.0 licensed wiki. Save the edits and then give attribution. How? Well, this could be more user-friendly in the future, but for now other things have higher priority. Follow these steps:
- Compare your edit to the previous version of the article. Go there either via the version list or by clicking on (diff) in the Recent Changes list.
- There is an extra link next to the edit link of the newer version. It is labeled "attribute". Click it.
- You arrive at the new special page "Special:Attribute".
- Give your attribution here. There are 3 possible ways, described below.
Using the Attribution Special Page
There are 3 ways:
- Write your own RDF information and paste it into the textarea at the top. Add a comment explaining what part of the source you have used and how. Hit the "attribute" button.
- If you know the URL of the RDF/XML information, you can paste it into the input line in the middle (not into the textarea and not into the comment input line, but right below it). Hit the "import RDF" button next to the input line. The RDF/XML is downloaded from the given URL and displayed in the textarea. Please check it. If it is OK, hit the "attribute" button.
- If your import is from wikitravel.org, it is easiest to choose the language and enter the name of the article in the lower input line. Hit the "import RDF" button next to the input line. Again, the RDF/XML is downloaded and displayed in the textarea. Please check it. If it is OK, hit the "attribute" button.
Attributing is the most critical feature of my extension since many things may go wrong. There is only very basic error checking implemented.
Recent Attributions
Similar to the Recent Changes special page, but much simpler in implementation, there is a new special page called Special:RecentAttributions. You get it via Special:Specialpages.
Importing Article Revisions with included RDF/XML
Importing is a standard MW feature reserved for Sysops and Bureaucrats only. It is the counterpart of Special:Export, which can be used by all users. When you export a page, you get the article in XML format. Since RDF also uses the XML format, it is easily possible to embed RDF info in the exported XML. But in the standard MW implementation, Special:Import won't accept XML with embedded RDF. My extension modifies the parser so that it reads out the RDF and saves the relevant parts in the database. Importing pages in XML with embedded RDF attribution is the Sysop's way to attribute many articles at once.
What to test
I'm very interested in feedback on my extension, preferably on technical aspects. At the moment, the most important topics are ideas about the database layout, the overall design of the approach and security aspects. Less important are details about HTML or all the nifty fine-tuning. It's still much too early for that.
Important for me
- Is this way of attribution acceptable and appropriate to the license?
- Do we have the right information at the right time at the right place?
- How about the database layout? Time-consuming SQL queries? Breaking standards for tables or queries?
- Ideas on the RDF vocabulary used. Is there really such arbitrariness in the vocabulary, or was I unable to find the right documents?
- Security holes. It took me more than one day to find out how to put Apache into a chrooted jail. But I use the machine for other purposes, too, and was too afraid of intruders. The Special:Attribute page is especially dangerous. There are many user-defined parameters that are used in SQL queries and RDF download calls.
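Regarding the last point, two generic precautions for user-defined parameters can be sketched in Python: bound parameters for SQL queries and a scheme check before downloading RDF from a user-supplied URL. This is not code from the extension; the function names and the one-table schema are assumptions, and the URL check is only a first filter (it does not, for example, stop requests to hosts inside the own network).

```python
import sqlite3
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_acceptable_rdf_url(url):
    """Accept only http(s) URLs that name a host; reject file:, ftp: etc."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)


def find_source_by_url(conn, url):
    """Look up a source by URL using a bound parameter, never string concat."""
    cur = conn.execute("SELECT sw_id FROM srcwork WHERE sw_url = ?", (url,))
    return [row[0] for row in cur.fetchall()]


# Tiny demo database with a hypothetical, reduced srcwork table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE srcwork (sw_id INTEGER PRIMARY KEY, sw_url TEXT)")
conn.execute(
    "INSERT INTO srcwork (sw_url) VALUES (?)", ("http://example.org/wiki/A",)
)
```

Because the URL is passed as a bound parameter, an input such as x' OR '1'='1 is compared as a plain string and matches nothing.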
Links
- http://wiki.creativecommons.org/Web_Integration_Guide#Licenses_are_described_by_their_characteristics.2C_which_come_in_three_types
- http://dublincore.org/documents/dcmi-terms/
- http://wikitravel.org/en/Wikitravel_talk:Language_version_policy#Copyright_between_versions
- http://mail.wikipedia.org/pipermail/wikitech-l/2004-April/022036.html
- http://sites.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/ - A PHP package for creating and interpreting RDF
- http://www.w3.org/RDF/Validator/

