Notes and reporting by Joe Hourclé, joseph.a.hourcle@nasa.gov
This is a detailed summary of the RDAP Summit; a reduced summary will be available in the ASIS&T Bulletin. You can also find the snippets that other people felt were significant by searching Twitter for the tag '#RDAP11', and the presentations from the meeting at http://www.slideshare.net/asist_org/
Anything in square brackets [] is one of my notes, such as the few points at which I missed taking notes (because I was commenting myself, or the few times when I was sending e-mail to pass something along to my co-workers).
Most citations and references are also in square brackets.
Gary Marchionini kicked off the Research Data Access and Preservation (RDAP) Summit with the lessons learned from last year, reminding people to submit comments on ways to improve, and to use #RDAP11 on Twitter. He discussed the goal of building community, and how the issues of data management have gotten more visibility recently. [Science 2011 Feb 11 issue : http://www.sciencemag.org/site/special/data/]
Clifford Lynch from the Coalition for Networked Information gave the keynote, in which he discussed how the library community and institutional repositories (IRs) fit into the management of research data. IRs' focus on long term preservation of deposited objects made them useful to scientists looking for places to store and distribute their data, particularly for the smaller data collections that are frequently forgotten as people look to dealing with 'big data'. He discussed the five aspects of the NSF data management plan requirements:
He discussed some of the flaws of IRs for data storage, such as the need for specialized scientific metadata, the uncertainty of the cost of storage, and the occasional need for complex authorization rules; but he also suggested solutions such as separating the scientific data cataloging from the IR, considering some parts as 'services' that could be contracted out or done through consortia or through endowments, and focusing on the older, smaller data sets that are more at risk without getting distracted by the more complex and difficult collections that are not a good fit for traditional IRs.
Some significant issues were brought up by the audience, such as how to measure the importance of the data, and issues of entropy, as the value often decreases over time. [Bill Michner et al., 1997, Nongeospatial Metadata for the Ecological Sciences. Ecological Applications, 7(1), pp. 330-342 ; http://www.esajournals.org/doi/abs/10.1890/1051-0761%281997%29007%5B0330%3ANMFTES%5D2.0.CO%3B2] In response, Cliff discussed how the issues were economic, technical, and ethical; weighing the cost of storage against the cost of re-running an experiment wouldn't work if future experimental procedures might make past trials obsolete, and you have to weigh the cost of re-running the experiment if it might put lives at risk. Also discussed were how IRs fit into scientific workflows, issues of data encumbered by regulatory and other restrictions, and the funding issues of government agencies that run discipline repositories but can't accept NSF money for that work.
Jonas Dupuich from Berkeley Electronic Press, Katherine Kott from the Stanford Digital Repository and Terry Reese from Oregon State University presented on the current use of IRs to expose research data. Jonas discussed how there were three common approaches, all of which stored metadata about the research data, but also either (1) a link to the data, (2) a guide to obtaining and using the data, or (3) the data itself. There were a number of advantages to the second approach, as the textual nature of the documentation helped to make the object findable to search engines. Katherine discussed how they used a re-writing of Ranganathan's five laws to evaluate their IR, and how that drove their latest implementation, along with details on services they currently perform. Terry Reese described the benefits of IRs, which make objects easier for other researchers to access while providing download metrics to the depositor; he also discussed the problems that research data causes for IRs, and how modifications to DSpace's storage provided better performance and better alignment with their storage needs. Terry also discussed the results of their interviews with researchers, and the researchers' interest in training on metadata and help with other aspects of curation even if they didn't want the library to take over the storage of their data.
During audience comments, there was discussion of the inability to assign structured metadata at the collection level with OAI-PMH, which would make it easier for groups building scientific catalogs to discover which collections they might wish to harvest. There was also discussion of the types of content that went into user guides for the data, when it was the object serving as a proxy to the data. There was mention of the limitations of using the same license for both data and publications, the need for varying embargo lengths, and the inclusion of policies on attribution, all of which were covered again during Mackenzie Smith's talk. Another insightful comment was that there are many fields that don't consider themselves to have 'data', but they have many 'files' such as documents or images that serve the same function and could be described as the 'data' in their research. There was also a discussion of how faculty in different fields had different attitudes about being connected to their advisees' theses.
Eric Chen from Cornell, Andrew Sallon from the University of Virginia and Mackenzie Smith from MIT discussed cross-discipline groups that had formed at their institutions to advise researchers on NSF data management plan (DMP) requirements, and the new issues that surfaced. Eric discussed Cornell's effort to determine the needs of their research community, and even provided the data from their surveys, showing that most people suspected that they would need some help with DMPs; he described their solution of a 'concierge service' [http://data.research.cornell.edu] to receive questions from researchers and direct them to different people from the libraries, research computing, or a specific research department, but also brought up new issues, such as how to deal with physical samples of 'data', and which processed forms of the data needed to be preserved. Andrew described an effort at the University of Virginia with an even broader group, including an attorney to look into policy and ownership issues, and their template system for NSF DMPs based on the UK JISC's DMPonline, developed in collaboration with a number of other groups. [DMPonline, http://www.edlib.org/uc3/datamanagement/dmpo.html] Mackenzie discussed the importance of policies, and pointed out that two of the five NSF guidelines related to policy: (1) access and sharing, and (2) re-use and derived works. She discussed some of the technical, social and legal issues, including confusion over who 'owns' the data, applicability of foreign laws in international collaborations, issues with copyright of data in the U.S., enforceability of 'data use licenses', and some of the problems with using some of the Creative Commons [http://creativecommons.org/] licenses with data. She mentioned the need for further work on issues concerning licenses, attribution, persistent identifiers, provenance, metadata and registries.
She also enumerated some of the motivators for researchers, including credit, control, confidence that their data isn't mis-used, intellectual property rights, funding, easier re-use of their data, easier discovery of and access to other researchers' data, and easier integration and interoperability. She discussed how researchers wanted advice, but weren't interested in the complex details; she recommended standardizing and establishing best practices on copyright and waivers, terms and conditions of data use and re-use, and language for policies and licensing, as well as developing good examples of boilerplate. She also raised the rather interesting point that metadata is a type of data, and it's possible for it to have different policies and licenses than the data which it describes.
A number of questions were raised in the open discussion, such as how to deal with people coming for help at the last minute, and the general advice was to make sure they hit the five basic areas of the NSF DMP requirements, but to make sure that they are educated about where to go for help the next time; there was one suggestion that we look into either tying into existing IRBs or setting up something similar so that contact is made with the researchers earlier in the proposal process. There was also discussion of how to fund these efforts; the general consensus was to treat it as overhead, although there may be specific cost-recovery models for storage as a service.
Ruth Duerr from the National Snow and Ice Data Center, Phil Bourne from Protein Data Bank and Steve Hughes from the California Institute of Technology presented on the science data efforts, and shared various recommendations and insights from their decades of experience in this field. Ruth talked about how NSIDC [http://nsidc.org/] provides standard ways to cite data as if it were any other publication, but noted that scientific journals do not require that citation in the published research. She discussed efforts from ADS [http://adsabs.harvard.edu/] and arXiv [http://arxiv.org/] to link articles to the data, and how the lack of metadata affects the ability to find, select, obtain, understand, and use the data. She reminded us that a data file is not like a book, but more like a page or even a sentence from a book: a single data file might be meaningless without the associated context, so there should be metadata assigned at both the file and collection level, with the appropriate level of records returned depending on the context of the request. Phil described how the Protein Data Bank (PDB) [http://www.pdb.org] is a community effort, and how the scientists successfully lobbied their discipline's journals to not accept papers without the data in the PDB. He mentioned that the issues were much harder and took longer than they had originally thought; the political issues took more time than solving the technical issues. Phil also mentioned how the PDB's role had shifted from being an archive to include both analysis and education; they tie more tightly with the analysis tools to allow someone to explore the data in much richer ways than would be available from just reading about it in a journal article. He discussed the need for changing staff as their mission changed, and that outreach was now their most significant part; he also mentioned that review and correction of existing data consumed one quarter of their budget.
Phil discussed the demand for better performance, web services, widgets, and annotation by creating linkages within the data, within the literature, and to connect the two. He mentioned how they had multiple interfaces for different classes of users, including customizable and mobile interfaces, and their work on improving speed and accuracy of deposition by improving the interfaces. Steve discussed Object Oriented Data Technology (OODT) [http://oodt.apache.org/], a flexible repository architecture that he helped build for NASA's Planetary Data System (PDS) [http://pds.nasa.gov/], which is now also used by NIH's Early Detection Research Network (EDRN) [http://edrn.nci.nih.gov/] and managed by the Apache Foundation [http://www.apache.org]. He discussed the trends in eScience towards highly distributed, loosely coupled, federated systems with complex modeling, and the need to support both varied data analysis and decision support tools; Steve also talked about how OODT's modular design allows different groups to swap out the data model, security, discovery tools or other components to support their specific needs.
Audience discussion for this session focused on the success of these systems: both the metrics for gauging success and the contributing factors that led to the success of these efforts. One of the significant contributing factors in the space and earth sciences was the CODMAC report [http://www.nap.edu/catalog.php?record_id=12343], which decided that scientists should be responsible for their own data, and led to their self-organizing to develop PDS and other systems. It was also mentioned that researchers are simply not trained to provide metadata.
Arnold Rots from the Harvard/Smithsonian Center for Astrophysics, Joey Comeaux from the National Center for Atmospheric Research, Jay Hnilo from the National Oceanic and Atmospheric Administration (NOAA) and Dan Kowal, also from NOAA, spoke about data management from the view of federally funded archives.
Arnold presented on the Virtual Astronomical Observatory (VAO) [http://www.usvao.org/] and its role of federating search across data systems from U.S. observatories and space missions, and its participation as a member of the larger International Virtual Observatory Alliance (IVOA) [http://www.ivoa.net/]. He mentioned that many of the issues mentioned earlier weren't a problem in their field, as they had standardized on file formats, didn't collect personal information, and the data wouldn't lead to patents; many of the standards and general analysis tools had been developed by IVOA, solving the interoperability issues in their field. Arnold talked about the varied groups within VAO, including user support, operations, data curation & preservation, education and public outreach, and technical assessment. He described their relationships with ADS [http://adsabs.harvard.edu/] and expanded on ADS's efforts of using data identifiers to provide semantic linking between data and publications [http://adslabs.harvard.edu/semantic/publications.html]. Arnold also explained that due to there being multiple processed forms of the data, instruments such as Chandra are generating data publication volumes five to six times greater than the amount of data originally collected by the instrument. He raised interesting points about the challenges of getting the community to release data, the cost and conflicts from institutional requirements for IT standards and security, the certification of trusted digital repositories, and the need to archive not only the data but also the software and operating systems that might be part of the workflow to create the data.
Joey discussed the need for stable funding to retain good staff and provide services such as user support and software maintenance and development. He also raised the point that it's impossible to document everything; there are times when you just have to consult a human expert, but their experience was that it took five to ten years to develop the necessary expertise. Joey discussed NCAR's evaluation of the reasons for data loss, and although natural disasters and hardware failure were in the list, aspects of bad curation, such as loss of metadata, accidental overwrites and simple lack of sufficient information on the value of the data, were also a problem. He spoke of the need for selecting good archival data formats that were documented at the byte level and were not dependent on specific software, hardware, or operating systems.
Jay discussed the NCDC's National Climate Model Portal [http://ncmp.ncdc.noaa.gov/], and revealed that until 2002, there had been no practice of long-term archiving of climate data. He discussed the need for reducing the size of the data through subsetting and downscaling to provide for interoperability and reuse [http://www.reanalysis.org/]. Jay also described their formal requirements of a submission agreement, and their processes for determining what to archive. [ATRAC, https://www.ncdc.noaa.gov/atrac/index.html] Dan went into more detail on the appraisal process that occurs even before the submission agreement, and also the process to review and 'sunset' data rather than keeping all data in perpetuity. He discussed the need to communicate the value of various metadata fields, so that the people building data systems understand the need to populate those fields to promote re-use. Dan also raised the issue of how to communicate with the researchers, the need to review the return on investment of data rescue, and how to deal with data nominated for archiving by someone other than the people who currently maintain that data.
The open discussion included a question on how long it took to respond to requests for data to be archived, to which we were told that the full NOAA process could take more than a year, but could move faster for data at risk of being lost. There was also a question about what type of people are needed to curate the data; typically, the staff are research domain experts who are trained in data management, metadata standards, curation or whatever additional skills might be needed, but because they are tightly affiliated with the research field, they might get distracted by the exciting new data and neglect the older, less exciting but equally valuable data.
The final session of presentations by Micah Altman from Harvard, Eliot Metsger from Johns Hopkins University, Monica Omodei from the Australian National Data Service and Reagan Moore from the University of North Carolina focused on specific technologies to help manage archives. Micah discussed the DataVerse network's [http://www.thedata.org/] SafeArchive [http://www.safearchive.org/], a layer over LOCKSS [http://lockss.stanford.edu/] to verify that there were a sufficient number of copies of files on the network and, should it be necessary, initiate copies at other archives with sufficient resources, to provide compliance with TRAC [http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying-0]. Eliot discussed the current plans of the Data Conservancy [http://dataconservancy.org/] to abstract many of the policy rules to allow embargos, logging, authentication & authorization, or obfuscation of the data. He explained the need for obfuscating the data through 'fuzzing', where the information was made less precise to allow for data re-use without revealing sensitive information. Monica presented on the efforts of the Australian National Data Service [http://www.ands.org.au/] to promote sharing and re-use of the data, and the systems they have developed to track the existence of data in a registry, provide for discovery of that data, assign DOIs for data, and provide vocabulary services. She also described other external efforts that had been funded to establish IRs and registries, software tools for data integration, and how they had partnered with other agencies within Australia for name authority, but also some of their problems in dealing with metadata catalogs being distributed as PDFs or spreadsheets, and the inability to track specific records within those files.
Reagan discussed the need to not only track data, but also the purpose, properties and policies of the data; he talked about data grids and other generic infrastructure that can be used to provide data virtualization.
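The kind of replication check that Micah described for SafeArchive can be illustrated with a small sketch. This is not SafeArchive or LOCKSS code; the function name, the file names, the archive names and the threshold of three copies are all hypothetical, chosen only to show the idea of comparing verified copies against a policy.

```python
def under_replicated(holdings, required_copies):
    """Return {file_id: shortfall} for files with too few verified copies.

    `holdings` maps a file identifier to the list of archives known to
    hold a verified copy. Both the structure and the names are
    hypothetical; a real audit would be driven by the preservation network.
    """
    shortfalls = {}
    for file_id, hosts in holdings.items():
        # Count distinct archives, so a host listed twice is not double-counted.
        missing = required_copies - len(set(hosts))
        if missing > 0:
            shortfalls[file_id] = missing
    return shortfalls


# Hypothetical holdings for three files across three archives.
holdings = {
    "survey-2009.tab": ["archive-a", "archive-b", "archive-c"],
    "survey-2010.tab": ["archive-a"],
    "codebook.pdf": ["archive-b", "archive-c"],
}
report = under_replicated(holdings, required_copies=3)
```

In this sketch, `report` lists the files that would need additional copies made at other archives, along with how many more copies each one needs.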
The audience discussion included how to deal with policies that aren't enforceable as part of the system, such as attribution; how to deal with conflicting policies, such as different requirements for embargos; and the use of the tools and techniques described for the automatic sunsetting of data.
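As an illustration of the 'fuzzing' that Eliot mentioned in this session, one simple way to make location data less precise is to snap each coordinate to the centre of a coarse grid cell. This is only a sketch of one possible technique, not the Data Conservancy's method; the function names and the half-degree cell size are assumptions made for the example.

```python
import math


def fuzz_coordinate(value, cell_size):
    """Snap a coordinate to the centre of the grid cell that contains it."""
    return (math.floor(value / cell_size) + 0.5) * cell_size


def fuzz_location(lat, lon, cell_size=0.5):
    """Return a (lat, lon) pair coarsened to a grid of `cell_size` degrees.

    The cell size is a hypothetical policy setting: a larger cell hides
    more detail but leaves the data less useful for re-use.
    """
    return fuzz_coordinate(lat, cell_size), fuzz_coordinate(lon, cell_size)


# Two nearby (hypothetical) sites end up with the same published location.
site_a = fuzz_location(38.9972, -76.8517)
site_b = fuzz_location(38.80, -76.90)
```

Both sites are reported as the centre of the same half-degree cell, so the published data still supports regional analysis without revealing the exact location of either site.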
After the last session of prepared talks, Gary Marchionini led a discussion of the issues that keep us up at night; Reagan Moore provided his grouping of the issues that had been collected during the first day:
| What keeps ... up at night? | me | the field |
|---|---|---|
| Infrastructure | 17 | 26 |
| Management | 23 | 12 |
| Researchers | 15 | 14 |
| Description | 12 | 13 |
| Personnel | 5 | 7 |
| Funding | 5 | 4 |
| Appraisal | 4 | 4 |
[do we have the whole set of answers to link to?]
There was surprise that management was so high on the list; that category included issues related to the need for public outreach and the need for centralized management rather than many stovepipes. Outreach had been mentioned by many of the presenters, but we had a more in-depth discussion of what exactly was meant by 'public outreach', and whether it meant the need to advertise our services and values to the scientists and researchers whom we serve, or to the general populace as a whole. It was noted that we need to reach the general citizens to explain the value of curating data for the advancement of society, as they help to set priorities for funding. The citizens rightfully have questions about what researchers and faculty do to earn their salaries, and we need to explain their benefit to society by expanding knowledge, and the need for data for evidence-based policy making.
We discussed the need for a reversal of funding for public infrastructure, particularly as it affects these data efforts, and the need for all people involved in these efforts to join in to explain the importance of curating this data. There was also a discussion about making data both easier to find and more usable to the general community; although for the original investigation, the more highly processed data is not useful, there is a lot of data out there that someone might be interested in. Although there were concerns about spending too much effort on preparing data for public use, as it might not fit within an organization's funding mandates, there was a suggestion to better track press releases and the visualizations of data used in them to help find both the professional and popular websites that are making use of them.
There was also surprise that funding wasn't seen as the largest issue, as if funding is cut, the data effectively evaporates. There was a discussion of the need for a review of the economics and sustainability, cost models, and of inter-institution collaboration. This was later expanded by discussing the whole socio-political aspects, and the need for better stories of the benefits of good data practices that we can use to bolster continued support from management and our funders.
We discussed what other communities we needed to reach out to and work with. IASSIST, CODATA, AAAS, ACRL, ARL eScience, DCC and DCMI-SAM were all mentioned, and it was noted that this meeting conflicted with a meeting for ACRL. We discussed the need to involve the national agencies, various discipline focused communities of data and metadata managers, science data librarians, the digital libraries federation, as well as other digital preservation and infrastructure efforts. There was a question if we were duplicating efforts, but it was decided that there was a need for a forum such as RDAP to discuss the cross-discipline issues, and a need for many perspectives for this effort to be successful. We were tasked with sharing our trip reports and notes widely, so that we can help to inform others of these efforts.
We also discussed the issue of reduced budgets, and the need for ways for other people to participate and be informed without attending the meeting in person; hopefully, this summary is a small start, but live streaming of next year's meeting was mentioned as a better alternative. There were also valid points raised about the need for this to be more than just an annual meeting, and for this to be an ongoing conversation. The RDAP mailing list [http://mail.asis.org/mailman/listinfo/rdap] hosted by ASIS&T was one place to discuss and engage with the participants from this and last year's RDAP summit.
We ended the summit with a panel to discuss the future of digital libraries. We discussed what roles digital libraries could play, and some of the items mentioned included indexing of materials on the web, developing documentation standards, natural language processing, compiling lists of external authoritative resources, and of course, the management of data collections. Expanding on the concept of data management, we discussed the need for data indexing, analysis services, interfaces to allow data manipulation, and the need for multiple presentation interfaces for different user communities.
We discussed the accomplishments of IBM's Watson on Jeopardy at performing natural language processing to search a knowledge repository, and whether digital libraries could or should evolve into knowledge repositories. It was pointed out that a good librarian needs to be an applied epistemologist and understand how the knowledge is being generated, and what the products of that knowledge are.
We were reminded that digital libraries are not a specific implementation, but sets of services to interact with items in the library's collections. To survive, a library needs to be flexible and adapt to changing needs, to have sufficient funding, and to meet the needs of at least one user. We were also reminded that there are many different communities that could make use of the data; although it may have been produced for scholarly research, it has uses in education and public outreach, by the media, and by policy makers.
We discussed how libraries are used to dealing with special collections, and data is just yet another type of special collection. Just as with other artifacts in special collections, we need the context and other information; simply storing the bits representing the data is not enough to make the data usable, particularly over the long term.
One goal stated was to make items in scientific data formats more findable than Google. There were also multiple suggestions to consider the social aspects of the data: allowing users to provide input about the data or their experience with the data, so we can track the multiple uses of the data or its suitability for different purposes as a form of peer review of the data and the repositories. There were questions raised about how to feed into a recommender service and could it be used to identify communities. Rather than build these into each repository, there was a suggestion to build a 'Data Thing', in the model of Library Thing, to allow the community to tag or otherwise annotate the data, but for this we need DOIs or other identifiers.
There was discussion of capturing the whole conversation around the data, both through tools that allowed people to visualize the data, but also the need for better linkages between data, the documentation of the data, and the scholarly publications.
[and this part below, was typed up more than a month after the meeting; I don't know that it even made it into the initial summary that was then trimmed for space, it might be that I had just extracted the themes without much detail, so here's some more detail, although I admit it jumps around a lot.]
We discussed how companies like Dropbox are trying to satisfy the need for preserving, sharing and managing the large amounts of personal digital records.
As we discussed use vs. re-use, Gary told a story about how individuals might have pictures of their family, but someone else could use those pictures to show how landmarks have changed over time. We discussed how there was a significant need to look at the issues of re-use for the future.
We discussed how right now, the focus was on IRs and digital libraries, but some felt that the future was moving towards community repositories and [something I didn't write down]. There was a comment on how community repositories are the new public libraries; we should look at how similar libraries manage to gain funding, and at other sorts of communities that self-assemble to solve problems. We also discussed what sort of systems were needed; whether data portals were enough, or whether we needed community and social systems.
There was a question raised about what curation processes the libraries should be doing; should they be reprocessing in an attempt to find new relationships, such as what is done with genealogy indexes? Are there standard types of processing that need to be done?
We discussed how we care about data, and want it to be preserved, such as wanting to keep personal photos, but we also want our privacy, and so there is specific data that we do not want preserved. Some countries are looking to establish laws on this, such as France's 'Right to be Forgotten' [https://www.privacyassociation.org/publications/2010_10_20_understanding_the_right_to_be_forgotten_in_a_digital_world/]. [Note; as it's now May, there's a push in the US to bring a similar right as part of COPA (Child Online Protection Act)].
There was another anecdote about a scientist's passing, and the attempt to inventory the stuff in his office, but no one had any idea if the data he was storing had been used or possibly distributed to others.
There were questions about what the best forums were for collaborating on these topics, and whether there were other groups we should be talking to. Groups mentioned included the W3C provenance working group; the archival community, who have experience appraising stuff that gets dumped off with them, and will likely have to deal with data in the future if they aren't already; and anthropologists, where researchers' advisees often become the trustees of their advisors' data. It was also brought up that we need to look beyond just images and text, as there are many different types of data.
There was mention of types of indexing that could be done on a knowledge repository; we could analyze how authors collaborate by incidents of co-citation, and we should look at the ACM and Google Academic and Bing Academic for their citation graphs. There was a mention of a study of mentoring impact from looking at information science dissertations, and the rate of co-citation after their dissertations.
[and then I have a note about 'Exhibit framework' that I have no idea what it means ... it's possibly a reference to the Simile project [http://www.simile-widgets.org/wiki/Thesis:_Creating_interactive_web_pages_using_the_Exhibit_framework], and I had a note that seems to suggest that someone was using it to visualize IRS data (maybe even the IRS themselves)]
There was a question of what level of indexing we needed to do, and yet another reminder of the need to link data to other documents. [and I can't remember if 'level of indexing' related to the granularity of the indexing, or if it meant the amount of detail that we needed to capture ... possibly, it meant both]
Another group that was mentioned as worth talking to was the Information Architecture community, who were conveniently holding a conference down the hall. [and I talked to quite a few of them, and they seemed interested in the problems we're trying to solve; I think part of it's that they'd just be happy to have some problem other than 'how do we get people to stay on our website and buy more stuff'. I also talked to someone from the National Academy of Sciences, and the National Academies might be another group we should talk to.]
We were reminded that our greatest asset is the data. [or is it 'are the data'? I personally prefer singular collective, but lots of scientists prefer plural. Yes, I'm typing this while massively sleep deprived for a few days and I might be getting strange, but well, if someone actually finds this after reading this much text, drop me an e-mail]
The trends in data were towards larger volumes of data, with attempts for more generic uses by a wider community. There's also a lot of other types of data, such as qualitative data from the humanities, and image databases.
We came up with a few ideas for possible sessions for next year, including:
There was a question about what happens to data from private industry, such as the pharmacology and biomedical fields. It was also pointed out that research done from grants from private industry might be encumbered by non-disclosure agreements. Companies like Xerox would have done lots of research on toner chemistry.
There was a question about whether we had any great examples of data re-use [this might've been a possible session topic?], and I believe this was where I mentioned using SOHO/LASCO images for comet finding [http://sungrazer.nrl.navy.mil/], which was not the original purpose of LASCO [http://lasco-www.nrl.navy.mil/].
We touched on interagency collaboration, [and we might've brought up again the issue of how NASA agencies who might run a discipline archive can't accept NSF money for curating data from an NSF grant], and the issues in dealing with universities, some of whom might be smaller and not have the same infrastructure as the well-funded larger research institutions.
We asked if we should contact the three national libraries as possible collaborators in this effort:
[Note : we might be able to go through FLICC [http://www.loc.gov/flicc/]; I've also seen some of the libraries at various NKOS [http://nkos.slis.kent.edu/] workshops, some of which were hosted at the Ag. Library.]
There was also a question of whether we could somehow run a virtual conference, like ACRL was doing, to get more involvement from colleagues who have no travel budget or who have conflicts. [note: the IA Summit was being (audio) recorded to turn into podcasts, even though it wasn't being broadcast in real time; this might be a good low-cost alternative to at least get information out, even if it doesn't permit direct remote participation; also, I believe that Code4LIB [http://www.code4lib.org/] had in the past gotten a sponsor specifically to handle videotaping of one of their conferences; we might also look into whether by next year YouTube is allowing the general public to stream live, or whether any of the other video sharing sites are (Vimeo, Internet Archive, or one of the specialty video sites that have a narrower topic that we might fit into, such as ScienceStage)]
We were reminded that IRs don't need to take on the challenge of the difficult data; they should focus on the easy cases that aren't massive or legally encumbered and don't have complex authentication or authorization needs, and we can deal with the difficult ones later. They can help by indexing the fact that the data exist, and exposing the documentation for using the data, to make them findable to the greater community. In general, IRs are better suited to helping with the 'small science' data.
IRs should not be a cage that holds the data; we need ways to extract the metadata from the IRs so that it can be used by discipline specific search engines that may be searching across multiple institutions.
We need to look at policies, both within our institutions, and within the science disciplines and journals to see how they affect data sharing. We should consider outreach to scientists, the general public, and policy makers on the advantages for scientists to release their data, and look for other ways in which we can help to get more people to release their data to a wider audience.
We should also look at other things our community could do to improve communication between the various communities.
[and then the last few things that were in my notes, I'm pretty sure were just for my own personal notes, and not the main part of the meeting, but ...]
Do we need an acronym / standards lookup? Need to remind speakers of the audience; scientists might not know the library jargon and vice versa. Librarians are unlikely to know about:
[related : I was asked why Steve Hughes corrected himself after calling something an 'ontology', and the issue is that 'ontology' is used by many scientists with a more liberal definition than that used by the library sciences, and is often used for any sort of knowledge organization system]
[end of the bit that was typed up well after the fact]
There were an amazing number of issues covered in both the presentations and the open discussion, but only a limited amount of space, so I apologize in advance if I missed any salient points that you are particularly passionate about. I suggest that you raise those issues on the RDAP mailing list [http://mail.asis.org/mailman/listinfo/rdap], and that we look into ways in which we can organize into task groups to try to tackle these many and varied issues.
Thank you to Gary Marchionini (UNC), Erin O'Meara (UNC), Michael Giarlo (Penn State), Bill Anderson (UT Austin) and Reagan Moore (UNC-RENCI), who helped to organize specific sessions, and to the many other ASIS&T members from the special interest groups for both Digital Libraries (SIG-DL) and Scientific and Technical Information (SIG-STI) who participated in planning and other guidance for this meeting.