This is a working paper, uploaded 28th August 2011, intended in due course for online journal publication. It is likely to be taken down from this location on publication; but publication details will thereafter be found at www.alarichall.org.uk, and Alaric will leave some forwarding note at this URL. As you'll see, there are various loose ends and slightly lame diagrams (and Alaric ought to do more secondary reading---suggestions welcome!). But if you have any views on the paper, Alaric would really appreciate hearing them! alaric@cantab.net. And feel free to circulate the link. Groove on!

Making Stemmas with Small Samples: Testing the Stemma of Konráðs saga keisarasonar, and New Media Approaches to Publishing Stemmas

Alaric Hall, University of Leeds

Abstract

With relatively few scholars and a large number of texts and manuscripts whose textual history is unknown, Icelandic literature needs to develop efficient ways of establishing stemmas, to facilitate the study of Icelandic literature, linguistics, scribal culture, and so Icelandic history more generally. Meanwhile, in saga-studies as in stemmatology generally, there has been little discussion of the role of sampling in textual criticism, even though most scholars must make heavy use of it. This article tests the viability of creating a stemma using a small sample of text by independently drawing a stemma of Konráðs saga keisarasonar, whose stemma was previously established by Zitzelsberger, and testing it against Zitzelsberger's. Although the approach has limitations, the results are nonetheless very similar to Zitzelsberger's; in some cases they allow us to correct his work; and in the worst cases produce known unknowns which can then be resolved through targeted interventions. The article capitalises on internet publication rigorously to include all underlying data and to experiment with new, more transparent, ways of publishing stemmas. It concludes by sketching what the stemma of Konráðs saga can tell us about Icelandic scribal culture during its long post-medieval history.

Introduction

The problem: drawing stemmas transparently, verifiably, and efficiently

Old Norse studies enjoy a significant place in the history of textual criticism: the earliest known stemma was drawn for the Old Swedish Västgötalagen (Collín and Schlyter 1827, table 3; cf. Robins 2007, 93--94), while some of the pioneering work on computer-assisted stemmatology was undertaken on the Old Norse poem Svipdagsmál (Robinson 1989a, b; Robinson--O'Hara 1996). The construction of stemmas to ascertain which manuscripts of a text were copied from which other manuscripts has historically been associated with identifying or reconstructing the earliest version of the text, and postmodern approaches to medieval textuality rightly question these goals (see for a survey Bordalejo 2003 XXXXX). However, a stemma can instead be understood to describe `not a state, but a historical process' (Hanna XXXXX, 116): stemmas are a vital tool for understanding how a text developed and was transmitted---who copied from whom, how, when, where, and why. Despite this, as Driscoll has pointed out, there has been little explicit discussion of the methods for establishing stemmas for Icelandic sagas (Driscoll forthcoming; Almenn rit 839.6309 Cre eða SÁM Aðalsafn 819.09 Cre; an important exception is Hast 1960, 8--13). It is characteristic of Old Norse editorial practice that when Peter Robinson, one of the leading figures in computer-assisted stemmatology, co-edited the poem Sólarljóð, the edition made no mention of his use of computer-assisted methods in establishing its textual history (Larrington--Robinson 2007, 291--94).

This paper contributes to remedying the lack of methodological discussion on Old Norse stemmatology by undertaking what is to my knowledge the first independent verification of a full saga-stemma, while working to establish transparent, verifiable and efficient methods for studying sagas' stemmas. The transparency and verifiability of my work arise from a couple of simple innovations which capitalise on electronic publication: access to all of my primary data is integral to the article, while I have also worked to integrate primary data into stemmas themselves for easy checking. Efficiency is a less common goal to be stated in work on stemmas. The most exciting recent developments in stemmatology have used computer analysis to help textual critics do their job better (for historiography, see Bordalejo 2003 XXXXX; Robins 2007), and in computer-assisted stemmatology so far, `better' has meant more thoroughly and accurately:

one may summarize the key characteristics of the `New Stemmatics' as follows: analysis aims to obtain as comprehensive a view as possible of the relations among the witnesses; analysis is based, as far as possible, on all data; quantitative tools, typically computer-based, are employed in the analysis. Barbara Bordalejo (get better refXXXXX)
This makes sense, since comparison of large datasets is something which software is much better at than people. This article takes a different perspective, however. Pre-twentieth-century Icelandic literature is a field with a very large number of texts surviving in large numbers of manuscripts, transmitted scribally from as early as the twelfth century to as late as the twentieth. Meanwhile, interest in the post-medieval life of medieval Icelandic sagas is growing rapidly (cf. Driscoll XXXXX), both provoked by and in turn encouraging recent leaps forward in the digitisation and dissemination of Icelandic manuscripts, key examples being the ongoing digitisation and free-access publication of material via the Medieval Nordic Text Archive, Sagnanet, its successor handrit.is, and the related Stories for All Times project. Working out how to map efficiently the complex scribal traditions to which our surviving manuscripts attest would be of great benefit to this emergent field, transforming our understanding of what individual manuscripts mean as cultural and linguistic evidence---and potentially, at a methodological level, to medieval studies more generally.

My central methodological concern, therefore, is with sampling in stemma-making, and what size sample we need to take from a saga reliably to establish its stemma: these considerations underpin my approaches to the stemmas I construct (and also raise the seldom-noted prospect that short texts may contain too few variants ever to enjoy a reliable stemma). It is important to be clear about my limitations. As Cisne, Ziomkowski and Schwager have pointed out, `philologists reconstructing ancient texts from variously miscopied manuscripts anticipated information theorists by centuries in conceptualizing information in terms of probability' (2010, 1). The tools available for assessing probability---whether mathematical techniques or software to implement them---have advanced dramatically since the days of Carl Schlyter and his more famous contemporary Karl Lachmann. Philologists' training in mathematics and computing, however, frequently has not; and this is certainly so in my case: I am not, in short, qualified to address sampling in stemma-making in the way that this deserves. However, past scholars in the field have generally avoided this issue simply by concealing their use of sampling: despite their limitations, then, my exploratory discussions of problems in stemma sampling should be a step forward, underpinning and provoking future discussion and interdisciplinary collaborative research.

As far as I am aware, no romance-saga stemma has ever been independently tested, so although my primary aim was to test my methods against a reliable stemma, the comparison also provides, in general, an independent assessment of previous work, leading to some possible corrections.

The choice of case-study: Konráðs saga keisarasonar

The saga which this article uses as its case-study, Konráðs saga keisarasonar (the saga of Konráður the Emperor's son), is an Icelandic prose romance from around the fourteenth century (XXXXX). Building on the work of Cederschiöld (1884, clvi--clxxiv) through a series of articles leading up to his meticulous 1987 edition, Zitzelsberger constructed one of the best documented stemmas of an Icelandic saga (1980, 1981, 1983). (The main rival among non-translated Icelandic romances is Slay 1997.) So-called medieval popular romance of the kind represented by Konráðs saga has recently been attracting growing interest across Europe (see Driscoll 2005; Hall et al. 2010, 56--58), and as a genre well represented in post-medieval Icelandic manuscript production it is an ideal basis for a case-study in tracing scribal transmission. Kalinke and Mitchell identified fifty-one vernacular romance-sagas composed, translated or transmitted in Iceland and known to have existed in medieval manuscripts (1985). Rendering their survey as a spreadsheet facilitates an overview of the scribal tradition (previously available only through the case-study by Glauser 1994a, 1994b; see further Hall et al. forthcoming): the sagas survive across a total of 811 manuscripts, each saga surviving on average in 30 manuscripts---and this will be an underestimate, due to occasional omissions and a trickle of further manuscripts coming to light since 1985. The survival of romances composed in Iceland follows a markedly different trajectory from those which were translated, possibly reflecting a generic distinction running along similar lines to a distinction between `popular' and `highbrow' romance. Manuscript survival increases until around the Reformation, at which point it dips, followed by a steady rise into the nineteenth century, and rapid decline in the twentieth.

Graph showing numbers of riddarasaga manuscripts over time
Figure 1. The number of manuscripts containing romance-sagas, by century (including only the 729 in Kalinke and Mitchell 1985 which can readily be assigned to a particular century). The .csv file on which this graph is based is here.
Konráðs saga is for most of its history slightly better attested than the average Icelandic romance-saga (it survives in a total of 48 manuscripts and fragments), but is clearly a good representative of the overall trajectory of survival.
Graph showing average numbers of riddarasaga manuscripts over time
Figure 2. The average number of manuscripts containing romance-sagas, by century (including only the 729 in Kalinke and Mitchell 1985 which can readily be assigned to a particular century). The .csv file on which this graph is based is here.
The scribal tradition represented here is not, generally, one of literatim copying: while outright rewritings are rare, scribes invariably made deliberate as well as accidental changes to the text of their exemplars as they copied. This has important consequences for the kinds of methods which are appropriate in constructing stemmas, particularly in reconstructing their earliest sections, from which the manuscripts have usually been lost. It is seldom possible to identify 'better' or 'worse' readings which indicate that one manuscript (the 'better' one) has priority over another (the 'corrupt' 'worse' manuscript). Priority can often only be assigned on the grounds of manuscript date.

For the thirty-five or so romances which were composed in Iceland, a century of study has produced more or less complete and explicitly argued stemmas for four:

A further three traditions have received more cursory surveys:

This is a not insignificant achievement, but is still only a fifth of the corpus. While an understanding of a saga's transmission which maximises completeness and accuracy would of course be valuable, a method which allows us with reasonable confidence swiftly to survey the terrain and focus our future efforts seems a more practical ambition---and would also be useful for the thousands of other medieval texts whose transmission is only vaguely understood.

Theoretical questions: text sampling and stemma presentation

Past work on text sampling

The key question for efficient stemma-making is what kind of sample of a text is needed to make a stemma of a given reliability, a question which has enjoyed remarkably little study. The editors of the romance-sagas mentioned above do not discuss sampling, but they cannot have considered every single variant in every manuscript of the sagas they studied: the task would have been huge. They must have sampled the texts---perhaps methodically, perhaps haphazardly. One of the few explicit stemmatic discussions in saga studies is by Sture Hast, in his analysis of the Íslendingasaga Harðar saga. Hast drew his stemma on the basis of sample passages, ignoring variants which he considered likely to represent independent innovations producing the same reading (1960, 8--11). Thus sampling of two kinds has tacitly been fundamental to saga-editors' stemmas, and no doubt many others': the sampling of sections of texts, and the sampling of kinds of variants. I discuss text-sampling in this section and variant-sampling in the next. Counting in the standard edition (Þórhallur Vilmundarson and Bjarni Vilhjálmsson 1991, as digitised by Sæmundur Bjarnason et al. n.d.), Hast himself used sections of 1940, 2016 and 1431 words from, respectively, the beginning, middle and end of Harðar saga (the saga itself is 19,109 words long, so in total Hast sampled about 28% of the saga). The samples are coterminous with folios in the principal manuscript, but beyond that I am not aware that there was any particular rationale for Hast's choices.

In the only experimental study of sampling in stemmas of which I am aware, Spencer, Bordalejo, Robinson and Howe examined fifty-eight manuscripts and early printed editions of Chaucer's Miller's Tale. They tested the similarity of stemmas based on samples to stemmas based on the complete dataset (itself tested using bootstrap analyses; 2003). They divided the complete dataset (by my count, 5275 words in the Ellesmere manuscript) into 3958 `characters' (sites of possible textual variation, usually one or two words), 1540 (39%) of which turned out to show `parsimony-informative' variation (variation of a kind suitable for textual criticism). They found that `even with the smallest subset size [123 characters, of which 48 were presumably parsimony-informative], the stemmata are more similar than would be expected by chance' (2003, 413) and concluded that samples larger than about 1000 characters (of which 390 were presumably parsimony-informative) made little difference to the stemma: in this dataset, sampling about 1300 words would be sufficient to create a stemma almost as reliable as one based on the whole text. Thus Spencer, Bordalejo, Robinson and Howe hint at an order of magnitude for reliable sampling: around 1300 words, or around 390 parsimony-informative characters.

It is important to note that I say `1300 words' rather than `25% of the text'. This is because, a priori, it seems likely that a scribe will generally introduce changes to a text at a rate unrelated to the overall size of a text. One can of course envisage various possible reasons why scribal variation might be related to text-length: a scribe aiming for an exact copy of a text might copy a short one diligently and accurately but flag while copying a longer one; a scribe aiming to revise a text might alter a short one in detail but a long one more lightly. But it seems a reasonable hypothesis that a sample that is big enough to establish the stemma of a text of, say, 2,000 words in a given scribal culture should also, by and large, be big enough to establish the stemma of a text of, say, 20,000. (It also follows, incidentally, that some texts in some scribal cultures---perhaps sonnets for example---may inherently be too short ever to enjoy a reliable stemma: a fundamental methodological problem which would bear further investigation.) If the figure suggested by the Miller's Tale is at least in the right ballpark, this hints that Hast analysed about four times as much text as he needed to establish a reasonably reliable stemma for Harðar saga.
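
The arithmetic behind these order-of-magnitude figures can be checked with a short script. This is only a sketch: the input numbers are those quoted above for Harðar saga and the Miller's Tale, while the variable names, and the assumption that parsimony-informative characters are spread evenly through the text, are introduced here for illustration.

```python
# Figures quoted in the text above; variable names are illustrative only.
hast_samples = [1940, 2016, 1431]   # words Hast sampled (beginning, middle, end)
hardar_saga_words = 19109           # length of Hardar saga in the standard edition

millers_tale_words = 5275           # Ellesmere manuscript, by the author's count
millers_tale_characters = 3958      # sites of possible textual variation
informative_characters = 1540       # parsimony-informative characters
sufficient_characters = 1000        # sample size beyond which the stemma changed little
smallest_subset = 123               # smallest subset size tested

# Proportion of Hardar saga that Hast sampled.
hast_total = sum(hast_samples)
hast_fraction = hast_total / hardar_saga_words

# Assumption: informative characters are spread evenly through the text.
informative_rate = informative_characters / millers_tale_characters
words_per_character = millers_tale_words / millers_tale_characters

sufficient_words = sufficient_characters * words_per_character
sufficient_informative = sufficient_characters * informative_rate
smallest_informative = smallest_subset * informative_rate

# How much more text Hast used than the rounded 1300-word figure.
hast_oversampling = hast_total / 1300

print(f"Hast sampled {hast_total} words ({hast_fraction:.0%} of the saga)")
print(f"Informative rate: {informative_rate:.0%}")
print(f"About {sufficient_words:.0f} words, {sufficient_informative:.0f} informative characters")
print(f"Smallest subset: about {smallest_informative:.0f} informative characters")
print(f"Hast used about {hast_oversampling:.1f} times the 1300-word figure")
```

The rounded values in the prose (1300 words, 390 characters) are slightly coarser than the script's output, which is in keeping with their status as ballpark estimates.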

The comparison with the Miller's Tale is, of course, merely a hint. The Miller's Tale is a very different text from an Icelandic prose romance, not least because it is in verse; and the scribal culture of late medieval England, which included a measure of secular, commercial, urban production (see XXXXX), was quite different from any of the various scribal cultures which produced Icelandic romances during their long scribal transmission (see XXXXX). To emphasise some of the technical complications which arise:

Meanwhile, the proportion of a text which is sampled may be important in other ways. For example, the sampling might miss sections where the scribe used a different exemplar; or the samples might capture multiple exemplars, but with too little text from each different exemplar for reliable analysis; and in the case of fragmentary manuscripts, the fragment may contain no passage corresponding to the samples at all. The historiography of the infamously problematic transmission of Piers Plowman exhibits numerous examples of sampling proving insufficient (Brewer 1996, c. 261--71XXXXX). Piers is exceptional, however---it contrasts, for example, with the relatively straightforward transmission of the Canterbury Tales; and past work suggests that, whether from a lack of choice or simply from preference, Icelandic scribes seldom conflated exemplars (e.g. Zitzelsberger 1980, 183).

Despite the methodological problems, then, it is clearly worth at least exploring the possibility that relatively small samples of saga-manuscripts can produce relatively reliable stemmas. It is important to recognise in so doing the difference between sampling which provides inaccurate data (which would be a serious problem) and sampling which provides incomplete data (which is a loss to knowledge but not actually misinformation): if a scribe used a different exemplar for a passage which fell outside the samples, that detail would not be recorded, but the identification of the exemplar of the sampled passages would still be correct. Likewise, texts which survive only in fragmentary form and which omit the passages sampled represent known unknowns: their omission from the study may be regrettable, but can form the basis for targeted future research rather than indiscriminate whole-text sampling. My approach here, then, works on the principle that omissions (rather than mistakes) caused by small samples are methodologically acceptable, as long as the researcher is conscious of the limits of the information, and sensitive to the possibility that anomalous-looking texts might represent poorly sampled texts from multiple exemplars.

Presenting the stemma of Konráðs saga keisarasonar

Zitzelsberger's stemma of Konráðs saga keisarasonar can be represented using a conventional dendrogram as:
Zitzelsberger's stemma of Konráðs saga
Figure 3. Zitzelsberger's stemma of 'Konráðs saga'. The stemma in Zitzelsberger 1981, 168, omits Cederschiöld's explicit use of MS B (Holm 7 fol) which I have accordingly added in. Texts omitted from my own study are coloured grey. The .dot file on which this image is based is here, and the postscript file here. XXXXXdo better imageXXXXX
In former days, producing a tree-diagram like this for publication was a laborious task---so much so that although saga-editors might describe manuscript filiation verbally, they often chose not to present an actual stemma. Times have changed, and software (in this case Graphviz) and online publication make the use of images easy. However, online publication also opens up new possibilities altogether. Zitzelsberger's stemma can also be rendered in the form of nested HTML lists. (It ought to be added here that I created this rendering of Zitzelsberger's stemma after I had completed my own, independent version, so the act of producing this stemma did not influence my own work.) I decided that it would make too much of a mess to include it in the main text, so open it with this link here. In some ways this format is primitive, and less clear than a tree diagram. Significantly, it cannot readily represent texts with multiple exemplars (like F, and Cederschiöld's edition): these simply have to be included in more than one branch of the stemma, with a note to this effect. However, the format has advantages: it makes it easy to integrate, as I do, key details about the manuscripts, rather than merely call numbers or sigla, and it is possible for the user to hide or reveal branches in order to make the stemma more manageable (and, through their web-browser, to adapt its appearance in other ways to maximise accessibility). [Help! If the editors will give me a hand with this...XXXXX] Where possible, manuscripts are also linked to their online records and digitisations at handrit.is. At the time of writing, this resource is far from complete (and it is not clear how stable the URLs will prove to be), but the hyperlinks at least indicate the possibility of more sophisticated and better integrated stemmas in future.
Perhaps most importantly, however, it is possible, by clicking on a call number, for the user to see text samples, and (in bold type) the changes which they show from the parent manuscript. Where the lost manuscripts have been posited, I have also reconstructed their text---not in the belief that the reconstruction is completely accurate, but as a tool both for the clear expression of the arguments implicit in the stemma, and as a means to ensure rigorous analysis on my own part. The reader is not, however, bound by the structure of the stemma: it is fairly easy to compare samples from anywhere in the stemma, making it easier to check for alternative filiations (er, especially if I work out how to let users hide branchesXXXXX). In the case of my own stemma of Konráðs saga, presented below, these samples represent the exact evidence on which the stemma is based. In printed texts, this evidence would normally be present, if at all, in the form of lists of representative readings, which readers can generally only interpret with the greatest concentration. Although still far from perfect, the stemma here takes a step towards a more intuitive and transparent mode of publication, and will hopefully provoke debate and experimentation with other innovative methods.
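
The nested-list format described above can be generated mechanically from a record of which text was copied from which. The following is a minimal sketch only, not the code behind the stemma published here: the function name `stemma_to_html`, the sigla X, Y, A, B and C, and the wording of the note are all hypothetical. A text with two exemplars (here C) is simply listed under both, with a note, as described above.

```python
from html import escape

def stemma_to_html(children, root, notes=None):
    """Render the branch below `root` as a nested HTML list item.

    children: dict mapping an exemplar's siglum to a list of its copies.
    notes:    optional dict mapping a siglum to a short note shown beside it.
    """
    notes = notes or {}
    label = escape(root)
    if root in notes:
        label += " (" + escape(notes[root]) + ")"
    copies = children.get(root, [])
    if not copies:
        return "<li>" + label + "</li>"
    inner = "".join(stemma_to_html(children, copy, notes) for copy in copies)
    return "<li>" + label + "<ul>" + inner + "</ul></li>"

# Hypothetical sigla: X is a lost archetype; C was copied from both A and Y.
children = {"X": ["A", "Y"], "A": ["C"], "Y": ["B", "C"]}
notes = {"C": "two exemplars"}

html_stemma = "<ul>" + stemma_to_html(children, "X", notes) + "</ul>"
print(html_stemma)
```

Hiding and revealing branches would need something more than plain lists; HTML's `details` and `summary` elements are one candidate, though that is a suggestion rather than a description of what is done here.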

Sampling Konráðs saga and testing Zitzelsberger's stemma

My aim was to test Zitzelsberger's stemma by creating my own independently. Where Zitzelsberger's work (1980, 1987) provided transcriptions of manuscripts, either directly or through his critical apparatus, I used these. Repeating Zitzelsberger's manuscript sigla for convenience, these manuscripts were:

Otherwise, I sampled all those manuscripts held in public collections in Iceland or available in facsimile there at the Stofnun Árna Magnússonar. This led to seven omissions, all marked in grey on the stemma above: Zitzelsberger filiated most of these as the ends of branches; if he was right, their omission from my survey should not have had a major effect on the filiation of the versions surveyed. Thus a total of 41 texts out of Zitzelsberger's 48 were ultimately included in my study.

I took two samples from each manuscript, one of 112 words from the beginning, and one of 205 from the end (counting in Cederschiöld 1884). These choices were made largely on pragmatic and intuitive grounds, based on prior work on saga-stemmas; experimentation with the (admittedly very different) data provided by Roos and Heikkilä (2009; http://www.cs.helsinki.fi/u/ttonteri/casc/); the work of Peter Robinson (particularly the article discussed above); and the identification of clear narrative units that were fairly likely to be stable in transmission. Thus the first passage runs from the beginning of the saga to the end of the description of Jarl Roðgeir; the second passage runs from the description of the descendants of Konráður and Matthildur to the end. Taking text from the beginning and end of the saga made it much easier to define these passages, though this convenience comes with costs: openings and closings are especially likely to contain formulaic language, and might therefore be prone to common innovations in different parts of the stemma, while conventional wisdom has it that scribes are more likely to behave in uncharacteristic ways at the beginnings and the ends of texts (LALME XXXXX).

Ideally, the transcriptions on which my stemma was based would have been undertaken according to the exacting standards of the Medieval Nordic Text Archive (Haugen 2008), which besides rigorous accuracy would have the advantage of laying the foundation for a diachronic text corpus which could possibly be used for linguistic research as well as stemmatology. However, constraints on time prevented this. (A more experienced palaeographer, a more fluent speaker of Icelandic, or a better typist would doubtless have worked quicker; once I was familiar with the text, an easily legible manuscript like AM 524 4to, illustrated in Figure 4, would take me half an hour to transcribe; my 33 transcriptions, totaling around 13,000 words, took around 27 hours in total.) My transcriptions therefore expanded abbreviations (italicised in the original transcriptions but rendered in roman type in the HTML stemma for ease of reading). My expansions did not always rigorously take into account the precise spelling conventions of a manuscript (e.g. whether Old Norse -ir was rendered -ir or -er). Nor, while making a general effort to represent the manuscript forms, did I agonise over the transcription of ambiguous letter forms where these made no difference to the sense---for example, whether a letter was better to be transcribed as i or í, ij or ÿ, ö or ő; ǫ and were both transcribed as ǫ. Missing or illegible letters are represented with $, while uncertain readings are marked with [?]. Inevitably, I formed an initial impression of the manuscript filiation while transcribing; and I often identified and checked possible transcription errors while subsequently analysing the manuscripts' filiation. In practice, therefore, transcription and analysis of texts were to some degree recursive rather than sequential processes.

AM 524 4to, p. 105
Sample manuscript transcription
Figure 4. Sample manuscript transcription: AM 524 4to. The original transcriptions are available in their original .doc format and as HTML.

Methodology

Software analysis

At the beginning of this project, I had hoped that phylogenetic software, of the kind which is now being used widely in stemmatics, would provide a sufficiently reliable method for stemma construction that human brainpower could largely be devoted instead to interpreting the data provided by the stemmas. In the event, as I discuss below, it became clear that human input dramatically improved the reliability of analyses; but the use of software was nonetheless an intrinsic and important part of the process of human analysis. This section discusses the degree of success achieved through the software analysis. Since these processes are still relatively unfamiliar to most humanities scholars, I give a fairly detailed, step-by-step account of my methods, giving access to representative versions of the files I used while the work was in progress. These files are the latest versions used, but are provided for illustrative purposes, rather than as finalised research outputs in their own right.

The most popular software in the field of stemmatics is currently PAUP*. Heikkilä and Roos developed an algorithm for scoring the similarity of two stemmas and tested the stemmas produced by PAUP* and a range of other software against constructed textual traditions whose true stemmas are known. Roos and Heikkilä's main tradition involved a 1,200-word text with 67 manuscripts (of which 37 were made available for analysis, the others representing lost texts; 2009). Unlike the Canterbury Tales Project, Roos and Heikkilä assumed that spelling variation was parsimony-informative; in their mark-up, there were 1209 characters, all considered parsimony-informative, with an average of 3.8 variants per character. To humanities academics familiar with UK university grading scales, their scoring system will be uncannily familiar: 100% is a perfect match; 75% represents the best achieved in practice; 60% is about average; and 50% is the kind of score that is achieved simply through random inputs (2009, 422 table 3). Tested against Roos and Heikkilä's dataset, PAUP*'s parsimony program emerged with 74.4%. However, I chose the popular phylogenetic software Pars, part of the Phylip package (XXXXX). This software has not, as far as I am aware, previously been tested in stemmatics, but my own experiments using it with Roos and Heikkilä's dataset suggested that it could be as powerful as PAUP*. The attraction of Pars over PAUP* was that it was free, open-source, both downloadable and accessible online, and readily available not only for proprietary operating systems but also for the open-source operating system Linux, none of which presently holds for PAUP*.
The main disadvantage is that whereas PAUP* can handle up to sixteen different character states, Pars can handle only eight, which meant that the data had to be divided into more characters than was sometimes convenient (and that Roos and Heikkilä's own encoding of their data could not readily be deployed to test the software, because they used too many character states).

Thirty-five of the transcriptions were made into a spreadsheet of aligned readings, giving 98 characters. Variants were numbered, with an average of 6.6 variants per character (thus 652 different variants in total). I aimed to capture all lexical variation, but no spelling variation, while, for efficiency of alignment, maximising the number of variants per character. The decision not to use spelling variation was based on the assumption that, whereas in literatim copying, spelling variation can be important, spelling would be too susceptible to independent common innovations to be useful for stemmatic analysis in the scribal tradition of Icelandic sagas. However, future experimentation might prove the usefulness of spelling variation.
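
The numbering of variants within each character can be illustrated in code. This is a sketch under stated assumptions, not a record of how the spreadsheet was actually built: the sigla M1 to M4 and the assignment of readings to them are invented for illustration, although the alternative readings themselves (frásögn and saga; eiga and geta) are taken from this saga's tradition. Variants are numbered in order of first appearance, and `None` marks a text which lacks the character altogether.

```python
def number_variants(aligned):
    """Number the variant readings at each character (site of variation).

    aligned: dict mapping a siglum to a list of readings, one per character,
             with None where the text is missing or illegible.
    Returns (numbered, counts): the same table with readings replaced by
    variant numbers, and the number of distinct variants at each character.
    """
    sigla = list(aligned)
    n_characters = len(aligned[sigla[0]])
    numbered = {siglum: [] for siglum in sigla}
    counts = []
    for i in range(n_characters):
        seen = {}  # reading -> variant number, in order of first appearance
        for siglum in sigla:
            reading = aligned[siglum][i]
            if reading is None:
                numbered[siglum].append(None)
                continue
            if reading not in seen:
                seen[reading] = len(seen) + 1
            numbered[siglum].append(seen[reading])
        counts.append(len(seen))
    return numbered, counts

# Hypothetical sigla and hypothetical distribution of readings.
aligned = {
    "M1": ["frásögn", "eiga"],
    "M2": ["saga", "geta"],
    "M3": ["frásögn", "geta"],
    "M4": [None, "eiga"],
}
numbered, counts = number_variants(aligned)
average_variants = sum(counts) / len(counts)
print(numbered)
print(counts, average_variants)
```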

image of variants spreadsheet
Figure 5. Spreadsheet of aligned readings. The .csv file on which this image is based is here.
Variant alignment was a laborious process (taking about 10 hours) but was not only necessary for Pars analysis, but also invaluable as an aid to subsequent human analysis of the data. (Again, this was a recursive process, with initial analyses undertaken before all the manuscripts had been transcribed; in the event, some corrections, and the last seven manuscripts, were never included in the spreadsheet. It proved quicker to filiate these final manuscripts simply by human analysis, a process facilitated by the fact that in some cases their existence had already been predicted by earlier analyses.)

The spreadsheet was then converted to a file formatted for Pars analysis.
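
A conversion of this kind might be sketched as follows. The layout assumed here (a header line giving the number of texts and the number of characters, then one line per text with a name padded to ten characters followed by one state symbol per character, and `?` for missing data) reflects a general understanding of the Phylip input format and should be checked against the Phylip documentation before use. The function name, the sigla and the data are hypothetical, and the use of the digits 0 to 7 as state symbols is likewise an assumption.

```python
def to_pars_infile(numbered, max_states=8):
    """Format numbered variants as text for a Phylip-style discrete-character infile.

    numbered: dict mapping a siglum to a list of variant numbers (1, 2, ...),
              with None for missing data.
    """
    sigla = list(numbered)
    names = [siglum[:10] for siglum in sigla]
    if len(set(names)) != len(names):
        raise ValueError("sigla are not unique within their first ten characters")
    n_characters = len(numbered[sigla[0]])
    lines = [f"{len(sigla):5d}{n_characters:5d}"]
    for siglum, name in zip(sigla, names):
        states = []
        for variant in numbered[siglum]:
            if variant is None:
                states.append("?")              # missing or illegible
            elif 1 <= variant <= max_states:
                states.append(str(variant - 1)) # variant 1 becomes state 0, etc.
            else:
                # More variants than the software allows: the character
                # would have to be split into several characters first.
                raise ValueError(f"{siglum}: variant {variant} exceeds {max_states} states")
        lines.append(name.ljust(10) + "".join(states))
    return "\n".join(lines) + "\n"

# Hypothetical data: four texts, two characters.
numbered = {"M1": [1, 1], "M2": [2, 2], "M3": [1, 2], "M4": [None, 1]}
infile_text = to_pars_infile(numbered)
print(infile_text)
```

Raising an error when a character has more than eight variants makes the software's limit explicit at the point where the data would need to be divided into further characters.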

Pars infile
Figure 6. Infile for Phylip Pars analysis. The file on which this image is based is here.
This was then run through Pars, using the factory settings, and an unrooted stemma was produced through the Phylip programme Drawtree.
unrooted Pars/Drawtree stemma
Figure 7. Unrooted Pars/Drawtree stemma of 35 texts of 'Konráðs saga'. The .ps file on which this image is based is here.
It does not require a long perusal to conclude that this is unlikely to be an accurate stemma: the earliest manuscripts (generally those with labels beginning in Holm and AM), for example, appear in two widely separated groups and never appear as exemplars of other manuscripts. I subsequently quantified the similarity of this stemma to Zitzelsberger's using Roos and Heikkilä's algorithm (2009, 420--21): I encoded the Pars stemma and Zitzelsberger's stemma as appropriately formatted .dot files; manuscripts which are absent from the Pars stemma and which Zitzelsberger filiated at the end of branches were not included in this encoding of Zitzelsberger's stemma. (The files are available here as zitzelsberger_stemma_35_MSS.dot and phylip_stemma.dot, with .ps visualisations respectively here and here). I applied the algorithm using Roos's C program Rankdistance. The Pars stemma scored 62%: better than random, but not enormously. Although, as I argue below, the low score can partly be accounted for by errors in Zitzelsberger's stemma, it will be clear that different approaches would be needed to produce a reliable stemma using a computer alone. However, the Pars stemma, like the spreadsheet underlying it, was nonetheless invaluable to my own subsequent analysis: it shows a number of clear clusters, and while the precise internal relations within these were often implausible, they provided reliable starting-points for grouping manuscripts which needed to be filiated.
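
To give a sense of what such a comparison involves, the sketch below writes a stemma as Graphviz DOT text and compares two stemmas using a triplet-based agreement score. It is an illustration only, and not the Rankdistance program. The score is an assumed reconstruction of the general idea (for each text A and each pair B, C, do both stemmas agree on whether B or C is closer to A?), and the authoritative definition is the one in Roos and Heikkilä (2009, 420--21). The sigla, the function names and the choice of a directed `digraph` are hypothetical, and both stemmas are assumed to be connected.

```python
from collections import deque
from itertools import combinations

def to_dot(edges, name="stemma"):
    """Write (exemplar, copy) pairs as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    lines += [f'    "{exemplar}" -> "{copy}";' for exemplar, copy in edges]
    lines.append("}")
    return "\n".join(lines)

def distances(edges):
    """Number of edges between every pair of nodes, ignoring direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    dist = {}
    for start in neighbours:
        seen = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        dist[start] = seen
    return dist

def sign(x):
    return (x > 0) - (x < 0)

def triplet_agreement(edges1, edges2, labels):
    """Average agreement over all (A; B, C): 1 = full, 0.5 = partial, 0 = opposite."""
    d1, d2 = distances(edges1), distances(edges2)
    total, n = 0.0, 0
    for a in labels:
        others = [x for x in labels if x != a]
        for b, c in combinations(others, 2):
            s1 = sign(d1[a][b] - d1[a][c])
            s2 = sign(d2[a][b] - d2[a][c])
            total += 1 - abs(s1 - s2) / 2
            n += 1
    return total / n

# Hypothetical four-text example: a chain of copies versus a star.
chain = [("a", "b"), ("b", "c"), ("c", "d")]
star = [("a", "b"), ("a", "c"), ("a", "d")]
labels = ["a", "b", "c", "d"]
print(to_dot(chain))
print(triplet_agreement(chain, chain, labels))
print(triplet_agreement(chain, star, labels))
```

A score of this kind depends only on relative distances within each stemma, which is why two diagrams that look quite different can still agree on many triplets.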

The value of minor variants

While my disregard for spelling variation was conventional, my decision to capture all lexical variation was unusual. As in Hast's work, discussed above, it is usual to exclude so-called minor variants---words which are liable to show independent common innovation. These are often function words such as the relative particles sem and er, or words which are more or less in free variation. For example, most versions of Konráðs saga begin with a statement along the lines of Það er upphaf þessarar frásagnar ('it is the beginning of this narrative'), but the word for narrative varies fairly freely, alternatives to frásögn (genitive singular frásagnar) being saga and frásaga (along with a few manuscripts where this character is not represented at all). Meanwhile, the end of each saga explains that Konráður had children with a sentence like þau Konráður og Matthildur áttu tvo sonu ('Konráður and Matthildur had two sons'), but the verb used varies fairly freely between eiga (past 3rd person plural áttu) and geta. The following stemma is Zitzelsberger's, but for each manuscript which I surveyed, the stemma shows the readings for these two characters.

stemma showing variants on frásögn and eiga
Figure 8: Zitzelsberger's stemma, showing variants on 'frásögn' and 'eiga'. The .ps file on which this image is based is here.
Unless Zitzelsberger's stemma is badly wrong, or fails to recognise extensive conflation of exemplars, this demonstrates fairly free variation in Icelandic between the terms studied: for example, geta becomes eiga and eiga becomes geta independently at several points in the tradition. Traditionally, these variants would therefore be discarded as text-critical evidence. But a glance at the diagram also shows that a scribe is still considerably more likely to copy his exemplar than to switch word. Thus the balance of probability is that one manuscript using the verb eiga was copied from another using the same verb. Taking my spreadsheet of 35 texts, I used Pars to construct a stemma of Konráðs saga using only these two variants:
Phylip Pars stemma showing variants on frásögn and eiga
Figure 9: Pars stemma constructed using only the characters 'frásögn' and 'eiga'. The .ps file on which this image is based is here.
A casual comparison of this stemma with Zitzelsberger's shows that it has mostly grouped the manuscripts in the same way as Zitzelsberger did. Encoding the stemma as a .dot file (available here as a .ps visualisation) and scoring it against my 35-text version of Zitzelsberger's stemma gave the fairly respectable result of 60.6%. This indicates the value of minor variants: far from being a distraction, an accumulation of minor variants all pointing in the same direction can become a powerful argument for a particular manuscript filiation. The score also hints that software analysis of a very small number of even relatively uninformative characters can provide nearly as useful a basis for subsequent human analysis as the laborious encoding and analysis of a hundred or so.
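The observation that a scribe more often keeps his exemplar's word than changes it can be checked by counting along the edges of a stemma. The following is a minimal sketch under stated assumptions: the function name `retention_rate` is hypothetical, and the five sigla and their readings are invented for illustration rather than taken from the data published with this paper.

```python
def retention_rate(parent_of, reading):
    """Share of copies whose reading matches their exemplar's.

    Only pairs where both readings are known are counted; returns None
    if there are no such pairs.
    """
    kept = changed = 0
    for copy, exemplar in parent_of.items():
        if copy in reading and exemplar in reading:
            if reading[copy] == reading[exemplar]:
                kept += 1
            else:
                changed += 1
    if kept + changed == 0:
        return None
    return kept / (kept + changed)


# Invented example: B and C copied from A, D and E copied from B.
parent_of = {"B": "A", "C": "A", "D": "B", "E": "B"}
reading = {"A": "eiga", "B": "eiga", "C": "geta", "D": "eiga", "E": "eiga"}
rate = retention_rate(parent_of, reading)
print(rate)
```

A rate well above one half for a character is what makes even a freely varying word weak but usable evidence of filiation.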

My own analysis of the data, building on my Pars analyses, and involving the construction of a draft HTML stemma which ultimately became the basis for the one presented above, took into account manuscript dating (since an older manuscript cannot be copied from a younger one) along with an ability to make subjective judgements about the likelihood of one reading producing another (taking account, for example, of the likelihood of misreading, or the distinctive limitations imposed on a copyist by an omission in his exemplar). This made the analysis much finer than Pars's.
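The dating constraint is the one part of this human analysis that is easily mechanised. A minimal sketch, assuming each surviving manuscript can be given a single year: the function name `impossible_filiations` is hypothetical, and the years are the approximate dates mentioned in the discussion of Lbs 679 4to and ÍB 277 4to below, treated here as exact for the sake of the example.

```python
def impossible_filiations(parent_of, year):
    """Return (exemplar, copy) pairs where the proposed exemplar is the younger.

    Texts without a known year (for example reconstructed ones) are skipped.
    """
    return [
        (exemplar, copy)
        for copy, exemplar in parent_of.items()
        if exemplar in year and copy in year and year[exemplar] > year[copy]
    ]


# Approximate dates from the paper, treated as exact (an assumption).
year = {"Lbs 1687 8vo": 1850, "Lbs 679 4to": 1834, "ÍB 277 4to": 1834}
proposed = {"Lbs 679 4to": "Lbs 1687 8vo", "ÍB 277 4to": "Lbs 1687 8vo"}
problems = impossible_filiations(proposed, year)
print(problems)
```

A flagged pair rules out direct copying only; it leaves open descent from a lost common ancestor.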

Similarities and differences between my stemma and Zitzelsberger's

The outcome of my filiation of the texts of Konráðs saga was the following stemma; filiations and reconstructed manuscripts which differ from Zitzelsberger's are marked in purple, while manuscripts omitted from my analysis but included in his are included, in their appropriate positions, in grey:
Alaric's stemma of Konráðs saga
Figure 10: My stemma of Konráðs saga, based on the transcribed passages. The .dot file on which this image is based is here, and the postscript file is here. XXXXXsort out AM 180bXXXXX
I have not created a finalised HTML version of my stemma, but a final HTML stemma incorporating what I believe are corrections to Zitzelsberger's stemma, and therefore reflecting several points of my stemma, is available here.

My stemma and Zitzelsberger's are fundamentally similar: rendering them as appropriately formatted .dot files, removing those manuscripts not included in my survey (Zitzelsberger's stemma available here, mine here), my stemma scores 87% against Zitzelsberger's. This provides independent verification of Zitzelsberger's own work, and, in turn, indicates that the small samples with which I worked did not prevent the production of a stemma similar to that achieved by traditional (albeit unstated) methods. I focus here, therefore, on discussing the differences between my stemma and Zitzelsberger's.

  1. The top of Zitzelsberger's stemma is much more complex, and no doubt accurate, due to his (and before him Cedersciöld's) fuller sampling of fragmentary manuscripts.
  2. Zitzelsberger tentatively filiated Lbs 679 4to and ÍB 277 4to as descendants of Lbs 1654 4to, whereas I opted (presumably wrongly) for Lbs 1687 8vo, again through insufficient sampling.
  3. Zitzelsberger filiated Lbs 2462 4to and its descendants as descendants of Lbs 1654 4to, but this seems to have been a mistake: they are more closely related to its sibling ÍBR 5--6 fol.
  4. Zitzelsberger inferred a lost manuscript *d between ÍBR 5--6 fol and Rask 31 4to. He gave no evidence for this, but it seems to me that *d must have existed, and was in fact the parent not only of Rask 31 4to, but also of those manuscripts which Zitzelsberger saw as descendants of Rask 31 4to.
  5. In a number of cases, Zitzelsberger and I make different inferences about the existence of lost intermediaries (labelled *b, *f, *o, *p, *q, and *r).
I now discuss each of these divergences in turn.

1. The top of the stemmas

At the top of the stemma, my sampling was too limited: because most of the earliest manuscripts of Konráðs saga are fragmentary, with either the beginning or the end missing, there was too little overlap to produce a reliable stemma, or in some cases any rational stemma at all. Stockholm perg 7 4to (Zitzelsberger's A), of which the end survives, and Stockholm perg 7 fol (Zitzelsberger's B), of which the beginning survives, were arbitrarily rooted as separate descendants of a lost original, though in theory B, the later of the two manuscripts, could have been rooted as a child of A. AM 529 4to, which offers a fragment only of the first section sampled, I did not try to filiate at all. The fragment AM 567 4to (Zitzelsberger's F) was rooted as a descendant (rather than a nephew) of A, and the parent (rather than the sibling) of Stockholm perg 6 4to. It was, however, self-evident that the sampling of these fragmentary manuscripts was insufficient, and that the top of my stemma could be no more than a working hypothesis from which to develop a more reliable stemma on the basis of targeted transcription. Though no doubt wrong, my stemma did produce known unknowns.

The top of each stemma
Figure 11: The tops of my stemma and Zitzelsberger's. XXXXXsort out AM 180bXXXXX

2. The filiation of Lbs 679 4to and ÍB 277 4to

While I easily recognised the close relationship between the manuscripts Lbs 679 4to and ÍB 277 4to, which share a very large number of innovations, my sample was insufficient to place these in the stemma, or to be sure whether one was copied from the other or whether they derived independently from a common ancestor. It was evident that they were broadly more similar to the tradition descending from Stockholm, Royal Library, perg 7 fol than the one descending from Stockholm perg 6 4to, but distinctive readings within this were hard to find. The closest distinctive comparisons which I found in my sample were with two innovations attested in Lbs 1687 8vo: all three begin Ríkharður hefur keisari heitið, and all three introduce Jarl Roðgeir in a distinctively similar way: Lbs 1687 8vo says 'Jall var i Rikinu er Rodgeir hét', while Lbs 679 4to and ÍB 277 4to give Jarl var í Ríki keisara er Roðgeir hét. However, while suggestive, these similarities could easily have arisen independently; moreover, Lbs 1687 8vo is probably later (c. 1850) than Lbs 679 4to and ÍB 277 4to (both 1834 or thereabouts), and Lbs 679 4to records that its exemplar was written in 1750 (Zitzelsberger 1981, 172 n. 22). With a fuller survey, Zitzelsberger was instead able to associate them (somewhat tentatively) with the other branch of the tradition stemming from Stockholm, Royal Library, perg 7 4to, and with Lbs 1654 4to and the almost identical Lbs 272 fol in particular (1981, 160--61).

As with the top of the stemma, the problems arising here from sample sizes produced known unknowns: it was clear that further, targeted research was required reliably to ascertain the texts' positions in the stemma. The small sample did not produce seriously misleading results, but rather would have facilitated further research.

3. The filiation of Lbs 2462 4to

A major difference between my stemma and Zitzelsberger's was in the filiation of Lbs 2462 4to. Zitzelsberger simply said '2462 (1801) and its copy, 623, derive from 1654', unfortunately offering no rationale for this (1981, 162). My sample, however, associates Lbs 2462 4to with Lbs 1654 4to's sister-manuscript ÍBR 5--6 fol. These two possible exemplars are very similar, but where they differ, and Lbs 2462 4to has an informative reading, it is closer to ÍBR 5--6 fol. The manuscripts can readily be compared using either the HTML version of Zitzelsberger's stemma or my final HTML stemma, but it is worth quoting:

Lbs 1654 4to: og er ei nefnd kona han[s]
ÍBR 5--6 fol: & er drottning hans eige nefnd
Lbs 2462 4to: og er ej Drottníng hans nefnd

Lbs 1654 4to: eptir hanz daga vard Heinrekur Stőlkőngur
ÍBR 5--6 fol: eftir hans daga vard heinrikur son hans stölkongur
Lbs 2462 4to: Eptir hans daga vard Hinrik son hans stólkongr
More specifically, Lbs 2462 4to shares distinctive readings with the *c branch descending from ÍBR 5--6 fol. To give the most important examples, these manuscripts call Konráður's sister Similia instead of variants containing v, principally Silvía; they call Roðgeir an ágætur jarl instead of a göfugur jarl; they denote the languages which Roðgeir speaks with tungumál instead of tungur; they omit the following description of his learning; and they re-order the description of Konráður's engraving of the elephant-leg. More specifically again, Lbs 2462 4to shares a few further readings only with ÍB 224 8vo: for example, Konráður's son Henríkur owes his wisdom not to his father, but to his mother; snakes have not only eaten the people of a city, but lagst so á gullið ('laid themselves upon the gold'). It is hard to believe that these similarities are a fluke reflecting a small sample, and accordingly I have filiated Lbs 2462 4to and ÍB 224 4to as the children of a lost common ancestor *r. Zitzelsberger must have made a mistake. Indeed, some confusion regarding this branch is also evident in his claim that '1785 (1833) and the badly tattered 1217 (1817) are direct copies of 152 and 224, respectively' (1981, 162): as Zitzelsberger's discussion and stemma show, he meant that Lbs 1785 4to was a copy of Lbs 224 4to, and Lbs 1217 4to a copy of Lbs 152 4to, and my analysis agrees with this. Zitzelsberger's notes may have been incomplete or simply have included some mistakes regarding this section of the stemma. It seems likely, then, that Zitzelsberger's filiation of Lbs 2462 4to with Lbs 1654 4to was simply an error.

4. The lost manuscripts *b, *f, *o, *p, *q, and *r

A general difference between my stemma and Zitzelsberger's is that---setting aside our necessarily different handlings of the top of the stemma---we often handle lost manuscripts slightly differently.

Zitzelsberger sometimes inferred a lost manuscript simply on the grounds that dramatic changes between an exemplar and a copy are best accounted for by a damaged or illegible intermediary copy. This may sometimes be true, but it reflects an assumption which runs through Zitzelsberger's stemmatic work that copyists ought to try, and were trying, to copy literatim. Zitzelsberger originally argued for a lost manuscript *f between AM 179 fol and Lbs 2115 4to because Lbs 2115 changes Roðgeir from a berserkur to a bartskeri (barber-surgeon; the example conveniently falls within my transcribed sections). Zitzelsberger, seeing this as a mistake, could not see how it could have arisen from the clearly written AM 179 fol and so inferred a lost manuscript. But I see no reason why this should not be a deliberate change, and omitted *f from my stemma. (In the event, however, Zitzelsberger's later discovery Johns Hopkins 9 4to seems to have proved the existence of *f for other reasons [1983, XXXXX], so I have retained it in the final stemma). Likewise, we can see the changes between Stockholm, Royal Library, perg 7 fol (Zitzelsberger's B) and AM 118a/119a 8vo (Zitzelsberger's b) simply as rewritings---at least in the passages I have transcribed---without needing to posit an intervening damaged or illegible *b, as Zitzelsberger did.

In some cases, however, I posit more lost manuscripts. This suggests, essentially, that I am less likely to ascribe similarities between manuscripts to independent common innovation than Zitzelsberger was. This might reflect the experience of systematically reconstructing lost texts in my HTML stemma: this method of working encourages the identification and encoding of shared variants. The reasons for my reconstructed manuscripts can be seen easily through my final HTML stemma and the reader will be able to judge for themselves whether the variants demand these reconstructions.

In one case, Zitzelsberger explicitly discounted the possibility of a lost manuscript where I accepted it. Regarding the relationship of AM 118a/119a 8vo to its descendants ÍBR 5--6 fol and Lbs 1654 4to, Zitzelsberger wrote (1981, 158) that they

must derive independently of each other either from 118a/119a itself or from a lost intermediate copy of the latter. For the second possibility there is no firm evidence: whatever variants 5 and 1654 introduce are minor and apparently spontaneous.
However, it seems clear from my sampling that ÍBR 5--6 fol and Lbs 1654 4to show seven common innovations, each quite small but collectively striking. In theory, Lbs 1654 4to could be the exemplar of ÍBR 5--6 fol, but it was copied two years later. We must, therefore, take Zitzelsberger's second option, reconstructing a lost manuscript source for ÍBR 5--6 fol and Lbs 1654 4to. This may represent a case where the detailed focus on a small sample, rather than a more cursory examination of a bigger sample, encourages a more rigorous assessment of the evidence.

Zitzelsberger inferred a lost manuscript *d between ÍBR 5--6 fol and Rask 31 4to simply because he perceived many mistakes in Rask 31 and thought it unlikely that the scribe of that manuscript would have introduced them. He then saw Lbs 998 4to as a direct descendant of Rask 31. I would not have inferred *d on the grounds that Zitzelsberger did, but I did perceive a case for Rask 31 and Lbs 998 4to each descending from a lost common ancestor. The case is slight, however, and more analysis would be needed to confirm the filiation.

On the other hand, my inference of the manuscript *p, on subsequent checking, proved unnecessary. We all make mistakes.

For many purposes, it is not very important whether one manuscript is the exemplar of the other, or whether they are both descended from a lost common ancestor. And there will always be circumstances in which this cannot be discovered either way. My detailed analyses and reconstructions of lost texts do suggest the power of such close readings for helping us to identify the existence of lost manuscripts; but of course my samples are much too small to make any claim to comprehensiveness. My methods will in some cases identify lost manuscripts, but there are many occasions where they will not. One advantage of a detailed stemmatic analysis involving analysis of full texts might be that it would give us an accurate sense of exactly how many lost manuscripts can be inferred through stemmatic methods, helping us to address the vexed question of what proportion of our saga manuscripts actually survive.

Conclusions

1. A new stemma of Konráðs saga keisarasonar

Combining Zitzelsberger's findings with my corrections, I suggest this as the current best stemma of Konráðs saga:

Final stemma of Konráðs saga
Figure 13: A revised stemma of Konráðs saga, based on the transcribed passages. The .dot file on which this image is based is here, and the postscript file is here.
As mentioned above, this is available in HTML format here.

2. Methodological developments

Zitzelsberger did not discuss his stemmatic methodology in detail, and if all that my arguments here achieve is to provoke explicit debate and interdisciplinary research on sampling in stemma studies, then to my mind that will represent significant progress. But Zitzelsberger's work is presumably representative of other scholars who have worked within the intellectual community surrounding the Arnamagnæan manuscript collections. I have tested Zitzelsberger's stemma of Konráðs saga keisarasonar against a small, clearly defined, and published sample of data. Our results are very similar. Since our stemmatic analyses were independent, this is an encouraging sign that our methods are at least consistent and rigorous. The similarity also implies one of two further conclusions; it is hard to know which is right, but both are encouraging in their different ways. Either

  1. Zitzelsberger worked with large samples, but small samples can produce the same results; or
  2. Zitzelsberger tacitly worked with small samples, and this paper has made clearer the prospects and limitations of this method.
Either way, the finding is that stemmas based on small samples (around 350 words, or around 100 characters at around 6.5 variants per character) produce results very similar to what we are already accustomed to in the field.

There are, however, moments in my analyses where the small size of the samples makes it hard confidently to filiate manuscripts. This is particularly important when the fragmentary state of manuscripts means that too little of their text is sampled. However, in this particular case-study, it was at least apparent when the data was too sparse for confidence: in these cases, the research produced known unknowns, which could facilitate targeted and efficient further investigation.

There are also aspects of my analyses where I have found a case for arguing that Zitzelsberger was wrong. This is salutary: we all make mistakes in textual filiation, but it is rare that any scholar returns to check the laborious stemmatic work of a predecessor. However, the innovations in this article at least point the way to modes of publication which will make it more likely that other scholars will check our work and spot our mistakes. Combining the findings of Zitzelsberger and myself, the most likely stemma for Konráðs saga keisarasonar seems to be the following. XXXXX

Relatively small samples are not a panacea for swift stemmatic analysis. My transcriptions took around 27 hours, and my encoding of a large proportion of this data for computer analysis around 10. The subsequent human analysis of this material, however, was still a long and gradual process. But an effort to analyse complete texts, in the manner of much current work in computer-assisted stemmatology, would have taken much longer. The saga, in Cedersciöld's edition, is around 15,000 words long, so---barring relevant advances in optical character recognition---completing and encoding the transcriptions undertaken here would have taken perhaps forty times longer than it did: getting on for a year's full-time work.
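The arithmetic behind this estimate can be set out explicitly. The sketch below uses the approximate figures given in this article and makes two assumptions of its own: that transcription and encoding time scale linearly with the length of text, and that a full-time working week is 37.5 hours.

```python
sample_words = 350     # approximate size of the sampled text, from this article
saga_words = 15_000    # approximate length of the saga in Cedersciöld's edition
hours_spent = 27 + 10  # transcription plus encoding, in hours

scale = saga_words / sample_words  # how many times longer the full text is
full_hours = hours_spent * scale   # assumes time grows linearly with length
weeks = full_hours / 37.5          # assumes a 37.5-hour working week

print(f"scale: about {scale:.0f} times")
print(f"full text: about {full_hours:.0f} hours, or {weeks:.0f} working weeks")
```

On these assumptions the scale factor is a little over forty and the total comes to roughly 1,600 hours, consistent with the estimate of getting on for a year's full-time work.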

3. Directions for future analysis

As I discussed at the beginning of the article, the ultimate point of establishing the complete stemma of a text's surviving attestations is to facilitate the analysis of the texts themselves---whether from literary or linguistic perspectives---and the analysis of the society which produced the texts. Simply as a pointer towards the kinds of study of post-medieval Icelandic scribal culture which a fuller understanding of manuscript transmission can produce, I glance here at the spatial distribution of the manuscripts of Konráðs saga.

The information of library catalogues, supplemented and amended by Zitzelsberger's publications and handrit.is, allows us to localise the copying of twenty-three Icelandic manuscripts, mostly descendants of Stockholm, perg 7 fol (Zitzelsberger's B). Taken on its own terms, the distribution of these manuscripts shows a weighting towards the north and west which is well attested, albeit as yet little understood, but does not tell us much more.

The distribution of Konráðs saga manuscripts
Figure 14: The distribution of Konráðs saga manuscripts. The .kml file on which this image is based is here.
By breaking this data down in terms of manuscript relations, however, more patterns emerge. Mapping only those manuscripts near the top of the stemma---copied before 1700---the following distribution emerges:
The distribution of early Konráðs saga manuscripts
Figure 14: The distribution of pre-1700 Konráðs saga manuscripts. The .kml file on which this image is based is here.
This distribution precisely fits the patterns of manuscript production identified by Peter Springborg for what he termed the 'renaissance' in Icelandic manuscript production from around the 1630s (1977). At this time a tight-knit group of powerful and well educated men, inspired by European humanism and the hunger of the Danish and Swedish kingdoms for historical legitimation of their imperialism, began enthusiastically copying and commissioning copies of medieval sagas. They were mostly associated with Iceland's two episcopal seats, at Hólar in the north and Skálholt in the south; but included magnates in the Westfjords, particularly at Vigur; and also included Þorsteinn Björnsson, working in Útskálar on the Reykjanes peninsula. These sites were distant from one another, but linked by frequent long-distance contacts between wealthy Icelandic scholars.

However, the patterns of Icelandic manuscript production seem to have changed markedly around the end of the seventeenth century (for case studies for later Icelandic manuscript production see Glauser 1994a, 1994b; Driscoll 1997; Vachunova forthcoming; Davíð Ólafsson 2009). One indicator of this is the size of manuscripts, a characteristic which has been analysed in detail for Hrólfs saga kraka by Vachunova (forthcoming). Since most library classmarks indicate whether an Icelandic manuscript is folio, quarto, or octavo, my spreadsheet of romance-saga manuscripts, introduced at the beginning of this article, enables us roughly to trace the overall trajectory of manuscript-size for romance-sagas (scoring folios as 2, quartos as 4 and octavos as 8, so that a low score represents a large manuscript; cf. Hall et al. forthcoming):

Graph showing average size of riddarasaga manuscripts over time
Figure 15. The size of manuscripts containing romance-sagas, by century (including only those in Kalinke and Mitchell 1985 which can readily be assigned to a particular century and whose size is known). The .csv file on which this graph is based is here.
The huge sixteenth-century dip in the size of manuscripts of translated romances is an outlier: the century is represented by only one, octavo manuscript. In general, it is clear that the translated romances, while undergoing a similar overall trajectory to Icelandic romances, sustained a larger format throughout the period, presumably reflecting their higher status. Within genres, however, manuscript size is fairly stable until the eighteenth century, when it declines. Whatever its motivation---likely causes include cost and portability, as manuscript production became more widespread among and more closely associated with poorer sections of society---this new trend for smaller manuscripts indicates a shift in scribal culture. This shift is consistent with new patterns in manuscript distribution.
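The size scoring behind the graph (folio as 2, quarto as 4, octavo as 8) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure behind the published .csv file: the function names are hypothetical, the format is read off the classmark with a simple pattern, the first classmark is invented, and the centuries of the three real classmarks follow the dates mentioned earlier in this article.

```python
import re
from collections import defaultdict

SIZE_SCORE = {"fol": 2, "4to": 4, "8vo": 8}  # low score = large manuscript


def size_score(classmark):
    """Score a classmark by its format, or return None if none is stated."""
    match = re.search(r"\b(fol|4to|8vo)\b", classmark)
    return SIZE_SCORE[match.group(1)] if match else None


def mean_size_by_century(manuscripts):
    """Average size score per century for (classmark, century) pairs."""
    scores = defaultdict(list)
    for classmark, century in manuscripts:
        score = size_score(classmark)
        if score is not None:  # skip classmarks with no stated format
            scores[century].append(score)
    return {c: sum(v) / len(v) for c, v in sorted(scores.items())}


manuscripts = [
    ("Hypothetical 1 fol", 17),  # invented classmark, for illustration only
    ("Lbs 2462 4to", 19),
    ("Lbs 679 4to", 19),
    ("Lbs 1687 8vo", 19),
]
averages = mean_size_by_century(manuscripts)
print(averages)
```

A century represented by a single manuscript, like the sixteenth-century outlier noted above, yields an average that is simply that manuscript's score, so counts are worth reporting alongside means.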

Some nineteenth- and twentieth-century manuscripts, while fairly closely related textually, are widely dispersed in space. This is true of the three localisable manuscript copies of Gunnlaugur Þorðarson's 1859 editio princeps of Konráðs saga (two of which are by the same man, Magnús Jónsson í Tjaldanesi), which presumably reflects the broad distribution of the printed text (mapped in white); and Lbs 3933 8vo and ÍBR 43 8vo, related to one another rather distantly (mapped in green):

Dispersed distribution of late Konráðs saga manuscripts
Figure 16: Dispersed distribution of nineteenth- and twentieth-century Konráðs saga manuscripts. The .kml file on which this image is based is here.
Without further investigation, it is hard to know what to make of these distributions, but they are if nothing else different from the seventeenth-century pattern of scribal centres.

Clear distribution patterns emerge, however, for the eighteenth- and nineteenth-century manuscripts descending from AM 118a/119a 8vo (Zitzelsberger's b), a large proportion of which are localisable. From around the epicentre of Hólar in Eyjafjörður, Konráðs saga came to the Westfjords in the seventeenth century, apparently as a descendant of AM 118a/119a 8vo. This text was copied in manuscripts which have been associated with the region's great aristocrat, merchant and scholar Magnús Jónsson úr Vigi, Lbs 1654 4to and ÍBR 5--6 fol:

Focused distribution of late Konráðs saga manuscripts
Figure 16: Focused distribution of eighteenth- and nineteenth-century Konráðs saga manuscripts. The .kml file on which this image is based is here.
The descendants of these manuscripts all appear in the same north-western region of Iceland, indicating a relatively local distribution. Within the region, more local patterns again are discernible, with the descendants of Lbs 1654 4to (mapped in purple) appearing in Skagafjörður and the descendants of ÍBR 5--6 fol (mapped in red and orange) clustering around the Dalir. The closely related group of Rask 31 4to, Lbs 998 4to and JS 632 4to (mapped in orange) form a particularly tight distribution. It seems that, at least in the north-west, we can see the seventeenth-century pattern of long-distance manuscript transmission between a small number of scholarly centres being superseded by close-knit, local networks.

Needless to say, the patterns identified here demand further exploration. One avenue for this is more detailed research into Konráðs saga manuscripts: further work to localise manuscripts, and to trace their movements after their production, in combination with prosopographical and social-network analyses of their scribes and readers. However, the patterns can also be explored by expanding our dataset to include other stemmas of other sagas---many of which will co-occur alongside Konráðs saga in the manuscripts already mapped here. The methods which I have outlined here will allow us to achieve this more efficiently, and more swiftly.

Works cited

References can be followed up at http://www.alarichall.org.uk/teaching/alaricsnotes.php.