2 votes

Comment créer vos propres champs et télécharger des documents dans Apache Solr ?

Je m'amuse avec Solr pour la première fois. J'ai réussi à l'installer et à le faire fonctionner sur Ubuntu Server, j'ai affiché les documents xml d'exemple qui se trouvaient dans le répertoire exampledocs et j'ai pu rechercher des mots-clés comme "monitor", "apple" et "Dell" puisqu'ils sont dans les fichiers d'exemple.

Maintenant, je veux ajouter mes propres documents avec des champs personnalisés. Voici ce qui était présent par défaut dans scheme.xml :

 <fields>
   <!-- Valid attributes for fields:
     name: mandatory - the name for the field
     type: mandatory - the name of a previously defined type from the 
       <types> section
     indexed: true if this field should be indexed (searchable or sortable)
     stored: true if this field should be retrievable
     multiValued: true if this field may contain multiple values per document
     omitNorms: (expert) set to true to omit the norms associated with
       this field (this disables length normalization and index-time
       boosting for the field, and saves some memory).  Only full-text
       fields or fields that need an index-time boost need norms.
       Norms are omitted for primitive (non-analyzed) types by default.
     termVectors: [false] set to true to store the term vector for a
       given field.
       When using MoreLikeThis, fields used for similarity should be
       stored for best performance.
     termPositions: Store position information with the term vector.  
       This will increase storage costs.
     termOffsets: Store offset information with the term vector. This 
       will increase storage costs.
     default: a value that should be used if no value is specified
       when adding a document.
   -->

   <field name="id" type="string" indexed="true" stored="true" required="true" /> 
   <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
   <field name="name" type="text_general" indexed="true" stored="true"/>
   <field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
   <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
   <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

   <field name="weight" type="float" indexed="true" stored="true"/>
   <field name="price"  type="float" indexed="true" stored="true"/>
   <field name="popularity" type="int" indexed="true" stored="true" />
   <field name="inStock" type="boolean" indexed="true" stored="true" />

   <!--
   The following store examples are used to demonstrate the various ways one might _CHOOSE_ to
    implement spatial.  It is highly unlikely that you would ever have ALL of these fields defined.
    -->
   <field name="store" type="location" indexed="true" stored="true"/>

   <!-- Common metadata fields, named specifically to match up with
     SolrCell metadata when parsing rich documents such as Word, PDF.
     Some fields are multiValued only because Tika currently may return
     multiple values for them.
   -->
   <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <field name="subject" type="text_general" indexed="true" stored="true"/>
   <field name="description" type="text_general" indexed="true" stored="true"/>
   <field name="comments" type="text_general" indexed="true" stored="true"/>
   <field name="author" type="text_general" indexed="true" stored="true"/>
   <field name="keywords" type="text_general" indexed="true" stored="true"/>
   <field name="category" type="text_general" indexed="true" stored="true"/>
   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="last_modified" type="date" indexed="true" stored="true"/>
   <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>

   <!-- catchall field, containing all other searchable text fields (implemented
        via copyField further on in this schema  -->
   <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

   <!-- catchall text field that indexes tokens both normally and in reverse for efficient
        leading wildcard queries. -->
   <field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/>

   <!-- non-tokenized version of manufacturer to make it easier to sort or group
        results by manufacturer.  copied from "manu" via copyField -->
   <field name="manu_exact" type="string" indexed="true" stored="false"/>

   <field name="payloads" type="payloads" indexed="true" stored="true"/>

   <!-- Uncommenting the following will create a "timestamp" field using
        a default value of "NOW" to indicate when each document was indexed.
     -->
   <!--
   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
     -->

   <!-- Dynamic field definitions.  If a field name is not found, dynamicFields
        will be used if the name matches any of the patterns.
        RESTRICTION: the glob-like pattern in the name attribute must have
        a "*" only at the start or the end.
        EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)
        Longer patterns will be matched first.  if equal size patterns
        both match, the first appearing in the schema will be used.  -->
   <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
   <dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>
   <dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
   <dynamicField name="*_t"  type="text_general"    indexed="true"  stored="true"/>
   <dynamicField name="*_txt" type="text_general"    indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="*_en"  type="text_en"    indexed="true"  stored="true" multiValued="true" />
   <dynamicField name="*_b"  type="boolean" indexed="true"  stored="true"/>
   <dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
   <dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>

   <!-- Type used to index the lat and lon components for the "location" FieldType -->
   <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>

   <dynamicField name="*_dt" type="date"    indexed="true"  stored="true"/>
   <dynamicField name="*_p"  type="location" indexed="true" stored="true"/>

   <!-- some trie-coded dynamic fields for faster range queries -->
   <dynamicField name="*_ti" type="tint"    indexed="true"  stored="true"/>
   <dynamicField name="*_tl" type="tlong"   indexed="true"  stored="true"/>
   <dynamicField name="*_tf" type="tfloat"  indexed="true"  stored="true"/>
   <dynamicField name="*_td" type="tdouble" indexed="true"  stored="true"/>
   <dynamicField name="*_tdt" type="tdate"  indexed="true"  stored="true"/>

   <dynamicField name="*_pi"  type="pint"    indexed="true"  stored="true"/>
   <dynamicField name="*_c"   type="currency" indexed="true"  stored="true"/>

   <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
   <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>

   <dynamicField name="random_*" type="random" />

   <!-- uncomment the following to ignore any fields that don't already match an existing 
        field name or dynamic field, rather than reporting them as an error. 
        alternately, change the type="ignored" to some other type e.g. "text" if you want 
        unknown fields indexed and/or stored by default --> 
   <!--dynamicField name="*" type="ignored" multiValued="true" /-->

 </fields>

et les fichiers d'exemple par défaut ressemblaient à :

<add><doc>
  <field name="id">3007WFP</field>
  <field name="name">Dell Widescreen UltraSharp 3007WFP</field>
  <field name="manu">Dell, Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">monitor</field>
  <field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast</field>
  <field name="includes">USB cable</field>
  <field name="weight">401.6</field>
  <field name="price">2199</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
  <!-- Buffalo store -->
  <field name="store">43.17614,-90.57341</field>
</doc></add>

J'ai remplacé les champs du fichier schema.xml par mes propres champs personnalisés :

<fields>
  <field name="user_id" type="string" indexed="true" stored="true" />
  <field name="about" type="string" indexed="true" stored="true" />
  <field name="music" type="string" indexed="true" stored="true" />
  <field name="movies" type="string" indexed="true" stored="true" />
  <field name="occupation" type="string" indexed="true" stored="true" />
</fields>

et j'ai essayé de poster ce document nommé mydoc.xml :

<add>
    <doc>
        <field name="user_id">foobar</field>
        <field name="about">I am a somebody</field>
        <field name="music">pop, rock</field>
        <field name="movies">titanic</field>
        <field name="occupation">web developer</field>
    </doc>
</add>

quand j'ai essayé de poster en utilisant la même vieille commande :

java -jar post.jar mydoc.xml

Voici l'erreur que j'ai reçue :

SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file mydoc.xml
SimplePostTool: FATAL: Solr returned an error #400 ERROR: [doc=null] unknown field 'user_id'

J'ai également remarqué que si je redémarre le service Solr, il ne parvient pas à charger Solr Admin et affiche le message suivant :

HTTP ERROR 500

Problem accessing /solr/admin/. Reason:

    Severe errors in solr configuration.

Check your log files for more detailed information on what may be wrong.

If you want solr to continue after configuration errors, change: 

 <abortOnConfigurationError>false</abortOnConfigurationError>

in solr.xml

suivi d'un tas d'autres erreurs de type java...

Si je supprime mes propres champs personnalisés de schema.xml et que je redémarre Solr, le chargement de Solr Admin se fait sans problème.

Je suis donc perdu ici, comment puis-je ajouter mes propres champs personnalisés et être capable d'envoyer mes documents à Solr ?

2voto

Le problème, c'est que j'ai oublié de le mettre à jour :

<uniqueKey>id</uniqueKey>

pour être :

<uniqueKey>user_id</uniqueKey>

au bas du fichier schema.xml. Autre problème : lorsque j'effectue une recherche à l'aide de la fonction *:* dans l'administration de Solr, tout allait bien, mais lorsque j'ai effectué une recherche via une chaîne de caractères (mot-clé), un message d'erreur est apparu. undefined field text erreur. Pour résoudre ce problème, j'ai dû ajouter ce champ à l'un de mes champs :

<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

Prograide.com

Prograide est une communauté de développeurs qui cherche à élargir la connaissance de la programmation au-delà de l'anglais.
Pour cela nous avons les plus grands doutes résolus en français et vous pouvez aussi poser vos propres questions ou résoudre celles des autres.

Powered by:

X