
Site Crawler Software

  • link checker

  • web crawler
    for webmasters

Windows:  2000, XP


 



SITE CRAWLER

LOADING, USAGE, ADJUSTMENT, AND TROUBLESHOOTING

 

LOADING

There is a chance that the computer you are loading (installing) rpsoft 2000 site-crawler onto will need Windows updates before the site-crawler load can complete successfully.  First of all, make sure that the computer runs either the Windows 2000 or the Windows XP operating system.  If you have problems during the load, particularly if it stops in the middle, and even more particularly while loading the system file "wininet.dll", then your computer probably needs updating before the load.  In this case, the most important update is the latest Microsoft Internet Explorer, which can be downloaded free from Microsoft.  Newer versions of Internet Explorer use a later version of "wininet.dll", just as rpsoft 2000 site-crawler does.  Your Windows 2000 computer may not allow the rpsoft 2000 Microsoft loader to replace your system's wininet.dll with the newer copy unless you are also running the newer Internet Explorer.  Once you have updated Internet Explorer, the loading of rpsoft 2000 site-crawler should work.

 

PREVENTION

Rpsoft 2000 site crawler is a “scanning” type of bot: it scans each page's HTML looking for key phrases.  It does not, however, interpret the code, so if a site builds its internal links with complex code, rpsoft 2000 site crawler may not be able to follow them.  Links must be written with “href=” or “src=” directly before the link address.  Fortunately, most webmasters keep it simple.
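The scanning approach described above can be sketched in a few lines of Python.  This is an illustrative sketch under our own assumptions, not site-crawler's actual code: the function name `scan_links` and the regular expression are hypothetical.  It collects every value written after `href=` or `src=` and never runs the page's scripts, so a link reachable only through a script or a drop-down box is simply not seen.

```python
import re

# Hypothetical pattern: "href=" or "src=" followed by a double-quoted,
# single-quoted, or unquoted value. HTML attribute names are
# case-insensitive, so the match is too.
LINK_RE = re.compile(
    r"""\b(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def scan_links(html: str) -> list[str]:
    """Return every href=/src= value found by plain text scanning.

    The page's code is never interpreted, so links built by scripts
    or reached through drop-down boxes are not found.
    """
    links = []
    for match in LINK_RE.finditer(html):
        # Exactly one of the three alternative groups participated.
        links.append(next(g for g in match.groups() if g is not None))
    return links


page = """
<a href="http://www.yoursite.com/">Your Site</a>
<img src=logo.gif>
<select onchange="location = this.value">
  <option value="links/software.htm">Software links</option>
</select>
"""
print(scan_links(page))  # ['http://www.yoursite.com/', 'logo.gif']
```

The link page offered only inside the drop-down box is never reported, which is exactly the blind spot a pure scanner has.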

Of course, a fellow webmaster (or webmistress) who uses complex code for links is also taking a chance that the site will not be “botable” by the search engine bots.  If the search engine bots cannot follow the links, then your link would not be seen anyway.  Unfortunately, it is also noticeable that some sites use very complex code for reciprocal links, yet keep their favorite affiliate sponsors, as well as their own important links, in simple html code.  You can draw your own conclusion on that one.  We have.

  Suggestions:

  1. Partner with sites where this bot can in fact find your link on their site.  While this is a simple software bot, there is no way to know all of the strengths of the search engines' bots.  If this bot cannot find the link, perhaps neither will the search engines.
  2. If you have already linked to another site and this bot cannot find the reciprocal link, remember to check visually for the link before taking action and complaining to your fellow webmaster or webmistress.  The link could be there, with complex link code preventing this bot from finding it.
  3. If checking a site visually for links, note how easy or hard it is to find the site's link page.  Webmasters who hide their link page may also not make good partners, since human visitors may also have great trouble seeing your advertising.
  4. It is of course best to partner with sites whose link pages are in an upper-level directory of the site.  Sites that bury links several directories down are telling you that their link partners are not very important to them.

TROUBLESHOOTING - POSSIBLE FIXABLE ISSUES

  1. Scan Speed - This site crawler will of course work faster on a high-speed internet connection.  Either way, it should be faster per page than a browser, since it does not interpret code or load pictures.  You can gain still more speed in finding links by going under "options" and adjusting the "site crawl" and "search" settings.  There you can select web page phrases to give preference to, which documents (of a few types) to scan or not scan, and how many pages to scan on each site.  Note that under "search" in "options", checking the bottom item, "eliminate duplicate html spaces before scanning", may let you scan for multiple words with less chance of error.  However, that option requires a third pass over each web page and will slow down the search.  Check that option only when you really need it.
     
  2. Link Page Blocked - If this bot cannot find a link, the cause may not be your link itself; it may be complex coding on the pages leading to the link page.  If this is the case, and you still wish to partner with this site, consider entering the address of the web page where the link is rather than the overall site address, such as: "http://www.thissite/links/software.htm".  The bot will then scan this page first.  As long as the site owner does not move the link, and the reciprocal link is coded simply on the page itself, this reference should work.
     
  3. Too many pages - We have seen some sites with thousands of pages.  Even with a fast scan checker, checking the whole site can take a while.  You may be able to reach the right page faster by tuning the settings under "options" in the pull-down menu, mostly in the "site crawl" section.  For example, if you are in the software business and such sites tend to put your link on a web page named "software", then "software" should be one of your priority words.  If there are still too many pages to scan, then once you have found the link page, enter that particular page to be searched first in the future, as in item "2" above.
     
  4. Wrong Directory - The bot does not work as well when started in a directory of a site rather than at the main site.  In addition, some sites may tell you that your link is in a certain directory when it is not; it might be in a directory higher up.  In these cases, you may need to scan the main site rather than the directory they told you.
     
  5. URL search issues - One of the most confusing search problems arises when your url contains a space.  Html coding does not expect spaces in urls, and is likely to code the space as “%20” within the html.  While this bot has been designed to handle at least some spaces in URLs, it is best never to use spaces in urls at all.
     

  6. Search Problems - Searching for single words (or a url) works best, and even then one must be careful.  Remember that the search runs on the html, and the html coding can differ from the words that you see on the web page.  Sometimes html authors put extra blanks in the html that do not show up on web pages.  Also, some characters on a web page, such as quotation marks and greater-than and less-than symbols, as well as extra blanks (more than one between words), are coded rather than stored as the normal characters.  If you need to look for a phrase, go to “options” in the pull-down menu and under “search” check the box for “eliminate duplicate html blanks”.  This will remove at least one of the problems, although it will slow down the search since it adds an extra page scan.  You should also make sure that the phrase you are searching for does not contain quotation marks, more than one space between words, greater-than or less-than symbols, or other coded items.
     
  7. Drop Down Boxes - A few sites, thankfully rare, use drop-down boxes for access to their link pages.  This coding may not become active until the user makes a selection, so the bot will not get past this section to reach the link pages.  Note also that this technique is a very poor way to advertise your site.  Since links are a form of advertising, hiding link page names in a drop-down box will likely keep many potential customers from seeing your link, since it requires an extra action from them.  However, if your link is in fact on the site, you have found the page, and you still wish to link to this site, then you can take the same action as in "2" above: copy and use the address of the specific page your link is on.
     
  8. Jams While Following Links – (a) Jams can occur when internet service is interrupted or has problems.  Please save long jobs at intervals to help avoid losing data.  (b) There may be a delay when clicking on site-crawler after it has been running unattended for a long time; this delay may also be caused by waiting for an internet response.  (c) One site asked for a visitor response on a web page before continuing.  Of course, a bot cannot do that.  We added the item “download” within “options” as a page not to load, since that was the type of page that had caused that particular stoppage.
     
  9. Site Skip - In multi-site mode, if you are at a site where you believe the links pages have already been scanned but the scan is still running long, you can click the button “skip this site”.
     
  10. Re-Directs – If the link does seem to be there visually but the bot cannot find it, check whether the main url has changed.  We find that some webmasters use re-directs from the site name they give you to another site.  All may still be well here if you are happy with the link exchange at the new url.  If so, scan for the link at that url and not the one they gave you.  Again, this bot is told not to leave the main url area it is given.
     
  11. “Cannot Load URL” – This indicator means that the site has a valid address but at least temporarily cannot be reached.  It is best to try again later.  While some sites do go off the air permanently, we have seen temporary problems reaching even good sites.
     
  12. "cgi" files as well as "jpg", "exe" and other files - site-crawler will not load web pages that it believes are binary, such as jpg, exe, gif, and xls files.  It cannot scan binary files.  It also will not scan files or directories marked "cgi", since early testing showed many of those files also had binary content.  If a link partner stores your link in a directory such as http://www.main.com/cgi-bin/links/yourbusiness.htm, site-crawler will not find that link.  What you can do is the same as mentioned above: ask which page the link is on and enter that single page in your list to be scanned first.  If your link stays on that page, site-crawler will then find it.
     
  13. Offsite storage - We find that perhaps 1% of the sites we have worked with store their links off-site, on a site other than their own, sometimes at an automatic link site.  This can present several problems for their link partners.  First, bots such as site-crawler are programmed not to go off site, so they will miss the links if they are told to look for them at the main site.  Second, since the links will likely be separated from the real site content, the search engines will likely never give the link page a good rating, so the link may never help your own site's standing.  If, in spite of those reasons, you still wish to link to such a site, then search the offsite area instead of the main site itself.  For example, suppose you wish to link to http://www.thatsite.com, but that site stores its links at http://www.paidlinks.com, likely in a directory such as http://www.paidlinks.com/thatsite/.  Then you may be able to use site-crawler to check http://www.paidlinks.com/thatsite/ for the reciprocal links.  That worked in the 6 cases we had.
     
  14. Abbreviated links - Having trouble finding a reciprocal link to your site when searching with a term such as http://www.yoursite.com/ or http://www.yoursite.com/index.htm?  Note that others may have abbreviated the link to your site as simply http://www.yoursite.com or even www.yoursite.com.  It is probably best to search for the simplest term they could use, such as "yoursite.com".
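Items 1 through 3 above describe steering the crawl with priority words, a limit on pages per site, and a known link page entered to be scanned first.  One way such an ordering could work is a best-first queue, sketched below in Python.  The names (`crawl_order`, `get_links`, `priority_words`, `max_pages`) are hypothetical, and scoring a URL by how many priority words it contains is our assumption, not a description of site-crawler's internals.

```python
import heapq
from collections.abc import Callable, Iterable


def crawl_order(
    start_urls: Iterable[str],
    get_links: Callable[[str], list[str]],
    priority_words: list[str],
    max_pages: int,
) -> list[str]:
    """Return the pages visited, in order, under a best-first policy.

    Pages entered in start_urls are visited first (item 2). Discovered
    links whose URL contains more priority words are visited sooner
    (items 1 and 3). At most max_pages pages are visited in total.
    """
    words = [w.lower() for w in priority_words]

    def hits(url: str) -> int:
        lowered = url.lower()
        return sum(word in lowered for word in words)

    queue: list[tuple[int, int, int, str]] = []
    seen: set[str] = set()
    counter = 0  # tie-breaker: keeps discovery order among equal keys

    def push(url: str, group: int) -> None:
        nonlocal counter
        if url not in seen:
            seen.add(url)
            # Smaller keys pop first: group 0 (entered pages) before group 1,
            # then more priority-word hits, then earlier discovery.
            heapq.heappush(queue, (group, -hits(url), counter, url))
            counter += 1

    for url in start_urls:
        push(url, 0)

    visited: list[str] = []
    while queue and len(visited) < max_pages:
        *_, url = heapq.heappop(queue)
        visited.append(url)
        for link in get_links(url):
            push(link, 1)
    return visited


# Hypothetical site map standing in for real page fetches.
site = {
    "http://www.example.com/": [
        "http://www.example.com/about.htm",
        "http://www.example.com/software.htm",
        "http://www.example.com/news.htm",
    ],
    "http://www.example.com/software.htm": [
        "http://www.example.com/links/software2.htm",
    ],
}
order = crawl_order(
    ["http://www.example.com/"],
    lambda url: site.get(url, []),
    priority_words=["software", "links"],
    max_pages=3,
)
print(order)
```

With a page limit of 3, the pages about "software" and "links" are reached before "about" and "news" are ever loaded.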
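Items 4, 10, 12 and 13 above all come down to which addresses the crawler is willing to load.  The Python sketch below shows one plausible set of rules: stay on the starting host and under the starting directory, and skip binary-looking files and anything marked "cgi" or "download".  The function names and the exact extension list are our assumptions for illustration; site-crawler's real rules may differ.  It also lists the higher directories worth trying when a link is not in the directory you were told (item 4).

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical skip rules modeled on items 8 and 12.
BINARY_EXTENSIONS = {".jpg", ".gif", ".exe", ".xls"}
SKIP_WORDS = ("cgi", "download")


def in_scope(start_url: str, url: str) -> bool:
    """True if url is on start_url's host and under start_url's directory.

    A re-direct to a new host (item 10) or a link kept on another
    site (item 13) fails this test, so it is never followed.
    """
    start, target = urlparse(start_url), urlparse(url)
    if start.netloc.lower() != target.netloc.lower():
        return False
    # Directory part of the start path, e.g. "/thatsite/" or "/".
    base_dir = start.path[: start.path.rfind("/") + 1] or "/"
    return (target.path or "/").startswith(base_dir)


def should_scan(url: str) -> bool:
    """False for cgi/download addresses and binary-looking files (item 12)."""
    path = urlparse(url).path.lower()
    if any(word in path for word in SKIP_WORDS):
        return False
    return posixpath.splitext(path)[1] not in BINARY_EXTENSIONS


def directories_above(url: str) -> list[str]:
    """The URL's own directory, then each higher one up to the site root."""
    parsed = urlparse(url)
    # Drop the leading "" and the final file name (or trailing "").
    segments = parsed.path.split("/")[1:-1]
    return [
        f"{parsed.scheme}://{parsed.netloc}/" + "".join(s + "/" for s in segments[:i])
        for i in range(len(segments), -1, -1)
    ]
```

Starting the scan at http://www.paidlinks.com/thatsite/ keeps the crawl inside that directory, which is what makes the offsite workaround in item 13 practical.  The substring test for "cgi" and "download" is deliberately crude and could also skip an innocent page whose name happens to contain those letters.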
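Items 5, 6 and 14 above are about the gap between what you type and what is actually in a page's html.  The Python sketch below narrows that gap in three ways: it decodes coded characters such as &quot; and &gt; and collapses runs of blanks (a broader cleanup than the program's duplicate-blanks option), it produces the %20-coded form of a url that contains spaces, and it reduces your address to its bare host name.  All function names are hypothetical, and this is not the program's own matching logic.

```python
import html
import re
from urllib.parse import quote, urlparse


def normalize_for_search(page: str) -> str:
    """Decode character references and collapse runs of whitespace (item 6)."""
    # html.unescape turns &quot; &gt; &lt; &nbsp; into real characters;
    # \s in a str pattern also matches the non-breaking space &nbsp; becomes.
    return re.sub(r"\s+", " ", html.unescape(page))


def url_forms(url: str) -> set[str]:
    """The url as typed plus its %20-coded form (item 5)."""
    return {url, quote(url, safe=":/?=&#")}


def simplest_search_term(url: str) -> str:
    """Reduce an address to its bare host, e.g. "yoursite.com" (item 14)."""
    if "://" not in url:
        url = "http://" + url  # accept "www.yoursite.com" style input
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def page_contains(page: str, terms: set[str]) -> bool:
    """Case-insensitive search of the normalized html for any term."""
    text = normalize_for_search(page).lower()
    return any(term.lower() in text for term in terms)
```

Searching for the bare host name finds every abbreviated form of your link, but it would also match a longer name such as "notyoursite.com", so a hit is still worth confirming visually.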

TROUBLESHOOTING - NON FIXABLE ISSUES

  1. External Link Storage – Unless you enter the address of the other site where the links are located, site-crawler will not find the links, since it is programmed not to go off site.  See the discussion of item #13 in the previous section.
     
  2. Problems at the link itself - If the link to you on a page has a coding problem, such as a mouseover or a drop-down box used for links, there may be no workaround other than not taking that web site as a partner.
     
  3. Site Link Search Engine Says Link Is There - Look closely at the web address the search engine goes to.  It may be leaving the main site for a paid link manager system on another site.  We have seen many cases where the link manager system claims that the link is on the target customer site when in fact it is not.  Some sites do not even store links on their own site, but keep them at the link manager company.  Sometimes there are good intentions of doing the right thing and getting the link onto the right site, but the link at the link manager site may never actually reach the customer site, which is the site you are trying to link to.  Good luck fixing this one.  We find that sites that use linking services are very unresponsive to emails, because the site is paying for a service and does not want to be involved.  The linking service often accepts no complaints either.  So if these highly automated systems fail, there is often simply no one to talk to.

SUMMARY OF SUGGESTED USAGE

We suggest insisting on simple link coding, for your own site and for your possible link partners, both so that search engine bots give you proper credit and, of course, for the speed and ease of this bot.  When checking for reciprocal links, we suggest using the main site url for small to medium-sized sites.  For huge sites, particularly those with a few thousand pages or more, we suggest beginning the search with the last known page of the link.  When checking a large number of sites in multi-site operation, we suggest breaking the list into smaller batches to minimize data loss, since internet problems can cause jams in the program.  If you have many sites to do, please save the data and restart from time to time.
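The advice above about breaking large multi-site jobs into smaller batches and saving along the way can be sketched as follows.  This is a hypothetical Python illustration, not a feature of site-crawler: `check_site` stands for whatever check you run on one site, and the JSON checkpoint file is our own choice.  Results are written after every batch, and sites already in the file are skipped, so a job interrupted by an internet problem can be restarted where it stopped.

```python
import json
from collections.abc import Callable
from pathlib import Path


def batches(sites: list[str], size: int) -> list[list[str]]:
    """Split a long list of sites into smaller batches."""
    return [sites[i : i + size] for i in range(0, len(sites), size)]


def run_with_checkpoints(
    sites: list[str],
    check_site: Callable[[str], bool],
    checkpoint: Path,
    batch_size: int = 10,
) -> dict[str, bool]:
    """Check sites batch by batch, saving all results after each batch.

    Sites already recorded in the checkpoint file are skipped, so an
    interrupted job can be restarted where it left off.
    """
    results: dict[str, bool] = (
        json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    )
    for batch in batches(sites, batch_size):
        for site in batch:
            if site not in results:
                results[site] = check_site(site)
        # Save after every batch so a jam loses at most one batch of work.
        checkpoint.write_text(json.dumps(results, indent=2))
    return results
```

A smaller `batch_size` saves more often at the cost of more file writes.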

If you follow these suggestions, as we have, we hope that you will find site crawler a great tool.  Thanks for your interest in our rpsoft 2000 products.

 

http://www.virtualsoftware.com/ProdPage.cfm?ProdID=2781


$29.95 - Download it now from The Virtual Software Store using Visa, Mastercard, AMEX, Discover, a USA-based checking account, prepaid InternetCash(tm) Cards, or your Microsoft Passport wallet, and immediately install it on your computer.  Offline payment options are also available.
